Important notes
Activate the Conda environment before you begin.
Enter your HdM abbreviation as NAME.
Do not rename the file and do not delete any cells.
Edit all cells marked with # YOUR CODE HERE.
The NotImplementedError() call is there to prevent the submission of empty cells. Delete it as soon as you start working in one of these cells.
Make sure everything runs as expected before you submit the exam: restart the kernel and run all cells by choosing “Restart” and then “Run All”.
I wish you every success!
NAME = ""
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."
E-Exam Notebook
Setup
Import the libraries (you will not need all of them):
# Do not make any changes to this cell
import pandas as pd
import altair as alt
alt.renderers.enable('mimetype')
# Scikit-learn libraries
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
# Additional helper libraries
import io
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
1) Data
Import data
Import the data:
LINK = "https://raw.githubusercontent.com/kirenz/datasets/master/mini_test_drives.csv"
df = pd.read_csv(LINK)
Get an overview of the data:
df.head()
 | campaign | spendings | test_drives | exposure | rating |
---|---|---|---|---|---|
0 | 1 | 10.256 | 330 | 43 | 10 |
1 | 2 | 985.685 | 120 | 28 | 7 |
2 | 3 | 1445.563 | 360 | 35 | 7 |
3 | 4 | 1188.193 | 270 | 33 | 7 |
4 | 5 | 574.513 | 220 | 44 | 5 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 campaign 200 non-null int64
1 spendings 200 non-null float64
2 test_drives 200 non-null int64
3 exposure 200 non-null int64
4 rating 200 non-null int64
dtypes: float64(1), int64(4)
memory usage: 7.9 KB
Prepare data for clustering
Run a K-Means cluster analysis. Use the variables “exposure” and “rating”.
# Prepare data
scaler = StandardScaler()
X = df[['exposure', 'rating']]
X_std = scaler.fit_transform(X)
X_std
array([[ 1.26645653, 2.32074007],
[ 0.04085344, 0.16525394],
[ 0.61280155, 0.16525394],
[ 0.4493878 , 0.16525394],
[ 1.3481634 , -1.27173682],
[-0.69450842, -1.27173682],
[-0.61280155, -4.14571834],
[-0.4493878 , 1.6022447 ],
[-0.53109467, 0.16525394],
[ 1.02133591, 0.16525394],
[ 0.36768093, 0.16525394],
[-0.61280155, -3.42722296],
[-0.28597406, 0.88374932],
[ 0.85792217, -0.55324144],
[-0.28597406, 0.16525394],
[-0.20426718, -1.27173682],
[ 0.61280155, -1.27173682],
[ 0.69450842, 0.88374932],
[-0.04085344, 0.88374932],
[ 0.4493878 , 0.88374932],
[ 0.4493878 , 0.16525394],
[ 0.04085344, -0.55324144],
[ 0.20426718, 1.6022447 ],
[ 0.53109467, 0.16525394],
[ 1.75669777, 0.16525394],
[ 1.02133591, 0.88374932],
[-0.61280155, -1.9902322 ],
[ 1.18474966, 0.16525394],
[ 0.61280155, 0.88374932],
[-1.59328402, -0.55324144],
[ 1.75669777, 0.16525394],
[-0.69450842, 0.88374932],
[ 1.18474966, -0.55324144],
[-1.75669777, 0.16525394],
[ 0.69450842, -0.55324144],
[ 0.36768093, 0.16525394],
[ 0.04085344, -0.55324144],
[-0.20426718, 0.88374932],
[ 0.53109467, 0.88374932],
[-0.53109467, -0.55324144],
[ 0.53109467, 0.16525394],
[ 2.90059399, 0.16525394],
[ 0.28597406, 0.16525394],
[-0.20426718, 0.16525394],
[ 1.18474966, 0.16525394],
[ 0.77621529, 0.16525394],
[-0.20426718, 0.88374932],
[-0.12256031, 0.16525394],
[ 0.93962904, 0.16525394],
[ 1.51157715, 0.16525394],
[ 0.69450842, -0.55324144],
[-1.26645653, -1.9902322 ],
[-2.08352526, 0.88374932],
[ 0.12256031, 0.88374932],
[ 0.4493878 , 0.88374932],
[ 0.04085344, 0.16525394],
[-1.42987028, -0.55324144],
[ 0.85792217, 0.16525394],
[-0.69450842, -1.9902322 ],
[-1.18474966, -0.55324144],
[ 0.20426718, 0.16525394],
[ 0.85792217, 0.88374932],
[-0.4493878 , 0.88374932],
[-0.36768093, -0.55324144],
[-0.4493878 , 1.6022447 ],
[-0.61280155, 0.16525394],
[-0.77621529, -0.55324144],
[ 0.77621529, 0.16525394],
[-0.93962904, 0.88374932],
[ 0.36768093, 0.88374932],
[-0.12256031, -1.27173682],
[ 2.08352526, 1.6022447 ],
[ 0.04085344, 0.16525394],
[ 0.36768093, 0.16525394],
[-0.28597406, 0.16525394],
[ 0.77621529, -0.55324144],
[ 0.20426718, 0.88374932],
[-0.69450842, 0.16525394],
[ 1.59328402, -0.55324144],
[-0.4493878 , -1.27173682],
[-1.42987028, 0.88374932],
[-2.16523213, -1.9902322 ],
[-2.16523213, -0.55324144],
[ 0.93962904, -0.55324144],
[-1.59328402, -1.27173682],
[ 0.85792217, 0.16525394],
[ 0.61280155, -1.27173682],
[ 1.02133591, -1.27173682],
[-0.4493878 , 0.16525394],
[-0.04085344, 0.88374932],
[ 0.28597406, -0.55324144],
[-0.69450842, 0.16525394],
[-0.28597406, 1.6022447 ],
[-2.16523213, 0.16525394],
[ 0.85792217, 0.88374932],
[-0.12256031, -0.55324144],
[-1.3481634 , -0.55324144],
[ 0.53109467, -0.55324144],
[ 2.24693901, 0.16525394],
[-1.83840464, 0.16525394],
[ 0.53109467, -0.55324144],
[ 0.77621529, 1.6022447 ],
[-1.18474966, 0.88374932],
[-0.36768093, -0.55324144],
[ 2.16523213, -0.55324144],
[-0.77621529, 0.16525394],
[-2.08352526, 0.16525394],
[-1.3481634 , -0.55324144],
[ 0.20426718, 0.88374932],
[-0.4493878 , 0.16525394],
[ 0.69450842, -0.55324144],
[ 0.77621529, 0.88374932],
[-1.51157715, -2.70872758],
[-2.08352526, 0.16525394],
[-1.26645653, 0.88374932],
[-1.83840464, 1.6022447 ],
[-1.10304278, 0.16525394],
[-0.61280155, -1.27173682],
[ 2.41035275, -0.55324144],
[-0.69450842, 0.88374932],
[ 0.61280155, 0.16525394],
[-0.4493878 , -1.27173682],
[-0.93962904, 0.16525394],
[ 2.08352526, 0.16525394],
[-1.92011151, -0.55324144],
[ 0.12256031, -0.55324144],
[ 1.26645653, 0.16525394],
[-0.12256031, 0.16525394],
[ 0.04085344, 0.16525394],
[ 0.77621529, 0.88374932],
[ 0.36768093, -1.27173682],
[ 0.53109467, 0.16525394],
[ 0.20426718, 0.16525394],
[-1.02133591, 0.16525394],
[-0.36768093, 0.16525394],
[ 0.28597406, 0.88374932],
[-0.53109467, -1.27173682],
[ 0.04085344, -4.14571834],
[-0.77621529, 0.16525394],
[ 0.04085344, 0.16525394],
[ 0.77621529, 0.88374932],
[-0.12256031, -0.55324144],
[ 0.20426718, 0.88374932],
[-1.42987028, 0.16525394],
[-0.53109467, 0.16525394],
[-0.77621529, 1.6022447 ],
[-1.10304278, 0.16525394],
[ 0.85792217, 0.16525394],
[ 0.69450842, -0.55324144],
[ 0.36768093, 0.16525394],
[-1.51157715, -1.27173682],
[ 0.93962904, -1.27173682],
[-0.28597406, 0.16525394],
[ 1.42987028, 0.16525394],
[-1.18474966, 0.88374932],
[-0.85792217, -0.55324144],
[ 0.36768093, 0.16525394],
[-0.36768093, -0.55324144],
[-2.24693901, -0.55324144],
[ 0.61280155, -1.27173682],
[-0.12256031, -0.55324144],
[-0.69450842, -0.55324144],
[-0.12256031, 0.16525394],
[ 2.08352526, 0.88374932],
[ 0.12256031, 0.88374932],
[ 0.04085344, 0.16525394],
[-0.85792217, -0.55324144],
[-0.12256031, 0.88374932],
[ 1.18474966, 0.88374932],
[-0.85792217, -0.55324144],
[ 0.69450842, 0.88374932],
[ 1.42987028, 0.16525394],
[-0.61280155, 0.16525394],
[-1.02133591, 0.16525394],
[ 0.04085344, 0.88374932],
[-0.04085344, -0.55324144],
[-0.85792217, 0.88374932],
[ 0.36768093, 0.88374932],
[ 1.51157715, 1.6022447 ],
[ 1.59328402, 0.88374932],
[-0.69450842, -4.14571834],
[-0.4493878 , 0.88374932],
[ 2.24693901, 1.6022447 ],
[ 0.28597406, -1.27173682],
[ 0.93962904, 0.88374932],
[-0.53109467, -0.55324144],
[ 0.69450842, -0.55324144],
[-0.53109467, 0.16525394],
[ 1.3481634 , 0.16525394],
[-0.04085344, 0.16525394],
[-0.93962904, -0.55324144],
[ 0.4493878 , -0.55324144],
[ 0.4493878 , 0.88374932],
[-0.53109467, 0.16525394],
[ 0.61280155, 1.6022447 ],
[-0.12256031, 0.16525394],
[-1.10304278, -0.55324144],
[ 0.53109467, -0.55324144],
[-1.3481634 , 0.88374932],
[-0.61280155, 1.6022447 ]])
df_std = pd.DataFrame(X_std, columns = ['exposure','rating'])
df_std
 | exposure | rating |
---|---|---|
0 | 1.266457 | 2.320740 |
1 | 0.040853 | 0.165254 |
2 | 0.612802 | 0.165254 |
3 | 0.449388 | 0.165254 |
4 | 1.348163 | -1.271737 |
... | ... | ... |
195 | -0.122560 | 0.165254 |
196 | -1.103043 | -0.553241 |
197 | 0.531095 | -0.553241 |
198 | -1.348163 | 0.883749 |
199 | -0.612802 | 1.602245 |
200 rows × 2 columns
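StandardScaler rescales each column to zero mean and unit variance, dividing by the population standard deviation (ddof=0). As a minimal sketch, the same transformation can be written by hand in NumPy. The values below are only the first five rows of exposure and rating from df.head(), so the results differ from the full-data output above; the helper name standardize is hypothetical.

```python
import numpy as np

# First five rows of 'exposure' and 'rating' as shown by df.head()
# (illustrative subset only, not the full data set of 200 rows)
X_small = np.array([
    [43, 10],
    [28, 7],
    [35, 7],
    [33, 7],
    [44, 5],
], dtype=float)


def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the population std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # NumPy default ddof=0, matching StandardScaler
    return (X - mean) / std


X_small_std = standardize(X_small)
print(X_small_std.round(3))
```

Standardizing matters here because K-Means relies on Euclidean distances: without it, the variable with the larger numeric range (exposure) would dominate the clustering.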
2) Exploratory analysis
3a) Create a plot that shows the relationship between the two variables. Choose a suitable type of visualization.
# YOUR CODE HERE
alt.Chart(df_std).mark_point().encode(
x='exposure',
y='rating'
)
[Figure: scatter plot of exposure (x-axis) against rating (y-axis)]
# Do not make any changes to this cell
3) Clustering
Run the cluster analysis:
kmeans = KMeans(n_clusters=4, n_init=10)
kmeans.fit(X_std)
KMeans(n_clusters=4, n_init=10)
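K-Means starts from randomly chosen centroids, so the cluster numbers (0 to 3) and occasionally the assignments themselves can change from one run to the next. The following is a minimal sketch of how a fixed random_state makes a run repeatable. It uses synthetic data from make_blobs rather than the exam data, and the seed 42 and the blob settings are arbitrary assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (not the exam data set)
X_demo, _ = make_blobs(n_samples=200, centers=4, random_state=0)

# Two independent fits with the same seed
labels_a = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_demo)
labels_b = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X_demo)

# Check whether both runs produced the same labels
print((labels_a == labels_b).all())
```

Without a fixed seed, the labels printed in a notebook may not match those of a later run, even when the clusters themselves are the same.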
y_kmeans = kmeans.predict(X_std)
y_kmeans
array([1, 2, 1, 1, 0, 0, 3, 2, 2, 1, 1, 3, 2, 0, 2, 0, 0, 1, 2, 1, 1, 0,
1, 1, 1, 1, 3, 1, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2,
1, 1, 2, 2, 1, 1, 0, 3, 2, 1, 1, 2, 2, 1, 3, 2, 1, 1, 2, 0, 2, 2,
2, 1, 2, 1, 0, 1, 2, 1, 2, 0, 1, 2, 1, 0, 2, 3, 2, 0, 2, 1, 0, 0,
2, 2, 0, 2, 2, 2, 1, 0, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 2, 2, 1, 2,
0, 1, 3, 2, 2, 2, 2, 0, 1, 2, 1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 0, 1,
1, 2, 2, 1, 0, 3, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0, 1, 2, 0, 2, 1,
2, 2, 1, 0, 2, 0, 0, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 0,
2, 1, 1, 1, 3, 2, 1, 0, 1, 0, 0, 2, 1, 2, 2, 0, 1, 2, 1, 2, 2, 0,
2, 2], dtype=int32)
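After fitting, predict assigns each observation to the cluster whose centroid is closest in Euclidean distance. The sketch below reproduces that assignment step in NumPy. The centroids and points are made-up values for illustration, not the ones fitted above, and assign_to_nearest is a hypothetical helper.

```python
import numpy as np


def assign_to_nearest(X, centroids):
    """Return, for each row of X, the index of the closest centroid."""
    # Pairwise differences via broadcasting:
    # shape (n_points, n_centroids, n_features)
    diffs = X[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return distances.argmin(axis=1)


# Made-up centroids and points in standardized (exposure, rating) space
centroids = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, -3.0]])
points = np.array([[0.9, 0.2], [-1.2, 0.1], [0.1, -2.5]])

labels = assign_to_nearest(points, centroids)
print(labels)  # [0 1 2]
```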
df['cluster'] = y_kmeans
df
 | campaign | spendings | test_drives | exposure | rating | cluster |
---|---|---|---|---|---|---|
0 | 1 | 10.256 | 330 | 43 | 10 | 1 |
1 | 2 | 985.685 | 120 | 28 | 7 | 2 |
2 | 3 | 1445.563 | 360 | 35 | 7 | 1 |
3 | 4 | 1188.193 | 270 | 33 | 7 | 1 |
4 | 5 | 574.513 | 220 | 44 | 5 | 0 |
... | ... | ... | ... | ... | ... | ... |
195 | 196 | 910.851 | 190 | 26 | 7 | 2 |
196 | 197 | 888.569 | 240 | 14 | 6 | 2 |
197 | 198 | 800.615 | 250 | 34 | 6 | 0 |
198 | 199 | 1500.000 | 230 | 11 | 8 | 2 |
199 | 200 | 785.694 | 110 | 20 | 9 | 2 |
200 rows × 6 columns
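To interpret the clusters, it usually helps to compare the average exposure and rating per cluster. The following is a minimal sketch using only the ten rows visible in the table above; a real analysis would use the full df. The names df_demo and profile are hypothetical.

```python
import pandas as pd

# The ten rows displayed above (illustrative subset of the 200 campaigns)
df_demo = pd.DataFrame({
    "exposure": [43, 28, 35, 33, 44, 26, 14, 34, 11, 20],
    "rating":   [10, 7, 7, 7, 5, 7, 6, 6, 8, 9],
    "cluster":  [1, 2, 1, 1, 0, 2, 2, 0, 2, 2],
})

# Cluster size and mean of each variable per cluster
profile = df_demo.groupby("cluster").agg(
    n=("exposure", "size"),
    mean_exposure=("exposure", "mean"),
    mean_rating=("rating", "mean"),
)
print(profile)
```

Because the profile is computed on the original (unscaled) columns, the means can be read directly in the units of the data.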
# Do not make any changes to this cell
Congratulations! This was the last task in the notebook.