Important notes

  • Activate the Conda environment before you begin.

  • Enter your HdM user abbreviation (Kürzel) as NAME.

  • Do not rename the file and do not delete any cells.

  • Complete all cells marked # YOUR CODE HERE

  • The statement raise NotImplementedError() prevents empty cells from being submitted. Delete it as soon as you start working in one of these cells.

  • Make sure everything runs as expected before you submit the exam: restart the kernel and run all cells by selecting “Restart” and then “Run All”.

Good luck!

NAME = ""
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

E-Exam Notebook

Setup

Import the libraries (you will not need all of them):

# Do not modify this cell

import pandas as pd
import altair as alt
alt.renderers.enable('mimetype')

# Scikit-learn libraries

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering


# Additional helper libraries
import io
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

1) Data

Import data

  • Import the data:

LINK = "https://raw.githubusercontent.com/kirenz/datasets/master/mini_test_drives.csv"

df = pd.read_csv(LINK)

Get an overview of the data:

df.head()
   campaign  spendings  test_drives  exposure  rating
0         1     10.256          330        43      10
1         2    985.685          120        28       7
2         3   1445.563          360        35       7
3         4   1188.193          270        33       7
4         5    574.513          220        44       5
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   campaign     200 non-null    int64  
 1   spendings    200 non-null    float64
 2   test_drives  200 non-null    int64  
 3   exposure     200 non-null    int64  
 4   rating       200 non-null    int64  
dtypes: float64(1), int64(4)
memory usage: 7.9 KB

Prepare data for clustering

Perform a k-means cluster analysis. Use the variables “exposure” and “rating”.

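# Prepare data: select the two clustering variables and standardize them
# (mean 0, standard deviation 1) so both contribute equally to the distances
scaler = StandardScaler()
X = df[['exposure', 'rating']]
X_std = scaler.fit_transform(X)
X_std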
array([[ 1.26645653,  2.32074007],
       [ 0.04085344,  0.16525394],
       [ 0.61280155,  0.16525394],
       [ 0.4493878 ,  0.16525394],
       [ 1.3481634 , -1.27173682],
       [-0.69450842, -1.27173682],
       [-0.61280155, -4.14571834],
       [-0.4493878 ,  1.6022447 ],
       [-0.53109467,  0.16525394],
       [ 1.02133591,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.61280155, -3.42722296],
       [-0.28597406,  0.88374932],
       [ 0.85792217, -0.55324144],
       [-0.28597406,  0.16525394],
       [-0.20426718, -1.27173682],
       [ 0.61280155, -1.27173682],
       [ 0.69450842,  0.88374932],
       [-0.04085344,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.4493878 ,  0.16525394],
       [ 0.04085344, -0.55324144],
       [ 0.20426718,  1.6022447 ],
       [ 0.53109467,  0.16525394],
       [ 1.75669777,  0.16525394],
       [ 1.02133591,  0.88374932],
       [-0.61280155, -1.9902322 ],
       [ 1.18474966,  0.16525394],
       [ 0.61280155,  0.88374932],
       [-1.59328402, -0.55324144],
       [ 1.75669777,  0.16525394],
       [-0.69450842,  0.88374932],
       [ 1.18474966, -0.55324144],
       [-1.75669777,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [ 0.04085344, -0.55324144],
       [-0.20426718,  0.88374932],
       [ 0.53109467,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.53109467,  0.16525394],
       [ 2.90059399,  0.16525394],
       [ 0.28597406,  0.16525394],
       [-0.20426718,  0.16525394],
       [ 1.18474966,  0.16525394],
       [ 0.77621529,  0.16525394],
       [-0.20426718,  0.88374932],
       [-0.12256031,  0.16525394],
       [ 0.93962904,  0.16525394],
       [ 1.51157715,  0.16525394],
       [ 0.69450842, -0.55324144],
       [-1.26645653, -1.9902322 ],
       [-2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-1.42987028, -0.55324144],
       [ 0.85792217,  0.16525394],
       [-0.69450842, -1.9902322 ],
       [-1.18474966, -0.55324144],
       [ 0.20426718,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.4493878 ,  0.88374932],
       [-0.36768093, -0.55324144],
       [-0.4493878 ,  1.6022447 ],
       [-0.61280155,  0.16525394],
       [-0.77621529, -0.55324144],
       [ 0.77621529,  0.16525394],
       [-0.93962904,  0.88374932],
       [ 0.36768093,  0.88374932],
       [-0.12256031, -1.27173682],
       [ 2.08352526,  1.6022447 ],
       [ 0.04085344,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.28597406,  0.16525394],
       [ 0.77621529, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.69450842,  0.16525394],
       [ 1.59328402, -0.55324144],
       [-0.4493878 , -1.27173682],
       [-1.42987028,  0.88374932],
       [-2.16523213, -1.9902322 ],
       [-2.16523213, -0.55324144],
       [ 0.93962904, -0.55324144],
       [-1.59328402, -1.27173682],
       [ 0.85792217,  0.16525394],
       [ 0.61280155, -1.27173682],
       [ 1.02133591, -1.27173682],
       [-0.4493878 ,  0.16525394],
       [-0.04085344,  0.88374932],
       [ 0.28597406, -0.55324144],
       [-0.69450842,  0.16525394],
       [-0.28597406,  1.6022447 ],
       [-2.16523213,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.12256031, -0.55324144],
       [-1.3481634 , -0.55324144],
       [ 0.53109467, -0.55324144],
       [ 2.24693901,  0.16525394],
       [-1.83840464,  0.16525394],
       [ 0.53109467, -0.55324144],
       [ 0.77621529,  1.6022447 ],
       [-1.18474966,  0.88374932],
       [-0.36768093, -0.55324144],
       [ 2.16523213, -0.55324144],
       [-0.77621529,  0.16525394],
       [-2.08352526,  0.16525394],
       [-1.3481634 , -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.4493878 ,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.77621529,  0.88374932],
       [-1.51157715, -2.70872758],
       [-2.08352526,  0.16525394],
       [-1.26645653,  0.88374932],
       [-1.83840464,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [-0.61280155, -1.27173682],
       [ 2.41035275, -0.55324144],
       [-0.69450842,  0.88374932],
       [ 0.61280155,  0.16525394],
       [-0.4493878 , -1.27173682],
       [-0.93962904,  0.16525394],
       [ 2.08352526,  0.16525394],
       [-1.92011151, -0.55324144],
       [ 0.12256031, -0.55324144],
       [ 1.26645653,  0.16525394],
       [-0.12256031,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [ 0.36768093, -1.27173682],
       [ 0.53109467,  0.16525394],
       [ 0.20426718,  0.16525394],
       [-1.02133591,  0.16525394],
       [-0.36768093,  0.16525394],
       [ 0.28597406,  0.88374932],
       [-0.53109467, -1.27173682],
       [ 0.04085344, -4.14571834],
       [-0.77621529,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [-0.12256031, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-1.42987028,  0.16525394],
       [-0.53109467,  0.16525394],
       [-0.77621529,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [ 0.85792217,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-1.51157715, -1.27173682],
       [ 0.93962904, -1.27173682],
       [-0.28597406,  0.16525394],
       [ 1.42987028,  0.16525394],
       [-1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-0.36768093, -0.55324144],
       [-2.24693901, -0.55324144],
       [ 0.61280155, -1.27173682],
       [-0.12256031, -0.55324144],
       [-0.69450842, -0.55324144],
       [-0.12256031,  0.16525394],
       [ 2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-0.85792217, -0.55324144],
       [-0.12256031,  0.88374932],
       [ 1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.69450842,  0.88374932],
       [ 1.42987028,  0.16525394],
       [-0.61280155,  0.16525394],
       [-1.02133591,  0.16525394],
       [ 0.04085344,  0.88374932],
       [-0.04085344, -0.55324144],
       [-0.85792217,  0.88374932],
       [ 0.36768093,  0.88374932],
       [ 1.51157715,  1.6022447 ],
       [ 1.59328402,  0.88374932],
       [-0.69450842, -4.14571834],
       [-0.4493878 ,  0.88374932],
       [ 2.24693901,  1.6022447 ],
       [ 0.28597406, -1.27173682],
       [ 0.93962904,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.69450842, -0.55324144],
       [-0.53109467,  0.16525394],
       [ 1.3481634 ,  0.16525394],
       [-0.04085344,  0.16525394],
       [-0.93962904, -0.55324144],
       [ 0.4493878 , -0.55324144],
       [ 0.4493878 ,  0.88374932],
       [-0.53109467,  0.16525394],
       [ 0.61280155,  1.6022447 ],
       [-0.12256031,  0.16525394],
       [-1.10304278, -0.55324144],
       [ 0.53109467, -0.55324144],
       [-1.3481634 ,  0.88374932],
       [-0.61280155,  1.6022447 ]])
df_std = pd.DataFrame(X_std, columns=['exposure', 'rating'])
df_std
      exposure     rating
0     1.266457   2.320740
1     0.040853   0.165254
2     0.612802   0.165254
3     0.449388   0.165254
4     1.348163  -1.271737
...        ...        ...
195  -0.122560   0.165254
196  -1.103043  -0.553241
197   0.531095  -0.553241
198  -1.348163   0.883749
199  -0.612802   1.602245

200 rows × 2 columns
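StandardScaler turns each column into z-scores, z = (x - mean) / std, where std is the population standard deviation (ddof=0), so every standardized column has mean 0 and standard deviation 1. The following minimal sketch checks this by hand. As an assumption, it uses only the five rows printed by df.head() instead of the full data set, so its numbers differ from X_std above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: exposure and rating from the five rows shown by df.head()
X = np.array([[43.0, 10.0],
              [28.0, 7.0],
              [35.0, 7.0],
              [33.0, 7.0],
              [44.0, 5.0]])

# Manual z-scores: subtract the column mean and divide by the
# population standard deviation (NumPy's default ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same transformation with scikit-learn
X_sklearn = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_sklearn))
```

Standardizing matters here because k-means relies on Euclidean distances: without it, exposure (values in the tens) would dominate rating (values up to 10).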

2) Exploratory analysis

  • 2a) Create a figure that shows the relationship between the two variables. Choose a suitable type of chart.

# YOUR CODE HERE
alt.Chart(df_std).mark_point().encode(
    x='exposure',
    y='rating'
)
# Do not modify this cell
# Do not modify this cell
# Do not modify this cell
# Do not modify this cell
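A scatter plot is a reasonable choice for two numeric variables. Because rating only takes a few integer values, however, many points can be drawn on top of each other. As a hedged complement (not part of the required solution), the sketch below summarizes the relationship with the Pearson correlation and counts how many observations share each (exposure, rating) pair. It uses only the five rows from df.head() as toy data, so the values are illustrative.

```python
import pandas as pd

# Toy data: the five rows shown by df.head() (stand-in for the full data set)
df = pd.DataFrame({
    "exposure": [43, 28, 35, 33, 44],
    "rating": [10, 7, 7, 7, 5],
})

# Pearson correlation: a single-number summary of the linear relationship
r = df["exposure"].corr(df["rating"])

# Number of observations per (exposure, rating) pair; in a scatter plot,
# pairs with n > 1 collapse into a single visible point
counts = df.groupby(["exposure", "rating"]).size().reset_index(name="n")

# How often each rating value occurs (rating is discrete)
rating_counts = df["rating"].value_counts().sort_index()

print(f"Pearson r = {r:.3f}")
print(counts)
print(rating_counts)
```

On the full data, counts could be plotted with alt.Chart(counts).mark_circle().encode(x='exposure', y='rating', size='n') so that overlapping observations stay visible.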

3) Clustering

Perform the cluster analysis:

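# k-means with 4 clusters on the standardized data;
# n_init=10 runs 10 initializations and keeps the one with the lowest inertia
kmeans = KMeans(n_clusters=4, n_init=10)
kmeans.fit(X_std)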
KMeans(n_clusters=4, n_init=10)
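The notebook fixes n_clusters=4 without stating why. One common heuristic for choosing the number of clusters is the elbow method: fit k-means for several values of k and look for the point where the inertia (the sum of squared distances of the observations to their nearest centroid) stops dropping sharply. The sketch below assumes synthetic data from make_blobs as a stand-in, because the exam data is loaded from a URL; on the exam data, X_std from above would take its place.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_std: 200 points in 2 dimensions (assumption)
X, _ = make_blobs(n_samples=200, centers=4, n_features=2, random_state=0)
X_std = StandardScaler().fit_transform(X)

# Inertia for k = 1..8; n_init=10 keeps the best of 10 initializations,
# random_state makes the runs reproducible
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X_std)
    inertias[k] = km.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

For k=1 the single centroid is the overall mean, so the inertia equals the total sum of squares (200 observations × 2 standardized columns = 400). On real data the elbow is often less pronounced, so this is a heuristic rather than a rule.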
y_kmeans = kmeans.predict(X_std)
y_kmeans
array([1, 2, 1, 1, 0, 0, 3, 2, 2, 1, 1, 3, 2, 0, 2, 0, 0, 1, 2, 1, 1, 0,
       1, 1, 1, 1, 3, 1, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2,
       1, 1, 2, 2, 1, 1, 0, 3, 2, 1, 1, 2, 2, 1, 3, 2, 1, 1, 2, 0, 2, 2,
       2, 1, 2, 1, 0, 1, 2, 1, 2, 0, 1, 2, 1, 0, 2, 3, 2, 0, 2, 1, 0, 0,
       2, 2, 0, 2, 2, 2, 1, 0, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 2, 2, 1, 2,
       0, 1, 3, 2, 2, 2, 2, 0, 1, 2, 1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 0, 1,
       1, 2, 2, 1, 0, 3, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0, 1, 2, 0, 2, 1,
       2, 2, 1, 0, 2, 0, 0, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 0,
       2, 1, 1, 1, 3, 2, 1, 0, 1, 0, 0, 2, 1, 2, 2, 0, 1, 2, 1, 2, 2, 0,
       2, 2], dtype=int32)
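kmeans.predict assigns each observation to the cluster whose centroid (stored in kmeans.cluster_centers_) is closest in Euclidean distance. The following minimal sketch reproduces this assignment by hand with NumPy. It again assumes synthetic make_blobs data as a stand-in for X_std.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data as a stand-in for X_std (assumption)
X, _ = make_blobs(n_samples=200, centers=4, n_features=2, random_state=1)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X)

# Squared Euclidean distance of every point to every centroid: shape (200, 4)
d2 = ((X[:, None, :] - kmeans.cluster_centers_[None, :, :]) ** 2).sum(axis=2)

# Each point gets the label of its nearest centroid
manual_labels = d2.argmin(axis=1)

print((manual_labels == kmeans.predict(X)).all())
```

The same rule also lets a fitted model label new observations, as long as they are standardized with the same fitted scaler (scaler.transform, not fit_transform).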
df['cluster'] = y_kmeans
df
     campaign  spendings  test_drives  exposure  rating  cluster
0           1     10.256          330        43      10        1
1           2    985.685          120        28       7        2
2           3   1445.563          360        35       7        1
3           4   1188.193          270        33       7        1
4           5    574.513          220        44       5        0
...       ...        ...          ...       ...     ...      ...
195       196    910.851          190        26       7        2
196       197    888.569          240        14       6        2
197       198    800.615          250        34       6        0
198       199   1500.000          230        11       8        2
199       200    785.694          110        20       9        2

200 rows × 6 columns
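The cluster numbers carry no meaning by themselves; to interpret the clusters, it helps to look at the centroids in the original units. Because standardization is an affine transformation, scaler.inverse_transform maps the centroids back to exposure and rating, and the result should match the per-cluster means. A sketch on the five rows from df.head(), using two clusters only because this toy sample is so small (an assumption, not the exam setup):

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data: the five rows shown by df.head()
df = pd.DataFrame({"exposure": [43, 28, 35, 33, 44],
                   "rating": [10, 7, 7, 7, 5]})

scaler = StandardScaler()
X_std = scaler.fit_transform(df[["exposure", "rating"]])

# Two clusters only because the toy sample has five rows
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X_std)

# Centroids mapped back from z-scores to the original units
centers = pd.DataFrame(scaler.inverse_transform(kmeans.cluster_centers_),
                       columns=["exposure", "rating"])

# Per-cluster means computed directly from the original data
profile = df.groupby("cluster")[["exposure", "rating"]].mean()

print(centers)
print(profile)
```

On the exam data, df.groupby('cluster')[['exposure', 'rating']].mean() would give a compact profile of the four clusters.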

# Do not modify this cell

Congratulations! This was the last task in the notebook.