Important notes

  • Activate the Conda environment before you begin.

  • Enter your HdM abbreviation as NAME.

  • Do not change the file name and do not delete any cells.

  • Complete every cell marked with # YOUR CODE HERE.

  • The statement raise NotImplementedError() prevents empty cells from being submitted. Delete this statement as soon as you work in one of these cells.

  • Before you submit the exam, make sure everything runs as expected: restart the kernel and run all cells by choosing “Restart” and then “Run All”.
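The sketch below illustrates the placeholder mechanism from the bullets above. It assumes the placeholder cells use the `# YOUR CODE HERE` / `raise NotImplementedError()` pattern that nbgrader generates. The function names and the abbreviation `"ab123"` are hypothetical. Jupyter's “Run All” stops at the first cell that raises an exception, so a leftover placeholder also prevents every later cell from running.

```python
def unanswered_cell():
    # YOUR CODE HERE
    raise NotImplementedError()


def answered_cell():
    # The raise statement was deleted and replaced by the actual answer.
    NAME = "ab123"  # hypothetical HdM abbreviation
    return NAME


def still_placeholder(cell):
    """True if running the cell hits the NotImplementedError placeholder."""
    try:
        cell()
    except NotImplementedError:
        return True
    return False


print(still_placeholder(unanswered_cell))  # True
print(still_placeholder(answered_cell))    # False
```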

Good luck!

NAME = ""
import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."

E-Exam Notebook


Import the libraries (you will not need all of them):

# Do not make any changes to this cell

import pandas as pd
import altair as alt

# Scikit-learn libraries

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

# Additional helper libraries
import io
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

1) Data

Import data

  • Import the data:

LINK = "https://raw.githubusercontent.com/kirenz/datasets/master/mini_test_drives.csv"

df = pd.read_csv(LINK)

Get an overview of the data:

   campaign  spendings  test_drives  exposure  rating
0         1     10.256          330        43      10
1         2    985.685          120        28       7
2         3   1445.563          360        35       7
3         4   1188.193          270        33       7
4         5    574.513          220        44       5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   campaign     200 non-null    int64  
 1   spendings    200 non-null    float64
 2   test_drives  200 non-null    int64  
 3   exposure     200 non-null    int64  
 4   rating       200 non-null    int64  
dtypes: float64(1), int64(4)
memory usage: 7.9 KB
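The overview above appears to be the output of `df.head()` and `df.info()`. The sketch below rebuilds only the five rows printed above as a local frame, so it runs without downloading `LINK`. It produces the same kind of overview and adds `describe()` for summary statistics plus a count of missing values per column. The name `df_sample` is hypothetical.

```python
import pandas as pd

# Only the five rows printed above, rebuilt locally (not the full dataset).
df_sample = pd.DataFrame({
    "campaign": [1, 2, 3, 4, 5],
    "spendings": [10.256, 985.685, 1445.563, 1188.193, 574.513],
    "test_drives": [330, 120, 360, 270, 220],
    "exposure": [43, 28, 35, 33, 44],
    "rating": [10, 7, 7, 7, 5],
})

print(df_sample.head())        # first rows
df_sample.info()               # column names, non-null counts, dtypes
print(df_sample.describe())    # count, mean, std, min, quartiles, max
print(df_sample.isna().sum())  # missing values per column
```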

Prepare the data for clustering

Perform a k-means cluster analysis using the variables “exposure” and “rating”.

# Prepare the data: select the two clustering variables and standardize them
# to mean 0 and standard deviation 1, so neither dominates the distances
scaler = StandardScaler()
X = df[['exposure', 'rating']]
X_std = scaler.fit_transform(X)
array([[ 1.26645653,  2.32074007],
       [ 0.04085344,  0.16525394],
       [ 0.61280155,  0.16525394],
       [ 0.4493878 ,  0.16525394],
       [ 1.3481634 , -1.27173682],
       [-0.69450842, -1.27173682],
       [-0.61280155, -4.14571834],
       [-0.4493878 ,  1.6022447 ],
       [-0.53109467,  0.16525394],
       [ 1.02133591,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.61280155, -3.42722296],
       [-0.28597406,  0.88374932],
       [ 0.85792217, -0.55324144],
       [-0.28597406,  0.16525394],
       [-0.20426718, -1.27173682],
       [ 0.61280155, -1.27173682],
       [ 0.69450842,  0.88374932],
       [-0.04085344,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.4493878 ,  0.16525394],
       [ 0.04085344, -0.55324144],
       [ 0.20426718,  1.6022447 ],
       [ 0.53109467,  0.16525394],
       [ 1.75669777,  0.16525394],
       [ 1.02133591,  0.88374932],
       [-0.61280155, -1.9902322 ],
       [ 1.18474966,  0.16525394],
       [ 0.61280155,  0.88374932],
       [-1.59328402, -0.55324144],
       [ 1.75669777,  0.16525394],
       [-0.69450842,  0.88374932],
       [ 1.18474966, -0.55324144],
       [-1.75669777,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [ 0.04085344, -0.55324144],
       [-0.20426718,  0.88374932],
       [ 0.53109467,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.53109467,  0.16525394],
       [ 2.90059399,  0.16525394],
       [ 0.28597406,  0.16525394],
       [-0.20426718,  0.16525394],
       [ 1.18474966,  0.16525394],
       [ 0.77621529,  0.16525394],
       [-0.20426718,  0.88374932],
       [-0.12256031,  0.16525394],
       [ 0.93962904,  0.16525394],
       [ 1.51157715,  0.16525394],
       [ 0.69450842, -0.55324144],
       [-1.26645653, -1.9902322 ],
       [-2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-1.42987028, -0.55324144],
       [ 0.85792217,  0.16525394],
       [-0.69450842, -1.9902322 ],
       [-1.18474966, -0.55324144],
       [ 0.20426718,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.4493878 ,  0.88374932],
       [-0.36768093, -0.55324144],
       [-0.4493878 ,  1.6022447 ],
       [-0.61280155,  0.16525394],
       [-0.77621529, -0.55324144],
       [ 0.77621529,  0.16525394],
       [-0.93962904,  0.88374932],
       [ 0.36768093,  0.88374932],
       [-0.12256031, -1.27173682],
       [ 2.08352526,  1.6022447 ],
       [ 0.04085344,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.28597406,  0.16525394],
       [ 0.77621529, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.69450842,  0.16525394],
       [ 1.59328402, -0.55324144],
       [-0.4493878 , -1.27173682],
       [-1.42987028,  0.88374932],
       [-2.16523213, -1.9902322 ],
       [-2.16523213, -0.55324144],
       [ 0.93962904, -0.55324144],
       [-1.59328402, -1.27173682],
       [ 0.85792217,  0.16525394],
       [ 0.61280155, -1.27173682],
       [ 1.02133591, -1.27173682],
       [-0.4493878 ,  0.16525394],
       [-0.04085344,  0.88374932],
       [ 0.28597406, -0.55324144],
       [-0.69450842,  0.16525394],
       [-0.28597406,  1.6022447 ],
       [-2.16523213,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.12256031, -0.55324144],
       [-1.3481634 , -0.55324144],
       [ 0.53109467, -0.55324144],
       [ 2.24693901,  0.16525394],
       [-1.83840464,  0.16525394],
       [ 0.53109467, -0.55324144],
       [ 0.77621529,  1.6022447 ],
       [-1.18474966,  0.88374932],
       [-0.36768093, -0.55324144],
       [ 2.16523213, -0.55324144],
       [-0.77621529,  0.16525394],
       [-2.08352526,  0.16525394],
       [-1.3481634 , -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.4493878 ,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.77621529,  0.88374932],
       [-1.51157715, -2.70872758],
       [-2.08352526,  0.16525394],
       [-1.26645653,  0.88374932],
       [-1.83840464,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [-0.61280155, -1.27173682],
       [ 2.41035275, -0.55324144],
       [-0.69450842,  0.88374932],
       [ 0.61280155,  0.16525394],
       [-0.4493878 , -1.27173682],
       [-0.93962904,  0.16525394],
       [ 2.08352526,  0.16525394],
       [-1.92011151, -0.55324144],
       [ 0.12256031, -0.55324144],
       [ 1.26645653,  0.16525394],
       [-0.12256031,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [ 0.36768093, -1.27173682],
       [ 0.53109467,  0.16525394],
       [ 0.20426718,  0.16525394],
       [-1.02133591,  0.16525394],
       [-0.36768093,  0.16525394],
       [ 0.28597406,  0.88374932],
       [-0.53109467, -1.27173682],
       [ 0.04085344, -4.14571834],
       [-0.77621529,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [-0.12256031, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-1.42987028,  0.16525394],
       [-0.53109467,  0.16525394],
       [-0.77621529,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [ 0.85792217,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-1.51157715, -1.27173682],
       [ 0.93962904, -1.27173682],
       [-0.28597406,  0.16525394],
       [ 1.42987028,  0.16525394],
       [-1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-0.36768093, -0.55324144],
       [-2.24693901, -0.55324144],
       [ 0.61280155, -1.27173682],
       [-0.12256031, -0.55324144],
       [-0.69450842, -0.55324144],
       [-0.12256031,  0.16525394],
       [ 2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-0.85792217, -0.55324144],
       [-0.12256031,  0.88374932],
       [ 1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.69450842,  0.88374932],
       [ 1.42987028,  0.16525394],
       [-0.61280155,  0.16525394],
       [-1.02133591,  0.16525394],
       [ 0.04085344,  0.88374932],
       [-0.04085344, -0.55324144],
       [-0.85792217,  0.88374932],
       [ 0.36768093,  0.88374932],
       [ 1.51157715,  1.6022447 ],
       [ 1.59328402,  0.88374932],
       [-0.69450842, -4.14571834],
       [-0.4493878 ,  0.88374932],
       [ 2.24693901,  1.6022447 ],
       [ 0.28597406, -1.27173682],
       [ 0.93962904,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.69450842, -0.55324144],
       [-0.53109467,  0.16525394],
       [ 1.3481634 ,  0.16525394],
       [-0.04085344,  0.16525394],
       [-0.93962904, -0.55324144],
       [ 0.4493878 , -0.55324144],
       [ 0.4493878 ,  0.88374932],
       [-0.53109467,  0.16525394],
       [ 0.61280155,  1.6022447 ],
       [-0.12256031,  0.16525394],
       [-1.10304278, -0.55324144],
       [ 0.53109467, -0.55324144],
       [-1.3481634 ,  0.88374932],
       [-0.61280155,  1.6022447 ]])
df_std = pd.DataFrame(X_std, columns=['exposure', 'rating'])
     exposure    rating
0    1.266457  2.320740
1    0.040853  0.165254
2    0.612802  0.165254
3    0.449388  0.165254
4    1.348163 -1.271737
..        ...       ...
195 -0.122560  0.165254
196 -1.103043 -0.553241
197  0.531095 -0.553241
198 -1.348163  0.883749
199 -0.612802  1.602245

200 rows × 2 columns
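`StandardScaler` replaces every value x with z = (x - μ) / σ. Here μ is the column mean and σ is the population standard deviation (ddof=0). The transformation is linear, and this is visible in `X_std`: each rating point corresponds to the same step of about 0.7185 standard units (10 → 2.3207, 7 → 0.1653, 5 → -1.2717). The sketch below applies the formula with the standard library to the five `exposure` values from the first rows. The notebook fits the scaler on all 200 rows, so these z-scores are not the values in `X_std`.

```python
from statistics import fmean, pstdev

# First five "exposure" values from the table above (a subset only).
exposure = [43, 28, 35, 33, 44]

mu = fmean(exposure)
sigma = pstdev(exposure)  # population std (ddof=0), as StandardScaler uses

# z-score for each value: distance from the mean in standard deviations
z = [(x - mu) / sigma for x in exposure]

print(mu, round(sigma, 4))
print([round(v, 3) for v in z])
```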

2) Exploratory analysis

  • 2a) Create a figure that shows the relationship between the two variables. Choose a suitable type of plot.
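Both variables are stored as integers (`int64` in the `df.info()` output), so a plain scatter plot draws many observations on top of each other. The standardized array already contains repeated pairs such as [0.04085344, 0.16525394]. One common remedy is to count the observations at each (exposure, rating) pair and map that count to point size. The sketch below does this with a hypothetical toy frame instead of the exam data.

```python
import pandas as pd

# Hypothetical toy data with repeated integer pairs,
# standing in for df[["exposure", "rating"]].
toy = pd.DataFrame({
    "exposure": [30, 30, 30, 25, 25, 40],
    "rating":   [7,  7,  7,  6,  6,  9],
})

# Number of observations sharing each (exposure, rating) coordinate.
counts = (
    toy.groupby(["exposure", "rating"])
       .size()
       .reset_index(name="n")
)
print(counts)
```

In Altair, the result could then be drawn with, for example, `alt.Chart(counts).mark_circle().encode(x='exposure', y='rating', size='n')`. Adding jitter or lowering the opacity are alternative ways to handle the overlap.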

3) Clustering

Perform the cluster analysis:

kmeans = KMeans(n_clusters=4, n_init=10)
kmeans.fit(X_std)  # estimate the four cluster centers on the standardized data
KMeans(n_clusters=4, n_init=10)
y_kmeans = kmeans.predict(X_std)
array([1, 2, 1, 1, 0, 0, 3, 2, 2, 1, 1, 3, 2, 0, 2, 0, 0, 1, 2, 1, 1, 0,
       1, 1, 1, 1, 3, 1, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2,
       1, 1, 2, 2, 1, 1, 0, 3, 2, 1, 1, 2, 2, 1, 3, 2, 1, 1, 2, 0, 2, 2,
       2, 1, 2, 1, 0, 1, 2, 1, 2, 0, 1, 2, 1, 0, 2, 3, 2, 0, 2, 1, 0, 0,
       2, 2, 0, 2, 2, 2, 1, 0, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 2, 2, 1, 2,
       0, 1, 3, 2, 2, 2, 2, 0, 1, 2, 1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 0, 1,
       1, 2, 2, 1, 0, 3, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0, 1, 2, 0, 2, 1,
       2, 2, 1, 0, 2, 0, 0, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 0,
       2, 1, 1, 1, 3, 2, 1, 0, 1, 0, 0, 2, 1, 2, 2, 0, 1, 2, 1, 2, 2, 0,
       2, 2], dtype=int32)
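`predict` assigns each standardized observation to its nearest cluster center. With `n_init=10`, k-means runs from ten different sets of starting centers and keeps the result with the lowest inertia (the sum of squared distances to the assigned centers). The NumPy sketch below illustrates both ideas with Lloyd's algorithm on a small made-up dataset. It is not scikit-learn's implementation: it starts from randomly chosen data points rather than scikit-learn's default k-means++ seeding. The helper names `assign`, `lloyd` and `kmeans` are hypothetical.

```python
import numpy as np


def assign(X, centers):
    """Nearest-center index and squared distance for every row of X."""
    # Pairwise squared Euclidean distances, shape (n_samples, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def lloyd(X, k, rng, max_iter=100):
    """One k-means run (Lloyd's algorithm) from k random data points."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels, _ = assign(X, centers)
        # Move each center to the mean of its points; keep it if empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, d2 = assign(X, centers)
    return centers, labels, d2.sum()  # inertia: sum of squared distances


def kmeans(X, k, n_init=10, seed=0):
    """Run Lloyd's algorithm n_init times and keep the lowest inertia."""
    rng = np.random.default_rng(seed)
    runs = [lloyd(X, k, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[2])


# Made-up data: two well-separated groups of three points each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
centers, labels, inertia = kmeans(X, k=2)

# "Predicting" new points means finding the nearest fitted center.
new_points = np.array([[0.05, 0.05], [4.9, 5.2]])
pred, _ = assign(new_points, centers)
print(labels, pred, round(float(inertia), 4))
```

The notebook does not pass `random_state` to `KMeans`, so the cluster numbers 0 to 3 carry no order. Their numbering, and occasionally the partition itself, can change after “Restart” and “Run All”. `kmeans.fit_predict(X_std)` combines fitting and label assignment in one call.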
df['cluster'] = y_kmeans
     campaign  spendings  test_drives  exposure  rating  cluster
0           1     10.256          330        43      10        1
1           2    985.685          120        28       7        2
2           3   1445.563          360        35       7        1
3           4   1188.193          270        33       7        1
4           5    574.513          220        44       5        0
..        ...        ...          ...       ...     ...      ...
195       196    910.851          190        26       7        2
196       197    888.569          240        14       6        2
197       198    800.615          250        34       6        0
198       199   1500.000          230        11       8        2
199       200    785.694          110        20       9        2

200 rows × 6 columns
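A common next step in interpreting the clusters is to describe each one in the original units of the clustering variables. The sketch below does this with `groupby` but uses only the ten rows printed above (rows 0 to 4 and 195 to 199). With so few rows the means are illustrative only, and cluster 3 does not occur among them. The name `df_subset` is hypothetical.

```python
import pandas as pd

# Only the ten rows printed above, with their cluster labels.
df_subset = pd.DataFrame(
    {
        "exposure": [43, 28, 35, 33, 44, 26, 14, 34, 11, 20],
        "rating":   [10, 7, 7, 7, 5, 7, 6, 6, 8, 9],
        "cluster":  [1, 2, 1, 1, 0, 2, 2, 0, 2, 2],
    },
    index=[0, 1, 2, 3, 4, 195, 196, 197, 198, 199],
)

# Size and mean of each clustering variable per cluster label.
profile = df_subset.groupby("cluster").agg(
    n=("exposure", "count"),
    exposure_mean=("exposure", "mean"),
    rating_mean=("rating", "mean"),
)
print(profile)
```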

# Do not make any changes to this cell

Congratulations! This was the last task in the notebook.