E-Exam Notebook

Activate the Conda environment before you start.
Enter your HdM short ID (Kürzel) as NAME.
Do not rename the file and do not delete any cells.
Complete every cell marked with # YOUR CODE HERE.
The raise NotImplementedError() line in these cells prevents empty cells from being submitted. Delete it as soon as you start working in one of these cells.
Before you submit the exam, make sure everything runs as expected: restart the kernel and run all cells by selecting “Restart” and then “Run All”.

Good luck!
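Why an untouched placeholder matters for “Run All”: by default, Jupyter stops executing the remaining cells once a cell raises an exception, so a single forgotten raise NotImplementedError() halts the whole run. The sketch below only simulates that behaviour in plain Python with two hypothetical cell contents; it is not how Jupyter itself executes notebooks.

```python
# Hypothetical stand-ins for two notebook cells (not the exam's real cells):
# the first has been solved, the second still contains the placeholder.
cells = [
    "x = 1 + 1",
    "# YOUR CODE HERE\nraise NotImplementedError()",
]

namespace = {}   # shared namespace, similar to a single kernel session
executed = 0     # number of cells that finished without an error

try:
    for source in cells:
        exec(source, namespace)  # run the cell's source code
        executed += 1
except NotImplementedError:
    print(f"Execution stopped after {executed} cell(s): a placeholder is still present.")
```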

NAME = ""

import IPython
assert IPython.version_info[0] >= 3, "Your version of IPython is too old, please update it."


Setup

Import the libraries (you will not need all of them):

# Do not modify this cell

import pandas as pd
import altair as alt
alt.renderers.enable('mimetype')

# Scikit-learn libraries

from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering


# Other helper libraries
import io
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

1) Data

Importing the data

Import the data:

LINK = "https://raw.githubusercontent.com/kirenz/datasets/master/mini_test_drives.csv"

df = pd.read_csv(LINK)
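pd.read_csv accepts any file-like object, not only a URL or path. As a sketch for experimenting offline, the five rows shown by df.head() further below can be embedded as text and read through io.StringIO (io is imported in the setup cell; using it this way is an illustration, not part of the task). The name CSV_TEXT is hypothetical.

```python
import io

import pandas as pd

# Hypothetical inline copy of the five rows shown by df.head().
CSV_TEXT = """campaign,spendings,test_drives,exposure,rating
1,10.256,330,43,10
2,985.685,120,28,7
3,1445.563,360,35,7
4,1188.193,270,33,7
5,574.513,220,44,5
"""

# io.StringIO wraps the string so read_csv can treat it like a file.
df_sample = pd.read_csv(io.StringIO(CSV_TEXT))
print(df_sample.dtypes)
```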

Get an overview of the data:

df.head()

	campaign	spendings	test_drives	exposure	rating
0	1	10.256	330	43	10
1	2	985.685	120	28	7
2	3	1445.563	360	35	7
3	4	1188.193	270	33	7
4	5	574.513	220	44	5

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   campaign     200 non-null    int64  
 1   spendings    200 non-null    float64
 2   test_drives  200 non-null    int64  
 3   exposure     200 non-null    int64  
 4   rating       200 non-null    int64  
dtypes: float64(1), int64(4)
memory usage: 7.9 KB

Preparing the data for clustering

Perform a K-means cluster analysis using the variables “exposure” and “rating”.

# Prepare the data

scaler = StandardScaler()

X = df[['exposure', 'rating']]

X_std = scaler.fit_transform(X)
X_std

array([[ 1.26645653,  2.32074007],
       [ 0.04085344,  0.16525394],
       [ 0.61280155,  0.16525394],
       [ 0.4493878 ,  0.16525394],
       [ 1.3481634 , -1.27173682],
       [-0.69450842, -1.27173682],
       [-0.61280155, -4.14571834],
       [-0.4493878 ,  1.6022447 ],
       [-0.53109467,  0.16525394],
       [ 1.02133591,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.61280155, -3.42722296],
       [-0.28597406,  0.88374932],
       [ 0.85792217, -0.55324144],
       [-0.28597406,  0.16525394],
       [-0.20426718, -1.27173682],
       [ 0.61280155, -1.27173682],
       [ 0.69450842,  0.88374932],
       [-0.04085344,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.4493878 ,  0.16525394],
       [ 0.04085344, -0.55324144],
       [ 0.20426718,  1.6022447 ],
       [ 0.53109467,  0.16525394],
       [ 1.75669777,  0.16525394],
       [ 1.02133591,  0.88374932],
       [-0.61280155, -1.9902322 ],
       [ 1.18474966,  0.16525394],
       [ 0.61280155,  0.88374932],
       [-1.59328402, -0.55324144],
       [ 1.75669777,  0.16525394],
       [-0.69450842,  0.88374932],
       [ 1.18474966, -0.55324144],
       [-1.75669777,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [ 0.04085344, -0.55324144],
       [-0.20426718,  0.88374932],
       [ 0.53109467,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.53109467,  0.16525394],
       [ 2.90059399,  0.16525394],
       [ 0.28597406,  0.16525394],
       [-0.20426718,  0.16525394],
       [ 1.18474966,  0.16525394],
       [ 0.77621529,  0.16525394],
       [-0.20426718,  0.88374932],
       [-0.12256031,  0.16525394],
       [ 0.93962904,  0.16525394],
       [ 1.51157715,  0.16525394],
       [ 0.69450842, -0.55324144],
       [-1.26645653, -1.9902322 ],
       [-2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.4493878 ,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-1.42987028, -0.55324144],
       [ 0.85792217,  0.16525394],
       [-0.69450842, -1.9902322 ],
       [-1.18474966, -0.55324144],
       [ 0.20426718,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.4493878 ,  0.88374932],
       [-0.36768093, -0.55324144],
       [-0.4493878 ,  1.6022447 ],
       [-0.61280155,  0.16525394],
       [-0.77621529, -0.55324144],
       [ 0.77621529,  0.16525394],
       [-0.93962904,  0.88374932],
       [ 0.36768093,  0.88374932],
       [-0.12256031, -1.27173682],
       [ 2.08352526,  1.6022447 ],
       [ 0.04085344,  0.16525394],
       [ 0.36768093,  0.16525394],
       [-0.28597406,  0.16525394],
       [ 0.77621529, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.69450842,  0.16525394],
       [ 1.59328402, -0.55324144],
       [-0.4493878 , -1.27173682],
       [-1.42987028,  0.88374932],
       [-2.16523213, -1.9902322 ],
       [-2.16523213, -0.55324144],
       [ 0.93962904, -0.55324144],
       [-1.59328402, -1.27173682],
       [ 0.85792217,  0.16525394],
       [ 0.61280155, -1.27173682],
       [ 1.02133591, -1.27173682],
       [-0.4493878 ,  0.16525394],
       [-0.04085344,  0.88374932],
       [ 0.28597406, -0.55324144],
       [-0.69450842,  0.16525394],
       [-0.28597406,  1.6022447 ],
       [-2.16523213,  0.16525394],
       [ 0.85792217,  0.88374932],
       [-0.12256031, -0.55324144],
       [-1.3481634 , -0.55324144],
       [ 0.53109467, -0.55324144],
       [ 2.24693901,  0.16525394],
       [-1.83840464,  0.16525394],
       [ 0.53109467, -0.55324144],
       [ 0.77621529,  1.6022447 ],
       [-1.18474966,  0.88374932],
       [-0.36768093, -0.55324144],
       [ 2.16523213, -0.55324144],
       [-0.77621529,  0.16525394],
       [-2.08352526,  0.16525394],
       [-1.3481634 , -0.55324144],
       [ 0.20426718,  0.88374932],
       [-0.4493878 ,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.77621529,  0.88374932],
       [-1.51157715, -2.70872758],
       [-2.08352526,  0.16525394],
       [-1.26645653,  0.88374932],
       [-1.83840464,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [-0.61280155, -1.27173682],
       [ 2.41035275, -0.55324144],
       [-0.69450842,  0.88374932],
       [ 0.61280155,  0.16525394],
       [-0.4493878 , -1.27173682],
       [-0.93962904,  0.16525394],
       [ 2.08352526,  0.16525394],
       [-1.92011151, -0.55324144],
       [ 0.12256031, -0.55324144],
       [ 1.26645653,  0.16525394],
       [-0.12256031,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [ 0.36768093, -1.27173682],
       [ 0.53109467,  0.16525394],
       [ 0.20426718,  0.16525394],
       [-1.02133591,  0.16525394],
       [-0.36768093,  0.16525394],
       [ 0.28597406,  0.88374932],
       [-0.53109467, -1.27173682],
       [ 0.04085344, -4.14571834],
       [-0.77621529,  0.16525394],
       [ 0.04085344,  0.16525394],
       [ 0.77621529,  0.88374932],
       [-0.12256031, -0.55324144],
       [ 0.20426718,  0.88374932],
       [-1.42987028,  0.16525394],
       [-0.53109467,  0.16525394],
       [-0.77621529,  1.6022447 ],
       [-1.10304278,  0.16525394],
       [ 0.85792217,  0.16525394],
       [ 0.69450842, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-1.51157715, -1.27173682],
       [ 0.93962904, -1.27173682],
       [-0.28597406,  0.16525394],
       [ 1.42987028,  0.16525394],
       [-1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.36768093,  0.16525394],
       [-0.36768093, -0.55324144],
       [-2.24693901, -0.55324144],
       [ 0.61280155, -1.27173682],
       [-0.12256031, -0.55324144],
       [-0.69450842, -0.55324144],
       [-0.12256031,  0.16525394],
       [ 2.08352526,  0.88374932],
       [ 0.12256031,  0.88374932],
       [ 0.04085344,  0.16525394],
       [-0.85792217, -0.55324144],
       [-0.12256031,  0.88374932],
       [ 1.18474966,  0.88374932],
       [-0.85792217, -0.55324144],
       [ 0.69450842,  0.88374932],
       [ 1.42987028,  0.16525394],
       [-0.61280155,  0.16525394],
       [-1.02133591,  0.16525394],
       [ 0.04085344,  0.88374932],
       [-0.04085344, -0.55324144],
       [-0.85792217,  0.88374932],
       [ 0.36768093,  0.88374932],
       [ 1.51157715,  1.6022447 ],
       [ 1.59328402,  0.88374932],
       [-0.69450842, -4.14571834],
       [-0.4493878 ,  0.88374932],
       [ 2.24693901,  1.6022447 ],
       [ 0.28597406, -1.27173682],
       [ 0.93962904,  0.88374932],
       [-0.53109467, -0.55324144],
       [ 0.69450842, -0.55324144],
       [-0.53109467,  0.16525394],
       [ 1.3481634 ,  0.16525394],
       [-0.04085344,  0.16525394],
       [-0.93962904, -0.55324144],
       [ 0.4493878 , -0.55324144],
       [ 0.4493878 ,  0.88374932],
       [-0.53109467,  0.16525394],
       [ 0.61280155,  1.6022447 ],
       [-0.12256031,  0.16525394],
       [-1.10304278, -0.55324144],
       [ 0.53109467, -0.55324144],
       [-1.3481634 ,  0.88374932],
       [-0.61280155,  1.6022447 ]])

df_std = pd.DataFrame(X_std, columns=['exposure', 'rating'])
df_std

	exposure	rating
0	1.266457	2.320740
1	0.040853	0.165254
2	0.612802	0.165254
3	0.449388	0.165254
4	1.348163	-1.271737
...	...	...
195	-0.122560	0.165254
196	-1.103043	-0.553241
197	0.531095	-0.553241
198	-1.348163	0.883749
199	-0.612802	1.602245

200 rows × 2 columns
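StandardScaler rescales each column to mean 0 and standard deviation 1 via z = (x - mean) / std, so that exposure (tens of points) and rating (single digits) contribute comparably to the Euclidean distances K-means uses. The sketch below checks this formula by hand on the exposure and rating values of the five rows from df.head(); the resulting numbers differ from X_std above because the scaler there was fitted on all 200 rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Exposure and rating of the five campaigns shown by df.head().
X_small = np.array([
    [43, 10],
    [28, 7],
    [35, 7],
    [33, 7],
    [44, 5],
], dtype=float)

# Standardize with scikit-learn.
X_sklearn = StandardScaler().fit_transform(X_small)

# Same transformation by hand: StandardScaler uses the population standard
# deviation (ddof=0), which is also NumPy's default for ndarray.std.
X_manual = (X_small - X_small.mean(axis=0)) / X_small.std(axis=0)

print(np.allclose(X_sklearn, X_manual))  # True
```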

2) Exploratory analysis

3a) Create a chart that shows the relationship between the two variables. Choose a suitable type of chart.

# YOUR CODE HERE
alt.Chart(df_std).mark_point().encode(
    x='exposure',
    y='rating'
)
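Both variables are integers, and the X_std output above shows that rating takes only a handful of distinct values, so a plain scatterplot draws many campaigns on top of each other. One way to make the overlap visible is to count rows per (exposure, rating) pair. The sketch uses hypothetical toy values, not the campaign data.

```python
import pandas as pd

# Hypothetical toy values (not from mini_test_drives.csv) with repeated pairs.
toy = pd.DataFrame({
    "exposure": [20, 20, 20, 35, 35],
    "rating":   [7, 7, 7, 5, 9],
})

# Number of rows sharing each (exposure, rating) combination.
counts = (
    toy.groupby(["exposure", "rating"])
       .size()
       .reset_index(name="n")
)
print(counts)
```

In Altair, the same idea can be expressed directly in the chart, for example by encoding size='count()' on a mark_circle() chart so that larger circles mark more overlapping campaigns.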


# Do not modify this cell

# Do not modify this cell

# Do not modify this cell

# Do not modify this cell

3) Clustering

Perform the cluster analysis:

kmeans = KMeans(n_clusters=4, n_init=10)

kmeans.fit(X_std)

KMeans(n_clusters=4, n_init=10)
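The cell above fixes n_clusters=4 without showing how that number was chosen. A common heuristic, not required by this exam, is the elbow method: fit K-means for several k and look for the point where the inertia (within-cluster sum of squared distances, exposed as inertia_) stops dropping sharply. The sketch runs on synthetic data from make_blobs, which the setup cell already imports; the sample size, number of centers and random_state values are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with four groups (illustration only, not the campaign data).
X_demo, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Fit K-means for k = 1..6 and record the inertia of each solution.
ks = range(1, 7)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_demo)
    inertias.append(model.inertia_)

for k, inertia in zip(ks, inertias):
    print(f"k={k}: inertia={inertia:.1f}")
```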


y_kmeans = kmeans.predict(X_std)
y_kmeans

array([1, 2, 1, 1, 0, 0, 3, 2, 2, 1, 1, 3, 2, 0, 2, 0, 0, 1, 2, 1, 1, 0,
       1, 1, 1, 1, 3, 1, 1, 2, 1, 2, 0, 2, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2,
       1, 1, 2, 2, 1, 1, 0, 3, 2, 1, 1, 2, 2, 1, 3, 2, 1, 1, 2, 0, 2, 2,
       2, 1, 2, 1, 0, 1, 2, 1, 2, 0, 1, 2, 1, 0, 2, 3, 2, 0, 2, 1, 0, 0,
       2, 2, 0, 2, 2, 2, 1, 0, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 2, 2, 1, 2,
       0, 1, 3, 2, 2, 2, 2, 0, 1, 2, 1, 0, 2, 1, 2, 0, 1, 2, 2, 1, 0, 1,
       1, 2, 2, 1, 0, 3, 2, 2, 1, 0, 1, 2, 2, 2, 2, 1, 0, 1, 2, 0, 2, 1,
       2, 2, 1, 0, 2, 0, 0, 2, 2, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 2, 1, 0,
       2, 1, 1, 1, 3, 2, 1, 0, 1, 0, 0, 2, 1, 2, 2, 0, 1, 2, 1, 2, 2, 0,
       2, 2], dtype=int32)
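kmeans.predict assigns each point to its nearest centroid (squared Euclidean distance), which is the assignment step of K-means. The sketch reproduces that assignment with NumPy broadcasting and compares it with predict; it uses synthetic make_blobs data and an arbitrary random_state as assumptions, since the campaign data is not embedded here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data and a fitted model (illustrative values only).
X_demo, _ = make_blobs(n_samples=200, centers=4, random_state=1)
model = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X_demo)

# Squared distance of every point to every centroid, shape (n_points, k).
diff = X_demo[:, np.newaxis, :] - model.cluster_centers_[np.newaxis, :, :]
sq_dist = (diff ** 2).sum(axis=2)

# Assignment step: index of the nearest centroid for each point.
manual_labels = sq_dist.argmin(axis=1)

print((manual_labels == model.predict(X_demo)).all())
```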

df['cluster'] = y_kmeans
df

	campaign	spendings	test_drives	exposure	rating	cluster
0	1	10.256	330	43	10	1
1	2	985.685	120	28	7	2
2	3	1445.563	360	35	7	1
3	4	1188.193	270	33	7	1
4	5	574.513	220	44	5	0
...	...	...	...	...	...	...
195	196	910.851	190	26	7	2
196	197	888.569	240	14	6	2
197	198	800.615	250	34	6	0
198	199	1500.000	230	11	8	2
199	200	785.694	110	20	9	2

200 rows × 6 columns
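To interpret the clusters, it helps to average the original (unstandardized) variables per cluster. The sketch does this for the five rows from df.head() with the labels shown above, only to illustrate the mechanics; five rows say nothing reliable about the clusters. Because KMeans was fitted without a random_state, the label numbers themselves are arbitrary and may change between runs.

```python
import pandas as pd

# The five campaigns from df.head() with the cluster labels assigned above.
sample = pd.DataFrame({
    "exposure": [43, 28, 35, 33, 44],
    "rating":   [10, 7, 7, 7, 5],
    "cluster":  [1, 2, 1, 1, 0],
})

# Mean exposure and rating per cluster, in the original units.
profile = sample.groupby("cluster")[["exposure", "rating"]].mean()
print(profile)
```

On the full data, the same call is df.groupby('cluster')[['exposure', 'rating']].mean(); scaler.inverse_transform(kmeans.cluster_centers_) expresses the centroids in the original units as well.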

# Do not modify this cell

Congratulations! This was the last task in the notebook.