SGD Classifier#

We use a classification model to predict which customers will default on their credit card debt. Our estimator implements regularized linear models with stochastic gradient descent (SGD) learning.

Data#

To learn more about the data and all of the data preparation steps, take a look at this page. Here, we simply import a Python script which includes all of the necessary steps.

from data_prep_credit import * 

Model#

SGD classifier documentation.

from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# We always scale the input. The most convenient way is to use a pipeline.
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="hinge", 
                                  alpha=0.01, 
                                  max_iter=200, 
                                  fit_intercept=True,
                                  tol=1e-3))

clf.fit(X_train, y_train)

y_pred = clf.fit(X_train, y_train).predict(X_test)
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

cm = confusion_matrix(y_test, y_pred)

disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=clf.classes_)
disp.plot()
plt.show()
../_images/Sgd_5_0.png
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, target_names=['No', 'Yes']))
              precision    recall  f1-score   support

          No       0.97      1.00      0.99      2909
         Yes       1.00      0.04      0.08        91

    accuracy                           0.97      3000
   macro avg       0.99      0.52      0.53      3000
weighted avg       0.97      0.97      0.96      3000