Marketing Campaign Response Prediction

Using Gradient Boosted Trees

Jan Kirenz

Introduction

  • Predict the response to a marketing campaign

Example: Payback

Marketing use case

  • The goal is to predict whether a customer will respond positively (e.g. buys a product) to a future campaign, based on their features
  • We use data from previous campaigns to train a model

Boosting: An Intuitive Introduction

Overview

  • Boosting is an ensemble learning method

  • Combines multiple weak learners to build a strong classifier

  • Learners are trained sequentially

  • Each learner focuses on correcting the mistakes of its predecessor

Intuition

  1. Begin with a weak learner that performs slightly better than random guessing

  2. Train a new weak learner to correct the mistakes of the previous one

  3. Repeat the process, focusing on different error patterns each time

  4. Combine all weak learners into a strong classifier (a code sketch follows below)
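
A minimal from-scratch sketch of this loop for squared-error regression (all names are illustrative and independent of the campaign example later in this deck): each new stump is fit to the residual errors left by the ensemble so far.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, 200)

learning_rate = 0.1
stumps = []
residual = y_demo.copy()
for _ in range(50):
    stump = DecisionTreeRegressor(max_depth=1)  # a weak learner
    stump.fit(X_demo, residual)                 # focus on current mistakes
    residual -= learning_rate * stump.predict(X_demo)
    stumps.append(stump)

# the strong learner is the scaled sum of all weak learners
prediction = learning_rate * sum(s.predict(X_demo) for s in stumps)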

Difference to Bagging

  • Bagging:
    • Learners are trained independently
    • Training samples are drawn with replacement (bootstrapping)
    • Combines learners by averaging (regression) or voting (classification)
  • Boosting:
    • Learners are trained sequentially
    • Emphasis is placed on misclassified instances
    • Combines learners by weighted averaging (contrast sketched below)
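
A hedged scikit-learn sketch of this contrast (the estimator keyword follows scikit-learn 1.2+; nothing here is reused later): the same decision stump serves as the weak learner in both styles.
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1)
# bagging: independent learners on bootstrap samples, combined by voting
bagging = BaggingClassifier(estimator=stump, n_estimators=100)
# boosting: sequential learners, misclassified samples get more weight
boosting = AdaBoostClassifier(estimator=stump, n_estimators=100)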

Advantages of Boosting

  • Can achieve high accuracy with simple weak learners

  • Less prone to overfitting than single models

  • Can be applied to various learning algorithms

Disadvantages of Boosting

  • Sensitive to noise and outliers

  • Computationally expensive due to sequential training

  • Can overfit if weak learners are too complex

Code example

Data overview

  • age: Customer’s age (integer)
  • city: Customer’s place of residence (string: ‘Berlin’, ‘Stuttgart’)
  • income: Customer’s annual income (integer)
  • membership_days: Number of days the customer has been a member (integer)
  • campaign_engagement: Number of times the customer engaged with previous campaigns (integer)
  • target: Whether the customer responded positively to the campaign (0 or 1)

Import data

import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/kirenz/datasets/master/campaign.csv')

Data overview

df
age city income membership_days campaign_engagement target
0 56 Berlin 136748 837 3 1
1 46 Stuttgart 25287 615 8 0
2 32 Berlin 146593 2100 3 0
3 60 Berlin 54387 2544 0 0
4 25 Berlin 28512 138 6 0
... ... ... ... ... ... ...
995 22 Berlin 49241 2123 4 0
996 40 Stuttgart 116214 970 5 1
997 27 Stuttgart 64569 2552 6 0
998 61 Stuttgart 31745 2349 8 1
999 19 Berlin 46029 2185 2 0

1000 rows × 6 columns

Data info

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   age                  1000 non-null   int64 
 1   city                 1000 non-null   object
 2   income               1000 non-null   int64 
 3   membership_days      1000 non-null   int64 
 4   campaign_engagement  1000 non-null   int64 
 5   target               1000 non-null   int64 
dtypes: int64(5), object(1)
memory usage: 47.0+ KB

Data corrections

  • Encode categorical variables
df = pd.get_dummies(df, columns=['city'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   age                  1000 non-null   int64
 1   income               1000 non-null   int64
 2   membership_days      1000 non-null   int64
 3   campaign_engagement  1000 non-null   int64
 4   target               1000 non-null   int64
 5   city_Berlin          1000 non-null   uint8
 6   city_Stuttgart       1000 non-null   uint8
dtypes: int64(5), uint8(2)
memory usage: 41.1 KB
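  • Aside: with only two cities, one dummy column is redundant; a drop_first variant (a sketch on a fresh copy, not used below) keeps a single indicator
df_drop = pd.read_csv(
    'https://raw.githubusercontent.com/kirenz/datasets/master/campaign.csv')
df_drop = pd.get_dummies(df_drop, columns=['city'], drop_first=True)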

Data splitting

  • Split the df into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']
  • Save feature names for later evaluation steps
feature_names = X.columns
  • Make train and test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
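  • A hedged variant: the target is fairly balanced here, but stratify=y would keep the class ratio identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)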

Select model

  • Define hyperparameters as dictionary
params = {
    "n_estimators": 50,
    "max_depth": 3,
    "min_samples_split": 5,
}
  • n_estimators: Number of gradient boosted trees
  • max_depth: Maximum tree depth
  • min_samples_split: The minimum number of samples required to split an internal node
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier(**params)
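  • A hedged aside (not part of this walkthrough): values like these are often tuned with cross-validation instead of being fixed by hand
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "min_samples_split": [2, 5, 10],
}
search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=5)
# search.fit(X_train, y_train); search.best_params_ would then replace params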

Train model

  • Train the model on the training data
clf.fit(X_train, y_train)
GradientBoostingClassifier(min_samples_split=5, n_estimators=50)

Evaluate model

  • Predict on the testing data
# Predict on the testing data
y_pred = clf.predict(X_test)
  • Calculate accuracy
from sklearn.metrics import accuracy_score

accuracy_score(y_test, y_pred)
0.92

Confusion matrix

  • Print confusion matrix
from sklearn.metrics import confusion_matrix

print(confusion_matrix(y_test, y_pred))
[[94  7]
 [ 9 90]]

Classification report

  • Print classification report
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.91      0.93      0.92       101
           1       0.93      0.91      0.92        99

    accuracy                           0.92       200
   macro avg       0.92      0.92      0.92       200
weighted avg       0.92      0.92      0.92       200

Obtain feature importance

  • Obtain feature importance
feature_importance = clf.feature_importances_
  • Save as dataframe
df_features = pd.DataFrame(
    {"score": feature_importance,
     "name": feature_names})

df_features
score name
0 0.135131 age
1 0.354875 income
2 0.005587 membership_days
3 0.503843 campaign_engagement
4 0.000564 city_Berlin
5 0.000000 city_Stuttgart
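  • A hedged cross-check (not in the original workflow): impurity-based importances can favor features with many distinct values; permutation importance on the test set is a common alternative
from sklearn.inspection import permutation_importance

perm = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=42)
df_perm = pd.DataFrame(
    {"score": perm.importances_mean, "name": feature_names})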

Plot feature importance

import altair as alt

alt.Chart(df_features).mark_bar().encode(
    x=alt.X('score'),
    y=alt.Y('name', sort='-x')
).properties(
    width=800,
    height=300
)

Save model

from joblib import dump

model_filename = 'gradientboosted_model.joblib'
dump(clf, model_filename)
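  • To reuse the saved model later (as the dashboard and API below do), load it back with joblib
from joblib import load

clf_restored = load(model_filename)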

Summary

  1. We trained a model

  2. Our model predicts whether a customer will respond positively to a campaign or not

  3. The model performs well on the test data (92% accuracy), so we want to use it

  4. We saved the model

  5. We want to use the model to target customers

How to use the model?

Dashboard & API

  • Integrate (“deploy”) the model in a dashboard (e.g. Streamlit)

  • Use an API (e.g. FastAPI) to allow other software applications to use the model

Streamlit dashboard
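
The original slide shows the dashboard itself; below is a minimal, hypothetical sketch of such an app (the file name app.py, widget labels, and default values are assumptions), reusing the saved model.
import streamlit as st
import pandas as pd
from joblib import load

model = load('gradientboosted_model.joblib')

st.title("Campaign Response Prediction")
age = st.number_input("Age", 18, 100, 35)
city = st.selectbox("City", ["Berlin", "Stuttgart"])
income = st.number_input("Annual income", 0, 500000, 50000)
membership_days = st.number_input("Membership days", 0, 5000, 365)
engagement = st.number_input("Campaign engagement", 0, 20, 3)

if st.button("Predict"):
    # rebuild the one-hot city columns used during training
    X_new = pd.DataFrame([{
        "age": age, "income": income,
        "membership_days": membership_days,
        "campaign_engagement": engagement,
        "city_Berlin": int(city == "Berlin"),
        "city_Stuttgart": int(city == "Stuttgart"),
    }])
    proba = model.predict_proba(X_new)[0, 1]
    st.write(f"Predicted response probability: {proba:.2f}")
Started locally with streamlit run app.py.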

FastAPI

  • Use a FastAPI app with a single /predict endpoint

  • Accepts POST requests with JSON data containing age, city, income, membership days, and campaign engagement.

  • The app returns a JSON response with the prediction (a sketch follows below).
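
  • A hedged sketch of such an app (the file name main.py is an assumption; the one-hot step mirrors the training-time get_dummies and the city values in the sample payload below)
from typing import List

import pandas as pd
from fastapi import FastAPI
from joblib import load
from pydantic import BaseModel

app = FastAPI()
model = load('gradientboosted_model.joblib')

class Customer(BaseModel):
    age: int
    city: str
    income: int
    membership_days: int
    campaign_engagement: int

class Batch(BaseModel):
    data: List[Customer]

@app.post("/predict")
def predict(batch: Batch):
    df = pd.DataFrame([dict(c) for c in batch.data])
    # recreate the one-hot city columns used during training
    X_new = df.assign(
        city_Berlin=(df["city"] == "city_Berlin").astype(int),
        city_Stuttgart=(df["city"] == "city_Stuttgart").astype(int),
    )[["age", "income", "membership_days", "campaign_engagement",
       "city_Berlin", "city_Stuttgart"]]
    df["prediction"] = model.predict_proba(X_new)[:, 1].round(2)
    return {"results": df.to_dict(orient="records")}
  • Served locally with, e.g., uvicorn main:app --reload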

Test API with data

  • You can test the API using Python’s requests library:
url = "http://127.0.0.1:8000/predict"
data = {
    "data": [
        {
            "age": 25,
            "city": "city_Berlin",
            "income": 25000,
            "membership_days": 4,
            "campaign_engagement": 1
        },
        {
            "age": 35,
            "city": "city_Stuttgart",
            "income": 120000,
            "membership_days": 250,
            "campaign_engagement": 8
        }
    ]
}

Get response

import requests

response = requests.post(url, json=data)

if response.status_code == 200:
    results = response.json()['results']
    df = pd.DataFrame(results)
    df.to_csv('predictions.csv', index=False)
    print("Predictions saved to 'predictions.csv'")
else:
    print(f"Error: {response.status_code}")
    print(response.text)

Output

age,city,income,membership_days,campaign_engagement,prediction
25,city_Berlin,25000,4,1,0.01
35,city_Stuttgart,120000,250,8,0.97

Marketing campaign

  • Next, we would filter all customers whose predicted response probability exceeds a certain threshold (see the sketch below)

  • What would be a good threshold?

  • Only target those customers with the marketing campaign
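
  • A hedged sketch of picking such a threshold from the test set (the 90% precision target is an assumption for illustration)
from sklearn.metrics import precision_recall_curve

proba = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, proba)

# e.g. the lowest threshold that still reaches 90% precision,
# so most customers we contact are actual responders
ok = precision[:-1] >= 0.9
threshold = thresholds[ok].min() if ok.any() else 0.5
target_customers = X_test[proba >= threshold]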

Questions?