Course overview

Table 1 Course overview

| # | Topic | Content | Python |
|---|-------|---------|--------|
| 1 | Introduction | Data driven decision making; Programming toolkit; Programming process | Anaconda, Jupyter Notebook, Visual Studio Code |
| 2 | Data basics & study design | Types of data; data transformations; Sampling; Experiment; Observational study | pandas |
| 3 | Exploratory data analysis | Visualizing categorical data; visualizing numerical data; measures of central tendency; measures of distribution | pandas; Seaborn; Plotly |
| 4 | Statistical inference | Hypothesis testing; decision errors; p-value and statistical significance; Confidence intervals; Crosstables (Pearson’s chi-squared test); Student’s t-test; A/B-testing | pandas; statsmodels |
| 5 | Introduction to modeling | Statistical learning vs machine learning; Supervised learning vs unsupervised learning; Regression; Classification; Quality of fit; Bias-variance trade-off | Statsmodels; Scikit-Learn |
| 6 | Resampling methods | Training, evaluation and test set; Validation set approach; k-Fold Cross-Validation; Bootstrap | Statsmodels; Scikit-Learn |
| 7 | Linear regression | Fundamentals; Qualitative predictors, Interaction terms; Non-linear transformations | Statsmodels; Scikit-Learn |
| 8 | Regression diagnostics | Linearity; Normality of the residuals; Influence tests; Multicollinearity; Heteroskedasticity tests | Statsmodels; Scikit-Learn |
| 9 | Advanced methods I | Subset selection methods; Shrinkage methods (Lasso, Ridge regression, Elastic Net); Dimension reduction methods (Principal Components regression) | Statsmodels; Scikit-Learn |
| 10 | Advanced methods II | Regression Splines; Smoothing Splines; Generalized Additive Models; Stacking | Statsmodels; Scikit-Learn |
| 11 | Introduction to classification | Confusion matrix; Recall; Precision; F1-score; ROC-Curve; Unbalanced data | Statsmodels; Scikit-Learn |
| 12 | Classification models | Logistic regression; Generative models (discriminant analysis) | Statsmodels; Scikit-Learn |
| 13 | Alternative models | Time series analysis | Statsmodels; Prophet |
| 14 | Probability | Introduction to probability; Expected frequency tree; Bayes Theorem | pandas; PyMC3; bambi |