{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# t-Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A t-test can be used to determine if there is a systematic difference between the means of two groups. \n", "\n", "The simplest form of analysis that can be done is one with only one independent variable (the group variable) that is manipulated in only two ways, and in which only one outcome (the dependent variable) is measured. The manipulation of the independent variable typically involves having an experimental condition and a control: \n", "\n", "- E.g., does the introduction of a new UX feature on a website lead to a higher revenue than the old one? We could then measure revenue for both conditions and compare them. This situation can be analysed with a t-test.\n", "\n", "Furthermore, we can use a t-test to draw conclusions from model estimates. In particular, the test can be used to decide whether there is any significant relationship between a dependent variable `y` and a feature `x` by testing the null hypothesis that the regression coefficient `b` equals 0.\n", "\n", "Among the most frequently used t-tests are:\n", "\n", "- **Dependent t-test**: Compares two means based on related data.\n", "- **Independent t-test**: Compares two means based on independent data.\n", "- **Significance testing**: Tests the significance of the coefficient `b` in a regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Independent t-test example\n", "\n", "Let's conduct an independent two-sample t-test for the following hypothetical research question for an e-commerce online shop: \n", "\n", "> Does the introduction of a new UX feature increase the time on site? \n", "\n", "We proceed as follows:\n", "\n", "- Imagine that we split a random sample of 1000 website members into two groups using random assignment. 
Random assignment ensures that, on average, everything else is held constant between the two groups.\n", "\n", "- Group “A” (500 members) receives the current product experience.\n", "\n", "- Group “B” (500 members) receives some change that we think is an improvement to the experience. \n", "\n", "- We then compare metrics (time on site) between the two groups. \n", "\n", "\n", "Let's formulate a **null hypothesis ($H_0$):**\n", "\n", "- $H_0$: \"There is no difference in the time spent on site between the two groups.\"\n", "\n", "## Generate data\n", "\n", "We generate our own data for this example." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats\n", "\n", "# number of participants\n", "n_A = 500 # see old feature\n", "n_B = 500 # see new feature\n", "\n", "# fix the seed of NumPy's global random state, which scipy uses by default\n", "np.random.seed(123)\n", "\n", "# generate random data with scipy: mean = loc and standard deviation = scale\n", "ux_new = stats.norm.rvs(loc=4.0, scale=1.2, size=n_B)\n", "ux_old = stats.norm.rvs(loc=3.8, scale=1.5, size=n_A)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
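<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>ux_new</th>\n", "      <th>ux_old</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr>\n", "      <th>0</th>\n", "      <td>2.697</td>\n", "      <td>4.928</td>\n", "    </tr>\n", "    <tr>\n", "      <th>1</th>\n", "      <td>5.197</td>\n", "      <td>3.904</td>\n", "    </tr>\n", "    <tr>\n", "      <th>2</th>\n", "      <td>4.340</td>\n", "      <td>3.402</td>\n", "    </tr>\n", "    <tr>\n", "      <th>3</th>\n", "      <td>2.192</td>\n", "      <td>5.194</td>\n", "    </tr>\n", "    <tr>\n", "      <th>4</th>\n", "      <td>3.306</td>\n", "      <td>5.691</td>\n", "    </tr>\n", "  </tbody>\n", "</table>\n", "</div>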
" ], "text/plain": [ " ux_new ux_old\n", "0 2.697 4.928\n", "1 5.197 3.904\n", "2 4.340 3.402\n", "3 2.192 5.194\n", "4 3.306 5.691" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "# create pandas DataFrame\n", "df = pd.DataFrame({'ux_new': ux_new, 'ux_old': ux_old})\n", "df.head(5)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "%matplotlib inline\n", "sns.set_style(\"whitegrid\")\n", "\n", "sns.histplot(x=ux_old, data=df, color=\"blue\", label=\"UX old\", kde=True)\n", "sns.histplot(x=ux_new, data=df, color=\"red\", label=\"UX new\", kde=True, alpha=0.8);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate t-statistic\n", "\n", "The t-statistic is used in a t-test to decide whether to reject the null hypothesis. For two groups of equal size $n$, the formula of the statistic is:\n", "\n", "$$t = \\frac{\\bar{x}_{1} - \\bar{x}_{2}}{{s}_p \\sqrt{\\frac{2}{n}}}$$\n", "\n", "where $s_p$ is the pooled standard deviation:\n", "\n", "$$s_p = \\sqrt{\\frac{{s}^{2}_{X_1} + {s}^{2}_{X_2}} {2} }$$\n", "\n", "Thus, the t-statistic is a way of quantifying how large the difference between the group means is in relation to the sampling variability of that difference.\n", "\n", "Note that we perform the calculation of the t-test manually to better understand the concept. \n", "\n", "Usually we would just use the independent t-test function below and be done:\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "t = 2.49205054533012\n", "p = 0.012862137872711413\n" ] } ], "source": [ "t, p = stats.ttest_ind(ux_new, ux_old)\n", "print(\"t =\", t)\n", "print(\"p =\", p)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First of all, let's calculate the different components of the formula. We will start with the numerator of the formula. 
\n", "\n", "### Difference in means\n", "\n", "Calculate the difference between the two means:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.21\n" ] } ], "source": [ "diff_mean = ux_new.mean() - ux_old.mean()\n", "print(round(diff_mean,2))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pooled standard deviation\n", "\n", "Next, we calculate the variance: \n", "\n", "$$s^{2} = \\frac{\\sum (x_{i} - \\bar{x})^{2}}{N - 1}$$\n", "\n", "and standard deviation: \n", "\n", "$$s = \\sqrt{\\frac{\\sum (x_{i} - \\bar{x})^{2}}{N - 1}}$$\n", "\n", "\n", "We use the NumPy method `var` with the optional argument `ddof`, which stands for “Delta Degrees of Freedom”: the divisor used in the calculation of the variance is `N - ddof`, where `N` represents the number of elements. By default, `ddof` is zero. \n", "\n", "Note that `ddof` equals the number of parameters estimated from the data before computing the variance. Here we estimate only the sample mean, so we set `ddof=1`, which gives the divisor $N - 1$ used in the formulas above. See this [documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html) for more information." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Variance:\n", "UX new: 1.452143284234209 UX old: 2.247145174966161\n", "Standard deviation:\n", "UX new: 1.2050490795956026 UX old: 1.499048089610924\n" ] } ], "source": [ "import numpy as np\n", "\n", "# calculate variance\n", "var_ux_new = ux_new.var(ddof=1)\n", "var_ux_old = ux_old.var(ddof=1)\n", "\n", "print(\"Variance:\")\n", "print(\"UX new:\", var_ux_new, \"UX old:\", var_ux_old)\n", "\n", "# calculate standard deviation\n", "s_ux_new = np.sqrt(var_ux_new)\n", "s_ux_old = np.sqrt(var_ux_old)\n", "\n", "print(\"Standard deviation:\")\n", "print(\"UX new:\", s_ux_new, \"UX old:\", s_ux_old)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we calculate the pooled variance (also known as combined variance, composite variance, or overall variance) to estimate the combined variance of our two groups:\n", "\n", "$$s_{p}^2 = \\frac {(n_{1} - 1)s_{1}^2 + (n_{2} - 1)s_{2}^2}{n_{1} + n_{2} - 2}$$\n", "\n", "Afterwards, we take the square root to get the pooled standard deviation (${s}_p$)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "1.849644229600185" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate pooled variance\n", "var_p = ( (n_B - 1) * var_ux_new + (n_A - 1) * var_ux_old ) / ( n_A + n_B - 2 )\n", "var_p" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.3600162607852102" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# pooled standard deviation \n", "s_p = np.sqrt(var_p)\n", "s_p" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate t-statistic\n", "\n", "$$t = \\frac{\\bar{x}_{1} - \\bar{x}_{2}}{{s}_p \\sqrt{\\frac{2}{n}}}$$" ] }, { "cell_type": 
"code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2.49205054533012" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# calculate t-statistic (both groups have the same size, n = n_A = n_B)\n", "t = diff_mean / (s_p * np.sqrt(2 / n_A))\n", "t" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To obtain the p-value, we evaluate the cumulative distribution function (CDF) of the t-distribution at our t-statistic ([see documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html) and [wikipedia](https://en.wikipedia.org/wiki/Student%27s_t-distribution)):" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9935689310636443\n" ] } ], "source": [ "# evaluate the CDF of the t-distribution at the observed t-statistic\n", "dof = n_A + n_B - 2 # degrees of freedom (named dof so the DataFrame df is not overwritten)\n", "cdf_t = stats.t.cdf(t, df=dof)\n", "print(cdf_t)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.006431068936355699\n" ] } ], "source": [ "# one-tailed p-value: probability of a t-statistic larger than the observed one\n", "p = 1 - cdf_t\n", "print(p)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "t = 2.4921\n", "p= 0.0129\n" ] } ], "source": [ "# report the t-statistic and the two-tailed p-value, rounded to four decimals\n", "print('t =', round(t, 4))\n", "print('p=', round(2*p, 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Note that we multiply the p-value by 2 because it's a two-tailed t-test.*\n", "\n", "The two-tailed p-value of about 0.013 matches the result of `stats.ttest_ind` and lies below the 5% significance level, so we reject the null hypothesis.\n", "\n", "This means that the difference between the two group means is statistically significant."
] } ], "metadata": { "interpreter": { "hash": "463226f144cc21b006ce6927bfc93dd00694e52c8bc6857abb6e555b983749e9" }, "kernelspec": { "display_name": "Python 3.8.2 64-bit ('base': conda)", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }