{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# t-Test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A t-test can be used to determine whether there is a systematic difference between the means of two groups.\n", "\n", "The simplest design has a single independent variable (the group variable) that is manipulated in only two ways, and a single outcome (the dependent variable) that is measured. The manipulation of the independent variable typically involves an experimental condition and a control condition:\n", "\n", "- E.g., does the introduction of a new UX feature on a website lead to higher revenue than the old one? We could then measure revenue for both conditions and compare them. This situation can be analysed with a t-test.\n", "\n", "Furthermore, we can use a t-test to draw conclusions from model estimates. In particular, the test can be used to decide whether there is a statistically significant relationship between a dependent variable `y` and a feature `x` by testing the null hypothesis that the regression coefficient `b` equals 0.\n", "\n", "Among the most frequently used t-tests are:\n", "\n", "- **Dependent t-test**: Compares two means based on related data.\n", "- **Independent t-test**: Compares two means based on independent data.\n", "- **Significance testing**: Tests the significance of the coefficient `b` in a regression." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Independent t-test example\n", "\n", "Let's conduct an independent two-sample t-test for the following hypothetical research question for an e-commerce shop:\n", "\n", "> Does the introduction of a new UX feature increase the time on site?\n", "\n", "We proceed as follows:\n", "\n", "- Imagine that we split a random sample of 1000 website members into two groups using random assignment. 
Random assignment ensures that, on average, everything else is held constant between the two groups.\n", "\n", "- Group “A” (500 members) receives the current product experience.\n", "\n", "- Group “B” (500 members) receives a change that we think improves the experience.\n", "\n", "- We then compare a metric (time on site) between the two groups.\n", "\n", "\n", "Let's formulate a **null hypothesis ($H_0$):**\n", "\n", "- $H_0$: \"There is no difference in the mean time spent on site between the two groups.\"\n", "\n", "## Generate data\n", "\n", "We simulate our own data for this example." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy import stats\n", "\n", "# number of participants per group\n", "n_A = 500 # group A: sees the old feature\n", "n_B = 500 # group B: sees the new feature\n", "\n", "# fix the seed so the random draws are reproducible\n", "np.random.seed(123)\n", "\n", "# draw normally distributed data with scipy\n", "# loc is the mean and scale is the standard deviation\n", "ux_new = stats.norm.rvs(loc=4.0, scale=1.2, size=n_B)\n", "ux_old = stats.norm.rvs(loc=3.8, scale=1.5, size=n_A)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | ux_new | ux_old |
---|---|---|
0 | 2.697 | 4.928 |
1 | 5.197 | 3.904 |
2 | 4.340 | 3.402 |
3 | 2.192 | 5.194 |
4 | 3.306 | 5.691 |