In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

Stroop effect

Questions For Investigation¶

1. What is our independent variable? What is our dependent variable?¶

Independent variable:- Congruent words and Incongruent words

Dependent variable:- Time it takes to name the ink colors in equally-sized lists of congruent and incongruent words

2. What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.¶

Hypothesis testing

Null hypothesis states that there are no differences between the response time for congruent words (Congruent) vs incongruent words (Incongruent). The mean difference between paired observations is zero.

$$H_0: \mu_{Congruent} - \mu_{Incongruent} = 0$$

where $\mu_{Congruent}$ is the population mean of response time for congruent words and $\mu_{Incongruent}$ is the population mean of response time for incongruent words.

Alternative hypothesis states that the difference between the response time for congruent words vs incongruent words is not zero.

$$H_{A}: \mu_{Congruent} - \mu_{Incongruent} \neq 0$$

Dependent t-tests for paired samples

The dependent t-test compares the mean of two paired groups to see if there are statistically significant differences between these means. The experimental design, in this case, is "within-subjects". The same subjects were tested for congruent and incongruent words. By using the same subject to test two different condition, we eliminate the individual differences that occur between subjects.

If we get a significant result, we can reject the null hypothesis and accept the alternative hypothesis that there are statistically significant differences between the mean time taken to name ink color between two test conditions.

Import the libraries needed

In [1]:

#Importing the libraries needed for reading the data and plotting
import pandas as pd
import numpy as np
from math import sqrt

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("white")
#sns.set_context('notebook', font_scale = 1.5)

Loading data into pandas dataframe

In [2]:

#Read the data into a pandas dataframe and add a subject column
stroop_data = pd.read_csv('data/stroopdata.csv')
stroop_data['Subject'] = stroop_data.index + 1
stroop_data.head()

Out[2]:

	Congruent	Incongruent	Subject
0	12.079	19.278	1
1	16.791	18.741	2
2	9.564	21.214	3
3	8.630	15.687	4
4	14.669	22.803	5

The best way to visualize this data is a slopegraph. I could not find a easy way to make slopegraph in python. So I am going to cheat a little bit here and make the slopegraph in Tableau.

In [3]:

#Write the dataframe to csv file for Tableau visualization.
stroop_data.to_csv('data/stroop_data.csv',
             index = False)

In [4]:

%%HTML

Response time for congruent vs incongruent words

In [5]:

#Include an additional column that shows the difference between 
#response time for congruent and incongruent words
stroop_data['Difference'] = stroop_data['Congruent'] - stroop_data['Incongruent']
stroop_data

Out[5]:

	Congruent	Incongruent	Subject	Difference
0	12.079	19.278	1	-7.199
1	16.791	18.741	2	-1.950
2	9.564	21.214	3	-11.650
3	8.630	15.687	4	-7.057
4	14.669	22.803	5	-8.134
5	12.238	20.878	6	-8.640
6	14.692	24.572	7	-9.880
7	8.987	17.394	8	-8.407
8	9.401	20.762	9	-11.361
9	14.480	26.282	10	-11.802
10	22.328	24.524	11	-2.196
11	15.298	18.644	12	-3.346
12	15.073	17.510	13	-2.437
13	16.929	20.330	14	-3.401
14	18.200	35.255	15	-17.055
15	12.130	22.158	16	-10.028
16	18.495	25.139	17	-6.644
17	10.639	20.429	18	-9.790
18	11.344	17.425	19	-6.081
19	12.369	34.288	20	-21.919
20	12.944	23.894	21	-10.950
21	14.233	17.960	22	-3.727
22	19.710	22.058	23	-2.348
23	16.004	21.157	24	-5.153

3. Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.¶

In [6]:

stroop_data[['Congruent', 'Incongruent', 'Difference']].describe()

Out[6]:

	Congruent	Incongruent	Difference
count	24.000000	24.000000	24.000000
mean	14.051125	22.015917	-7.964792
std	3.559358	4.797057	4.864827
min	8.630000	15.687000	-21.919000
25%	11.895250	18.716750	-10.258500
50%	14.356500	21.017500	-7.666500
75%	16.200750	24.051500	-3.645500
max	22.328000	35.255000	-1.950000

We can see that the mean, median, minimum and maximum response time for incongruent words are higher than the congruent words. Participants took more time to name the color of incongruent words compared to congruent words.

4. Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.¶

In [7]:

#Distribution of time taken to name ink colors 
#with a kernel density estimate,histogram and rug plot
sns.distplot(stroop_data['Congruent'], rug = True, label = "congruent");
sns.distplot(stroop_data['Incongruent'], rug = True, label = "incongruent");
plt.xlabel("Time taken to name ink colors");
plt.ylabel("Frequency");
plt.title("Response time for congruent vs incongruent words");
plt.legend();

In [8]:

sns.boxplot(data=stroop_data[['Congruent', 'Incongruent']], orient="h");
plt.ylabel("Time taken to name ink colors");

Distribution of response time for incongruent words are higher than the congruent words

The distribution of time taken to name ink color for congruent words is between 8.63s to 22.328s, whereas the distribution of time taken to name ink color for incongruent words is between 15.687s to 35.255s. Also for each participant, the response time for incongruent word is always higher than congruent words. The difference in mean and median between response time for incongruent words vs congruent words are 7.965s and 7.666s.

5. Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?¶

I will go through the series of steps usually involved in hypothesis testing.

Degrees of freedom is the number of independent pieces of information available to estimate another piece of information. It is the number of information that can be freely varied without violating any given restrictions. The degree of freedom in our case is n − 1, where n represents the number of pairs (subjects in this case).

In [9]:

# Degrees of freedom
n = 24
df = n - 1
df

Out[9]:

Paired t-test

t statistic to test whether the means are different can be calculated as follows:

$t_{statistic} = \frac{\bar x_{D} - \mu_0}{s_{D}/\sqrt{n}}$

where $\bar x_{D}$ is the mean of difference between all the pairs and $s_D$ is the sample standard deviation of the difference betweeen all the pairs. The constant $\mu_0$ in our case is 0, since $H_0: \mu_{Congruent} - \mu_{Incongruent} = 0$

In [10]:

# Point estimates
# Computes mean of the difference
mean_of_the_differences = stroop_data['Difference'].mean() 
# Computes std deviation of the difference 
stdev_of_the_differences = stroop_data['Difference'].std()  

print "mean of the differences: {:.4f}".format(mean_of_the_differences)
print "standard deviation of the difference: {:.4f}".format(stdev_of_the_differences)

mean of the differences: -7.9648
standard deviation of the difference: 4.8648

In [11]:

#t-statistic
se = stdev_of_the_differences/float(sqrt(n))
t_statistic = mean_of_the_differences/float(se)

print "t-statistic: {:.4f}".format(t_statistic)

t-statistic: -8.0207

t-statistic tells us how much the sample mean deviates from the null hypothesis. If the t-statistic lies outside the critical values of the t-distribution corresponding to our confidence level and degrees of freedom, we reject the null hypothesis.

t-critical values for two-tailed t-test

Significance level($\alpha$) is the criterion used for rejecting null hypothesis. Statisticians have commonly used either the 0.05 level (5% chance) or the 0.01 level (1% chance).

Confidence level is 1 - significance level.

Our alternative hypothesis states $H_{A}: \mu_{Congruent} - \mu_{Incongruent} \neq 0$. Since we hypothesized the possibility of relationship in both directions, we will use a two-tailed test to test our hypothesis. If we are using a significance level of 0.05, two-tailed t-test allocates 0.025 in each tail(shaded area). Both left and right shaded area are 2.5% of the total area under the curve.

A two-tailed test will test both, if $\mu_{Congruent} - \mu_{Incongruent}$ is significantly greater than or less than $\mu_0$(0). The $\mu_{Congruent} - \mu_{Incongruent}$ is considered significantly different from $\mu_0$(0) if the test statistic is in the top 2.5% or bottom 2.5% of its probability distribution, resulting in a p-value less than 0.05.

t critical

In [12]:

from scipy import stats

# t-critical values at alpha = 0.05 and df = 23 for two-tailed t-test, q = Quantile to check

t_critical_values = (stats.t.ppf(q=0.025,df=df), stats.t.ppf(q=0.975,df=df))  
print "t-critical values at alpha of 0.05 for two-tailed t-test:\
({:.4f}, {:.4f})".format(*t_critical_values)

t-critical values at alpha of 0.05 for two-tailed t-test:(-2.0687, 2.0687)

Our t-statistic(-8.0207) is less than t-critical value of -2.0687 at alpha = 0.05 and degrees of freedom 23. It means that the probability of finding t-statistic as extreme as ours is less than 5% if the null hypothesis was true. This probability is defined by the term p-value.

p-value

Probability of null hypothesis given that the null hypothesis ($H_0$) is true (usually that the observations are a result of pure chance). The lower the p-value the greater the confidence with which we can reject the null hypothesis.

In [13]:

#Cumulative distribution function. Multiply by 2 for two-tailed test
pval = stats.t.cdf(t_statistic, df)*2 

print "p-value: {:.4e}".format(pval)

p-value: 4.1030e-08

We obtained a p-value of 4.1030e-08. This means we'd expect a 0.000004103 chance of finding a difference as large as (or larger than) the one in our study if the null hypothesis was true. Our p-value is way lower than our significance level α (0.05) so we should reject the null hypothesis.

Paired t-test in scipy

We can use the following function in scipy to directly perform t-test:

In [14]:

#Paired t-test on response time for congruent vs incongruent words
print stats.ttest_rel(stroop_data['Congruent'],stroop_data['Incongruent'])

Ttest_relResult(statistic=-8.020706944109957, pvalue=4.1030005857111781e-08)

Confidence intervals

Confidence intervals (CI) are a useful statistic to include because they indicate where the true population mean might be. It is common to report 95% confidence intervals.

$CI = (\bar x_D - t_{critical}\frac{s_D}{\sqrt n},\bar x_D + t_{critical}\frac{s_D}{\sqrt n})$

where $t_{critical}\frac{s_D}{\sqrt n}$ is called the Margin of error.

In [15]:

#95% CI
stats.norm.interval(0.95, loc = mean_of_the_differences, scale = se)

Out[15]:

(-9.9110920264491931, -6.0184913068841395)

The experiment proved that when a color word is printed in the same color as the word, people can name the ink color more quickly compared with when a color word is printed with an ink color not denoted by the word. The results are congruent with my intuition.

6. Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect?¶

There are many theories to Stroop effect but not one possible explanation. The most common theory for Stroop effect is called interference. Reading is a habit practiced very early on from school days. We become so good at it that brain automatically understands the meaning of words; whereas recognizing colors is not an “automatic process”. When the brain has to read incongruent words it has to override its initial impulse of automatically reading the word rather so that it can recognize its color.

Another similar effect is Warped words.

References¶

1) Stroop effect image

2) Dependent t-test for paired samples

3) Critical regions for two tailed t-test

4) scipy.stats.t

5) scipy.stats.ttest_rel

6) Stroop effect

Test a Perceptual Phenomenon