Reproducing results from “The Sound of Intellect” Experiment 1 and Experiment 3a.
Do the following for paired samples t tests:
A paired samples t test is used to compare two means that were sampled from the same set of participants.
Seen in:
Statistically, a paired samples t test is just a one-sample t test on the difference scores.
Difference scores (AKA change scores):
\(\Delta{X} = X_2 - X_1\)
Was the average change greater than/less than/different from zero?
What cues do people use to infer intellect? Differences between reading, hearing, and watching (and hearing) a job candidate’s pitch.
We’re looking at two research questions from Experiment 1:
Do job candidates think their written pitch will be perceived more or less positively than their spoken pitch?
Do job candidates expect their chances of being hired to be different for their written and spoken pitches?
“Theoretically, such expectations matter because they indicate whether the cues that convey mental capacities in social interaction are obvious to those in the midst of the interaction. Practically, such expectations matter because they could guide how candidates approach potential employers. Candidates who believe their spoken pitch will be judged exactly the same as their written pitch may see no reason to seek voice time with a potential employer.”
The (implied) hypotheses are…
Candidates will predict written and spoken pitches will be perceived differently.
Candidates will predict that employers’ interest will vary based on whether they observed the written or spoken pitches.
Conduct a paired samples t test comparing participants’ predicted positivity ratings for their spoken and written pitches.
These are the packages we’ll be using in the lab today.
“…these predictions were underpowered given the sample size of only 18 candidates…” (p. 880)
How underpowered? How are we defining underpowered?
What is the smallest population effect 50% of samples of N = 18 would detect?
What is the smallest population effect 80% of samples of N = 18 would detect?
What is the smallest population effect 95% of samples of N = 18 would detect?
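One way to make these three questions concrete is a sensitivity analysis: fix N, alpha, and power, then solve for the effect size. The sketch below uses base R's `stats::power.t.test()` rather than whichever power package the lab loads (a substitution made so the example has no dependencies). With `sd = 1` and `type = "paired"`, the solved-for `delta` is in units of the standard deviation of the difference scores, so it can be read as Cohen's dz. The two-sided alpha of .05 and the helper name `smallest_dz` are assumptions.

```r
# Sensitivity analysis for a paired design with N = 18 (alpha = .05, two-sided).
# With sd = 1, the solved-for `delta` is a standardized effect (Cohen's dz).
smallest_dz <- function(power, n = 18) {
  power.t.test(n = n, power = power, sd = 1, sig.level = 0.05,
               type = "paired", alternative = "two.sided")$delta
}

powers <- c(0.50, 0.80, 0.95)
sensitivity <- sapply(powers, smallest_dz)
names(sensitivity) <- paste0("power_", powers)

# Smallest dz detectable at each power level
round(sensitivity, 2)
```

Higher power requires a larger population effect at a fixed N, so the three values should increase from left to right.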
Use readxl::read_excel() to import the Excel file.
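A minimal sketch of the import and first look at the data, assuming the readxl package is installed. The file name `experiment1.xlsx` is hypothetical (the actual file name is not given here), so the code only runs the import if that file exists.

```r
# Hypothetical file name: replace with the path to the Experiment 1 Excel file.
xlsx_path <- "experiment1.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(xlsx_path)) {
  dta <- readxl::read_excel(xlsx_path)  # returns a tibble
  print(dta)            # first rows and column types
  str(dta)              # structure: one line per column
  print(summary(dta))   # summaries for each column
}
```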
## # A tibble: 18 x 10
## `P#` Company PosWrit HireWrit PosSpoke HireSpoke `Times given` Age Gender
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 1 Google 3 3 4 4 3 to 5 26 M
## 2 2 BCG 4 4 3 3 0 27 M
## 3 3 Sprint 4 4 4 4 2 31 F
## 4 4 Micros… 4 4 3 2 3 29 F
## 5 5 Kleine… 3 3 4 3 3 26 M
## 6 6 Raymon… 5 5 4 4 2-3 times, n… 29 M
## 7 7 McKins… 3 3 3 3 2 for job in… 28 F
## 8 8 Wilson… 4 3 4 4 0 28 M
## 9 9 Samsun… 4 4 3 3 0 32 M
## 10 10 Kraft … 1 1 5 4 1 28 M
## 11 11 Gates … 3 1 4 4 0 24 F
## 12 12 Spotify 2 2 4 3 0 28 M
## 13 13 Mattel 3 3 4 4 0 28 F
## 14 14 Coca C… 3 3 5 5 0 28 F
## 15 15 Accent… 3 3 2 2 1 30 M
## 16 16 MetLife 2 2 3 3 0 27 M
## 17 17 McKins… 3 2 3 2 2 32 F
## 18 18 Kaiser… 4 4 3 2 2 to 3 27 M
## # … with 1 more variable: Ethnicity <chr>
## tibble [18 × 10] (S3: tbl_df/tbl/data.frame)
## $ P# : num [1:18] 1 2 3 4 5 6 7 8 9 10 ...
## $ Company : chr [1:18] "Google" "BCG" "Sprint" "Microsoft" ...
## $ PosWrit : num [1:18] 3 4 4 4 3 5 3 4 4 1 ...
## $ HireWrit : num [1:18] 3 4 4 4 3 5 3 3 4 1 ...
## $ PosSpoke : num [1:18] 4 3 4 3 4 4 3 4 3 5 ...
## $ HireSpoke : num [1:18] 4 3 4 2 3 4 3 4 3 4 ...
## $ Times given: chr [1:18] "3 to 5" "0" "2" "3" ...
## $ Age : num [1:18] 26 27 31 29 26 29 28 28 32 28 ...
## $ Gender : chr [1:18] "M" "M" "F" "F" ...
## $ Ethnicity : chr [1:18] "Asian American" "White European" "Indian-American (Sub-continent)" "Indian" ...
## P# Company PosWrit HireWrit
## Min. : 1.00 Length:18 Min. :1.000 Min. :1.00
## 1st Qu.: 5.25 Class :character 1st Qu.:3.000 1st Qu.:2.25
## Median : 9.50 Mode :character Median :3.000 Median :3.00
## Mean : 9.50 Mean :3.222 Mean :3.00
## 3rd Qu.:13.75 3rd Qu.:4.000 3rd Qu.:4.00
## Max. :18.00 Max. :5.000 Max. :5.00
## PosSpoke HireSpoke Times given Age
## Min. :2.000 Min. :2.000 Length:18 Min. :24.00
## 1st Qu.:3.000 1st Qu.:3.000 Class :character 1st Qu.:27.00
## Median :4.000 Median :3.000 Mode :character Median :28.00
## Mean :3.611 Mean :3.278 Mean :28.22
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:29.00
## Max. :5.000 Max. :5.000 Max. :32.00
## Gender Ethnicity
## Length:18 Length:18
## Class :character Class :character
## Mode :character Mode :character
##
##
##
Compute means and standard deviations of how positively participants expected to be evaluated.
psych::describe() is a handy function that describes each column in your data frame (n, mean, SD, median, skew, kurtosis, and more).
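As a sketch, the two rating columns can be rebuilt from the printed data frame and summarized with base R. The `psych::describe()` call is only run if psych is installed. The small `dta` below contains just the two columns needed here, which is an assumption made to keep the example self-contained.

```r
# Ratings copied from the printed data frame (18 candidates).
dta <- data.frame(
  PosSpoke = c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3),
  PosWrit  = c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)
)

# Base R: mean and SD of each column
desc <- sapply(dta, function(x) c(M = mean(x), SD = sd(x)))
round(desc, 2)

# Same information (plus skew, kurtosis, etc.) if psych is installed
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(dta))
}
```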
## vars n mean sd median trimmed mad min max range skew kurtosis
## PosSpoke 1 18 3.61 0.78 4 3.62 1.48 2 5 3 0.01 -0.67
## PosWrit 2 18 3.22 0.94 3 3.25 1.48 1 5 4 -0.42 -0.11
## se
## PosSpoke 0.18
## PosWrit 0.22
These participants did not predict that they would be evaluated differently when employers listened to their spoken pitches (M = 3.61, SD = 0.78) than when employers read their written pitches (M = 3.22, SD = 0.94), paired t(17) = 1.20, p = .25, d = 0.45.
“They also did not expect any difference in their likelihood of getting hired depending on whether employers listened to their spoken pitches (M = 3.28, SD = 0.89) or read their written pitches (M = 3.00, SD = 1.08), paired t(17) = 0.80, p = .44, d = 0.29.”
Add a column to your data frame that contains the difference between expected evaluation of spoken and written pitches.
There are two ways to do a paired-samples t test. The first is to do a one-sample t test of the difference scores.
Conduct a one sample t test against the null hypothesis that the mean of the difference scores is 0.
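A sketch of both steps, using the ratings copied from the printed data frame. The column name `PosDiff` matches the output; the object name `one_sample` is just for illustration.

```r
# Ratings copied from the printed data frame (18 candidates).
dta <- data.frame(
  PosSpoke = c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3),
  PosWrit  = c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)
)

# Difference scores: spoken minus written
dta$PosDiff <- dta$PosSpoke - dta$PosWrit

# One-sample t test of H0: the mean difference score is 0
one_sample <- t.test(dta$PosDiff, mu = 0)
one_sample
```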
##
## One Sample t-test
##
## data: dta$PosDiff
## t = 1.1974, df = 17, p-value = 0.2476
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.2963399 1.0741177
## sample estimates:
## mean of x
## 0.3888889
Try the second way of getting the same result, which is using t.test() with the argument paired = TRUE.
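A sketch of the second approach with the same ratings. The order of the two vectors sets the direction of the difference (first minus second), so spoken goes first to match the difference scores.

```r
# Ratings copied from the printed data frame (18 candidates).
PosSpoke <- c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3)
PosWrit  <- c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)

# Paired-samples t test: differences are computed as PosSpoke - PosWrit
paired_fit <- t.test(PosSpoke, PosWrit, paired = TRUE)
paired_fit

# Identical to the one-sample t test on the difference scores
one_sample_fit <- t.test(PosSpoke - PosWrit, mu = 0)
all.equal(unname(paired_fit$statistic), unname(one_sample_fit$statistic))
```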
##
## Paired t-test
##
## data: dta$PosSpoke and dta$PosWrit
## t = 1.1974, df = 17, p-value = 0.2476
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2963399 1.0741177
## sample estimates:
## mean of the differences
## 0.3888889
The assumptions are the same as for a one-sample t, but they are assumptions about the difference scores.
We can assess how tenable the normality assumption is in the same ways we did for the one-sample t test:
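For example, a Shapiro-Wilk test on the difference scores could be run as sketched below. The ratings are copied from the printed data frame; a QQ plot (`qqnorm()` followed by `qqline()`) would be a natural visual companion but is left out to keep the sketch short.

```r
# Ratings copied from the printed data frame (18 candidates).
PosSpoke <- c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3)
PosWrit  <- c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)
PosDiff  <- PosSpoke - PosWrit

# H0: the difference scores come from a normal distribution.
# A small p value suggests the normality assumption is questionable.
sw <- shapiro.test(PosDiff)
sw
```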
##
## Shapiro-Wilk normality test
##
## data: dta$PosDiff
## W = 0.86361, p-value = 0.01398
Kurtosis describes how heavy the tails of the distribution are compared with a normal distribution.
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 18 0.39 1.38 0 0.25 1.48 -1 4 5 0.86 0.18 0.32
The mean of the difference scores (which is the same as the mean difference).
These participants did not predict that they would be evaluated differently when employers listened to their spoken pitches (M = 3.61, SD = 0.78) than when employers read their written pitches (M = 3.22, SD = 0.94), paired t(17) = 1.20, p = .25, d = 0.45.
Remember Cohen’s dz from last week?
\(d_z =\frac{M - \mu}{SD}\)
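Since a paired samples t test is just a one-sample t test of the difference scores, Cohen recommended dz as a standardized effect size. Here M and SD are the mean and standard deviation of the difference scores, and μ = 0.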
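A sketch of two equivalent ways to get dz, using the ratings copied from the printed data frame: directly from the difference scores, or from the paired t statistic divided by the square root of the number of pairs. The variable names are for illustration only.

```r
# Ratings copied from the printed data frame (18 candidates).
PosSpoke <- c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3)
PosWrit  <- c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)
diffs <- PosSpoke - PosWrit

# Way 1: (M - mu) / SD of the difference scores, with mu = 0
dz_from_scores <- (mean(diffs) - 0) / sd(diffs)

# Way 2: paired t statistic divided by sqrt(number of pairs)
dz_from_t <- unname(t.test(diffs, mu = 0)$statistic) / sqrt(length(diffs))

c(dz_from_scores = dz_from_scores, dz_from_t = dz_from_t)
```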
## [1] 0.2822268
## t
## 0.2822268
With paired data, it is also common to report Cohen’s d instead of dz.
\(d = \frac{M_1 - M_2}{\sqrt{\frac{SD_1^2 + SD_2^2}{2}}}\)
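# Means and standard deviations of the two sets of ratings
m1 <- mean(dta$PosSpoke)
sd1 <- sd(dta$PosSpoke)
m2 <- mean(dta$PosWrit)
sd2 <- sd(dta$PosWrit)
# Cohen's d: mean difference divided by the root mean square of the two SDs
mean_diff <- m1 - m2
pooled_sd <- sqrt((sd1^2 + sd2^2) / 2)
mean_diff / pooled_sd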
## [1] 0.4500317
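Cohen’s d can also be computed from a t statistic, but you need to use the t statistic from an independent samples t test (not the paired test): with n pairs, \(d = t\sqrt{2/n}\).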
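A sketch of that conversion with the ratings copied from the printed data frame. The two columns are treated as if they were independent groups only to obtain the t statistic; `var.equal = TRUE` requests the classic Student test, which is an assumption about how the original value was produced.

```r
# Ratings copied from the printed data frame (18 candidates).
PosSpoke <- c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3)
PosWrit  <- c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)
n <- length(PosSpoke)  # number of pairs

# t statistic from an independent-samples test (NOT paired)
t_ind <- t.test(PosSpoke, PosWrit, var.equal = TRUE)$statistic
t_ind

# Convert t to Cohen's d for two equal-sized groups
d_from_t <- t_ind * sqrt(2 / n)
d_from_t
```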
## t
## 1.350095
## t
## 0.4500317
These participants did not predict that they would be evaluated differently when employers listened to their spoken pitches (M = 3.61, SD = 0.78) than when employers read their written pitches (M = 3.22, SD = 0.94), paired t(17) = 1.20, p = .25, d = 0.45.
The plot has type of pitch on the x-axis and positivity on the y-axis. But our data are not laid out in this way.
## # A tibble: 18 x 3
## `P#` PosSpoke PosWrit
## <dbl> <dbl> <dbl>
## 1 1 4 3
## 2 2 3 4
## 3 3 4 4
## 4 4 3 4
## 5 5 4 3
## 6 6 4 5
## 7 7 3 3
## 8 8 4 4
## 9 9 3 4
## 10 10 5 1
## 11 11 4 3
## 12 12 4 2
## 13 13 4 3
## 14 14 5 3
## 15 15 2 3
## 16 16 3 2
## 17 17 3 3
## 18 18 3 4
## pid Pitch Positivity
## 1 1 Spoken 4
## 2 2 Spoken 3
## 3 3 Spoken 4
## 4 4 Spoken 3
## 5 5 Spoken 4
## 6 6 Spoken 4
## 7 7 Spoken 3
## 8 8 Spoken 4
## 9 9 Spoken 3
## 10 10 Spoken 5
## 11 11 Spoken 4
## 12 12 Spoken 4
## 13 13 Spoken 4
## 14 14 Spoken 5
## 15 15 Spoken 2
## 16 16 Spoken 3
## 17 17 Spoken 3
## 18 18 Spoken 3
## 19 1 Written 3
## 20 2 Written 4
## 21 3 Written 4
## 22 4 Written 4
## 23 5 Written 3
## 24 6 Written 5
## 25 7 Written 3
## 26 8 Written 4
## 27 9 Written 4
## 28 10 Written 1
## 29 11 Written 3
## 30 12 Written 2
## 31 13 Written 3
## 32 14 Written 3
## 33 15 Written 3
## 34 16 Written 2
## 35 17 Written 3
## 36 18 Written 4
dta_long <- data.frame(
  pid = rep(dta$`P#`, 2),
  Pitch = rep(c("Spoken", "Written"), each = 18),
  Positivity = c(dta$PosSpoke, dta$PosWrit)
)
dta_long
Take 5 minutes to attempt to recreate plot 2 (don’t worry about the jitter—we’ll do that together).
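For the jitter step, one possible sketch is below. It rebuilds the long data frame from the printed ratings and adds a small amount of random vertical noise with base R's `jitter()`. The amount of 0.1 and the seed are assumptions; the jittered values printed in this lab may have been generated differently.

```r
# Ratings copied from the printed data frame (18 candidates).
PosSpoke <- c(4, 3, 4, 3, 4, 4, 3, 4, 3, 5, 4, 4, 4, 5, 2, 3, 3, 3)
PosWrit  <- c(3, 4, 4, 4, 3, 5, 3, 4, 4, 1, 3, 2, 3, 3, 3, 2, 3, 4)

dta_long <- data.frame(
  pid = rep(1:18, 2),
  Pitch = rep(c("Spoken", "Written"), each = 18),
  Positivity = c(PosSpoke, PosWrit)
)

set.seed(1)  # arbitrary seed so the jitter is reproducible
# Add uniform noise in [-0.1, 0.1] so overlapping points become visible
dta_long$PositivityJitter <- jitter(dta_long$Positivity, amount = 0.1)
head(dta_long)
```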
## pid Pitch Positivity PositivityJitter
## 1 1 Spoken 4 3.979290
## 2 2 Spoken 3 3.045460
## 3 3 Spoken 4 3.964090
## 4 4 Spoken 3 2.953722
## 5 5 Spoken 4 3.982265
## 6 6 Spoken 4 3.970553
## 7 7 Spoken 3 2.996247
## 8 8 Spoken 4 3.997742
## 9 9 Spoken 3 3.025605
## 10 10 Spoken 5 4.973715
## 11 11 Spoken 4 3.913898
## 12 12 Spoken 4 3.908666
## 13 13 Spoken 4 3.904526
## 14 14 Spoken 5 5.026285
## 15 15 Spoken 2 2.068756
## 16 16 Spoken 3 3.073923
## 17 17 Spoken 3 3.073315
## 18 18 Spoken 3 3.043185
## 19 1 Written 3 3.076402
## 20 2 Written 4 4.029323
## 21 3 Written 4 4.034346
## 22 4 Written 4 4.076246
## 23 5 Written 3 2.952741
## 24 6 Written 5 5.061069
## 25 7 Written 3 2.916593
## 26 8 Written 4 3.930198
## 27 9 Written 4 4.045028
## 28 10 Written 1 1.028545
## 29 11 Written 3 3.001938
## 30 12 Written 2 1.951113
## 31 13 Written 3 2.934935
## 32 14 Written 3 3.063972
## 33 15 Written 3 3.033724
## 34 16 Written 2 2.074818
## 35 17 Written 3 2.968841
## 36 18 Written 4 3.936012
position_dodge: dodging moves elements side-to-side. Unlike jitter, it is not random. It spaces objects evenly.
library(ggplot2)
# One line per participant connects their spoken and written ratings
ggplot(dta_long, aes(x = Pitch, y = PositivityJitter, group = pid)) +
  scale_y_continuous(minor_breaks = NULL) +
  geom_line(position = position_dodge(width = .1), colour = "#c9c9c9") +
  geom_point(position = position_dodge(width = .1)) +
  labs(y = "Positivity") +
  theme_minimal(base_family = "Fira Sans")  # requires the Fira Sans font to be installed
An independent samples t test is used when we want to determine whether two independent samples were drawn from the same population (that is, whether the two population means differ).
“In Experiment 3a, we recruited four trained stage actors to read all 18 pitches.”
“Evaluators were 265 visitors to the Museum of Science and Industry in Chicago (mean age = 35.03 years, SD = 14.40; 124 males), who agreed to participate in exchange for a food item.”
“We randomly assigned participants serving as potential employers (evaluators) to one of three conditions: Those in the writing condition read a written pitch, those in the female-speaker condition listened to one of the female actors reading a written pitch, and those in the male-speaker condition listened to one of the male actors reading a written pitch.”
RQ: Does the gender of the speaker affect how positively a pitch is perceived?
Hypothesis: Gender of the speaker will affect how positively a pitch is perceived.
“Evaluators had more negative impressions of male speakers (M = 5.79, SD = 1.78) than of female speakers, t(262) = −2.12, p = .04, 95% CI of the difference = [−1.03, −0.06], d = 0.26.”
Do a t test comparing participants who rated a female actor to those who rated a male actor. The authors did some form of planned contrast, which we will learn about when we learn ANOVA, so our numbers will not match theirs exactly. But we’re getting at something similar.
##
## Two-sample t test power calculation
##
## n = 216
## d = 0.2701842
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
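Note that the calculation above treats n = 216 as the number in each group, whereas the descriptive statistics for this experiment show 106 evaluators per speaker-gender group. As a sketch, the smallest effect detectable with 80% power can be solved for under both values. This uses base R's `stats::power.t.test()` in place of the function that produced the output above (an assumption made to avoid dependencies); with `sd = 1`, the solved-for `delta` can be read as Cohen's d. The helper name `d_detectable` is hypothetical.

```r
# Smallest standardized effect detectable in a two-group design
# (alpha = .05, two-sided). With sd = 1, `delta` is Cohen's d.
d_detectable <- function(n_per_group, power = 0.80) {
  power.t.test(n = n_per_group, power = power, sd = 1,
               sig.level = 0.05, type = "two.sample")$delta
}

d_216 <- d_detectable(216)  # n per group as used in the output above
d_106 <- d_detectable(106)  # n per group from the descriptive statistics

round(c(n_216 = d_216, n_106 = d_106), 2)
```

With fewer participants per group, the smallest detectable effect is larger.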
Use head(), summary(), and str() to inspect the data frame.
## Pnum actor_gender impression
## 1 9 Female 7.333333
## 2 12 Male 2.333333
## 3 69 Male 7.000000
## 4 85 Female 7.666667
## 5 114 Male 5.666667
## 6 133 Female 5.000000
## Pnum actor_gender impression
## Min. : 1.00 Length:216 Min. :0.3333
## 1st Qu.: 65.75 Class :character 1st Qu.:5.0000
## Median :135.50 Mode :character Median :6.3333
## Mean :133.93 Mean :6.0613
## 3rd Qu.:198.25 3rd Qu.:7.3333
## Max. :270.00 Max. :9.3333
## NA's :4
## 'data.frame': 216 obs. of 3 variables:
## $ Pnum : int 9 12 69 85 114 133 137 143 213 218 ...
## $ actor_gender: chr "Female" "Male" "Male" "Female" ...
## $ impression : num 7.33 2.33 7 7.67 5.67 ...
Convert actor_gender to a factor.
Use tapply() to apply psych::describe() to impression for each level of actor_gender.
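A sketch of both steps on a toy data frame. `toy` contains only the six rows printed by head() above, not the full data set, so its numbers will not match the output that follows. A base R summary function stands in for `psych::describe()` so the example runs without extra packages.

```r
# Only the six rows printed by head(); NOT the full 216-row data set.
toy <- data.frame(
  Pnum = c(9, 12, 69, 85, 114, 133),
  actor_gender = c("Female", "Male", "Male", "Female", "Male", "Female"),
  impression = c(7.333333, 2.333333, 7.000000, 7.666667, 5.666667, 5.000000)
)

# Listing the levels explicitly puts Male first, as in the printed output
toy$actor_gender <- factor(toy$actor_gender, levels = c("Male", "Female"))

# tapply(X, INDEX, FUN) applies FUN to X within each level of INDEX
group_desc <- tapply(
  toy$impression, toy$actor_gender,
  function(x) c(n = length(x), M = mean(x), SD = sd(x))
)
group_desc

# With psych installed, FUN could be psych::describe instead:
# tapply(toy$impression, toy$actor_gender, psych::describe)
```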
## $Male
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 106 5.79 1.78 6 5.96 1.48 0.33 8.67 8.33 -0.84 0.18 0.17
##
## $Female
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 106 6.33 1.82 6.67 6.5 1.48 0.67 9.33 8.67 -0.85 0.38 0.18
We’ll use the formula notation.
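In formula notation the outcome goes on the left of `~` and the grouping factor on the right. The sketch below again uses only the six rows printed by head() (a toy subset, so the result will differ from the output that follows).

```r
# Only the six rows printed by head(); NOT the full data set.
toy <- data.frame(
  actor_gender = factor(c("Female", "Male", "Male", "Female", "Male", "Female"),
                        levels = c("Male", "Female")),
  impression = c(7.333333, 2.333333, 7.000000, 7.666667, 5.666667, 5.000000)
)

# outcome ~ group; t.test() runs Welch's test by default (var.equal = FALSE)
fit <- t.test(impression ~ actor_gender, data = toy)
fit
```

The difference is computed as the first factor level minus the second, which is why a lower mean for Male gives a negative t.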
##
## Welch Two Sample t-test
##
## data: impression by actor_gender
## t = -2.2016, df = 209.92, p-value = 0.02879
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.0311577 -0.0568926
## sample estimates:
## mean in group Male mean in group Female
## 5.789308 6.333333
The assumptions are very similar to before, and we add a new assumption (sort of!).
Normality: test this the same way as before, but on each independent sample separately.
Homogeneity of variance: this is the assumption that the variance of both groups is equal. It can be violated if, for example, measurement is more or less accurate for one of the groups.
There is a statistical test of this assumption, but I’m not going to teach it to you, because there is a better approach: use Welch’s t test!
Welch’s t test adjusts the degrees of freedom to account for heterogeneity of variance (AKA heteroscedasticity).
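To see the adjustment at work, the Welch-Satterthwaite degrees of freedom can be computed by hand from each group's SD and n. The sketch below plugs in the rounded descriptive statistics reported for the two groups (SD = 1.78 and 1.82, n = 106 each), so the result is only approximately the df that t.test() reports. The function name `welch_df` is hypothetical.

```r
# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- function(sd1, n1, sd2, n2) {
  v1 <- sd1^2 / n1  # squared standard error of mean 1
  v2 <- sd2^2 / n2  # squared standard error of mean 2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

df_welch   <- welch_df(sd1 = 1.78, n1 = 106, sd2 = 1.82, n2 = 106)
df_student <- 106 + 106 - 2  # df if equal variances are assumed

c(welch = df_welch, student = df_student)
```

When the two variances and sample sizes are similar, as here, the Welch df is close to the Student df, so little is lost by defaulting to Welch's test.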
Look at these results:
##
## Welch Two Sample t-test
##
## data: impression by actor_gender
## t = -2.2016, df = 209.92, p-value = 0.02879
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.0311577 -0.0568926
## sample estimates:
## mean in group Male mean in group Female
## 5.789308 6.333333
Report M and SD for each group, and Cohen’s d. You know how to do this:
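One way to get Cohen's d for two independent groups is from the summary statistics, as sketched below. It uses the rounded means and SDs reported for the two groups, so it matches a value computed from the raw data only to about two decimal places. The function name `cohens_d` is hypothetical.

```r
# Cohen's d from group means and SDs (equal group sizes assumed)
cohens_d <- function(m1, sd1, m2, sd2) {
  pooled_sd <- sqrt((sd1^2 + sd2^2) / 2)
  (m1 - m2) / pooled_sd
}

# Male speakers vs. female speakers, rounded descriptive statistics
d_gender <- cohens_d(m1 = 5.79, sd1 = 1.78, m2 = 6.33, sd2 = 1.82)
d_gender        # negative: male speakers were rated lower
abs(d_gender)   # magnitude, as usually reported
```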
## [1] 0.2995998
Participants had more negative impressions of pitches read by males (M = 5.79, SD = 1.78) than of pitches read by females (M = 6.33, SD = 1.82), t(209.92) = −2.20, p = .03, 95% CI [−1.03, −0.06], d = 0.30.