One-Sample t Tests

Oct. 8, 2020

One-Sample t Tests

Learning Outcomes

  1. Conduct one-sample t tests in R.
  2. Evaluate the assumptions of a one-sample t test.
  3. Report results of one-sample t tests in APA style.
  4. Create appropriate visualizations for one-sample data.
  5. Conduct power analyses for one-sample t tests.
  6. Evaluate the strength of evidence provided by one-sample t tests in studies that utilize them.

Today’s Lab

Reproducing the following effects from Lloyd et al. (2018).

Together

“Similar to previous findings in the deception detection literature, sensitivity scores (M = .15, SD = .98) averaged across targets and participants were slightly better than chance (i.e., 0), t(401) = 3.053, p = .002, 95% CI [.05, .25], d = .30.”

On Your Own

“Consistent with past meta-analyses in the deception detection literature, accuracy scores (M = .52, SD = .13) were slightly better than chance (i.e., .5), t(401) = 3.045, p = .002, 95% CI [.01, .03], d = .30.”

Getting Started

Data Import

dta <- read.csv("Gender LD open data 8.16.17.csv")

Drop Some Participants

“Because we are interested in the effects of target and participant gender, only those who self-disclosed their gender were included in analyses (N = 402).”

How do we know which participants did not self-disclose their gender?

tRy it! Identify Rows to Drop

Check the codebook to identify which participants did not self-disclose their gender.

Identify Rows to Drop

  • Participants with NA in the gender column?
  • The filter_$ column? The codebook says 1 means gender was disclosed. (Note that read.csv() imports the column name filter_$ as filter_..)
  • What are the values of filter_$ if gender was not disclosed? How could you check?
unique(dta$filter_.)
## [1]  1 NA  0

tRy it! Drop Participants

Use subset() or [ to drop participants who did not self-disclose gender.

Dropping Participants with [

dta <- dta[dta$filter_. == 1, ]

Remember to check your work!

unique(dta$filter_.)
## [1]  1 NA
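Why is NA still there? When a logical index itself is NA, [ does not drop that element; it returns an NA instead. A minimal illustration:

x <- c(1, NA, 2)
x == 1     # TRUE NA FALSE: the comparison with NA is NA, not FALSE
x[x == 1]
## [1]  1 NA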

Dropping Participants with [

dta <- dta[dta$filter_. == 1 & !is.na(dta$filter_.), ]
# did it work this time?
unique(dta$filter_.)
## [1] 1

Dropping Participants with subset()

This works because subset() treats NA as FALSE.

dta <- subset(dta, filter_. == 1)
# subset treats NA as FALSE
unique(dta$filter_.)
## [1] 1
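One more check worth doing (not shown in the original output): Lloyd et al. report N = 402, so the number of remaining rows should match.

nrow(dta)  # should be 402, matching Lloyd et al.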

Conducting a One-sample t

t.test()

Open the documentation for t.test().

Description

Performs one and two sample t-tests on vectors of data.

Usage

t.test(x, ...)

## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)

Why are there three usages???

A Brief Aside on OOP

In Language, Context Matters

Consider the following:

  • What is \(1 + 1\)?
  • What is \(flour + water\)?

Notice how you solve the two problems by treating + differently.

The + (a function) does something different depending on its arguments.
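The same thing happens in R: + dispatches on the class of its arguments. A small sketch:

1 + 1                      # numeric addition: 2
as.Date("2020-10-08") + 1  # date arithmetic: "2020-10-09"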

Object Oriented Programming

Some functions will behave differently depending on the class(es) of their argument(s). For example:

summary(dta$race_TEXT)
##    Length     Class      Mode 
##       402 character character
summary(dta)
##        Ps           videoset          age             race      
##  Min.   :  1.0   Min.   : 1.00   Min.   :18.00   Min.   :1.000  
##  1st Qu.:101.2   1st Qu.: 5.25   1st Qu.:26.00   1st Qu.:5.000  
##  Median :202.5   Median :11.00   Median :31.00   Median :5.000  
##  Mean   :203.3   Mean   :10.52   Mean   :34.47   Mean   :4.666  
##  3rd Qu.:303.8   3rd Qu.:15.00   3rd Qu.:40.00   3rd Qu.:5.000  
##  Max.   :485.0   Max.   :20.00   Max.   :78.00   Max.   :7.000  
##                                                  NA's   :1      
##   race_TEXT          gender           f_dprime           f_crit       
##  Length:402         Mode:logical   Min.   :-3.0008   Min.   :-2.3263  
##  Class :character   NA's:402       1st Qu.:-0.6745   1st Qu.: 0.0000  
##  Mode  :character                  Median : 0.0000   Median : 0.3372  
##                                    Mean   : 0.2450   Mean   : 0.2701  
##                                    3rd Qu.: 0.6745   3rd Qu.: 0.6745  
##                                    Max.   : 4.6527   Max.   : 2.3263  
##                                                                       
##     m_dprime            m_crit          f_accuracy       m_accuracy    
##  Min.   :-4.65270   Min.   :-2.3263   Min.   :0.1250   Min.   :0.0000  
##  1st Qu.:-0.67449   1st Qu.: 0.0000   1st Qu.:0.3750   1st Qu.:0.3750  
##  Median : 0.00000   Median : 0.3372   Median :0.5000   Median :0.5000  
##  Mean   : 0.05301   Mean   : 0.4740   Mean   :0.5323   Mean   :0.5062  
##  3rd Qu.: 0.67449   3rd Qu.: 1.1632   3rd Qu.:0.6250   3rd Qu.:0.6250  
##  Max.   : 4.65270   Max.   : 2.3263   Max.   :1.0000   Max.   :1.0000  
##                                                                        
##   accuracy_tot      dprime_tot        crit_tot          filter_.
##  Min.   :0.2500   Min.   :-2.330   Min.   :-1.5000   Min.   :1  
##  1st Qu.:0.4400   1st Qu.:-0.340   1st Qu.: 0.0000   1st Qu.:1  
##  Median :0.5000   Median : 0.000   Median : 0.3400   Median :1  
##  Mean   :0.5205   Mean   : 0.149   Mean   : 0.3722   Mean   :1  
##  3rd Qu.:0.6300   3rd Qu.: 0.830   3rd Qu.: 0.5800   3rd Qu.:1  
##  Max.   :0.8100   Max.   : 2.660   Max.   : 2.3300   Max.   :1  
## 

Why Should I Care?

  • Sometimes, you don’t care.
    • E.g., summary() will choose the appropriate method for you.
  • It’s good to be aware of which method you are using, though, because:
    • Different methods can require different arguments to the same function.
    • It helps you make sense of the documentation.
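If you want to see which method a generic function will dispatch to, here is a quick sketch:

class(dta$dprime_tot)             # "numeric", so t.test() will use the default method
methods(t.test)                   # lists the available methods (default and formula)
getS3method("t.test", "default")  # view the code for the default method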

Back to the t.test Documentation

Usage

t.test(x, ...)

## Default S3 method:
t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

## S3 method for class 'formula'
t.test(formula, data, subset, na.action, ...)

We’re using the default method today.

Arguments

  • x: a (non-empty) numeric vector of data values.
  • y: an optional (non-empty) numeric vector of data values.
  • alternative: a character string specifying the alternative hypothesis, must be one of “two.sided” (default), “greater” or “less”. You can specify just the initial letter.
  • mu: a number indicating the true value of the mean (or difference in means if you are performing a two sample test).
  • paired: a logical indicating whether you want a paired t-test.
  • var.equal: a logical variable indicating whether to treat the two variances as being equal. If TRUE, the pooled variance is used to estimate the variance; otherwise the Welch (or Satterthwaite) approximation to the degrees of freedom is used.
  • conf.level: confidence level of the interval.

tRy it! One-sample t Test

Use t.test() to conduct a one-sample t test comparing sensitivity to 0. Supply values for the arguments x, alternative, mu, and conf.level to match the results from Lloyd et al.

Assign a name to the resulting R object.

dprime_ost <- t.test(x = dta$dprime_tot,
  alternative = "two.sided",
  mu = 0,
  conf.level = 0.95
)

Output of the t Test

dprime_ost
## 
##  One Sample t-test
## 
## data:  dta$dprime_tot
## t = 3.0542, df = 401, p-value = 0.002407
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  0.05310512 0.24495458
## sample estimates:
## mean of x 
## 0.1490299

From Lloyd et al.: t(401) = 3.053, p = .002, 95% CI [.05, .25], d = .30.
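Since we assigned the result a name, we can also pull individual components out of the returned object (a list of class "htest") when reporting; a minimal sketch:

dprime_ost$statistic  # the t value
dprime_ost$parameter  # degrees of freedom
dprime_ost$p.value    # the p value
dprime_ost$conf.int   # 95% confidence interval for the mean
dprime_ost$estimate   # sample mean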

Assumptions

Review: Assumptions

  • What is meant by assumptions of a test?
  • Who/what is doing the assuming?
  • What happens if the assumptions are violated?

Assumptions of One Sample t

  1. Normally distributed population.
  2. Independence of observations → design issue.

Normally Distributed Population

Scores on the outcome variable are normally distributed in the population from which our sample was drawn.

Can we answer this question conclusively?

We do have approaches that help us decide whether the assumption is tenable.

Shapiro–Wilk Test

A null hypothesis significance test of the null hypothesis that the sample scores are drawn from a normally distributed population.

  • Significant result → significant deviation from normality.

Shapiro–Wilk Test in R

shapiro.test(dta$dprime_tot)
## 
##  Shapiro-Wilk normality test
## 
## data:  dta$dprime_tot
## W = 0.99014, p-value = 0.008507

How do we interpret this?

Problems with Shapiro–Wilk Test

  • Can only reject the null.
  • p ≥ .05 does not mean the assumption was not violated.
  • In large samples, practically meaningless departures from normality may be statistically significant.
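To see the large-sample issue concretely, here is a small illustrative simulation (not from Lloyd et al.’s data): a population with only mild skew is rarely flagged at n = 50 but is reliably flagged at n = 5,000.

# Illustrative only: a population with mild skew (skewness roughly 0.2)
mildly_skewed <- function(n) rnorm(n) + 0.5 * rexp(n)

shapiro.test(mildly_skewed(50))    # typically non-significant at a small n
shapiro.test(mildly_skewed(5000))  # typically highly significant at a large n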

Visual Inspection

I.e., does the sample look normal?

  • Histogram.
  • Q–Q Plot

tRy it! Histogram

Create a Histogram of dprime_tot

# Base R
hist(dta$dprime_tot)

# ggplot
ggplot(dta, aes(dprime_tot)) +
  geom_histogram(binwidth = 0.3)

Histogram

Q–Q Plot

Plots theoretical quantiles on the x and sample quantiles on the y.

If the sample matches the theoretical distribution (up to location and scale), the points will fall on a straight line. For a normal sample plotted against standard normal quantiles, that line is y = μ + σx, where μ and σ are the sample’s mean and SD.

Q–Q Plot in R

Use qqnorm().

# Create the plot
qqnorm(dta$dprime_tot)

# Add the line
# (this is where we expect points to fall if our theory is correct)
qqline(dta$dprime_tot)
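If you prefer ggplot2 (used elsewhere in this lab), stat_qq() and stat_qq_line() produce the equivalent plot:

ggplot(dta, aes(sample = dprime_tot)) +
  stat_qq() +
  stat_qq_line()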

Q–Q Plot

Data Vis for One Sample t

Histogram

Histogram with Distributions

Histogram with Distributions (Code)

We’re not going over this, but here is the code if you’re interested.

null_dist <- function(x) {
  dnorm(x, 0, 0.978)
}

alt_dist <- function(x) {
  dnorm(x, 0.149, 0.978)
}

ggplot(dta, aes(dprime_tot)) +
  scale_x_continuous(breaks = seq(-4, 4, by = 1)) +
  geom_histogram(aes(y = ..density..), binwidth = .3, fill = "#dee2e6",
    colour = "#212529") +
  geom_function(fun = alt_dist, linetype = 1) +
  geom_function(fun = null_dist, linetype = 2) +
  theme_minimal(base_family = "Fira Sans") +
  theme(
    axis.text.y = element_blank()
  ) +
  labs(
    title = "Distribution of sensitivity scores",
    subtitle = paste(
      "Curves show theoretical distributions under the null",
      "and alternative hypotheses"
    ),
    x = NULL,
    y = NULL
  )

Linerange

Linerange (Code)

We won’t go over this, but here is the code for those interested.

dta_long <- stack(dta, select = c("f_dprime", "m_dprime", "dprime_tot"))

levels(dta_long$ind) <- c("Female", "Male", "Both")

ggplot(dta_long, aes(x = ind, y = values)) +
  geom_hline(yintercept = 0, colour = "#e9e9e9", size = 2) +
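  # mean_cl_normal computes the mean and a normal-theory CI around it;
  # it wraps Hmisc::smean.cl.normal(), so the Hmisc package must be installed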
  stat_summary(fun.data = mean_cl_normal,
    geom = "errorbar",
    width = 0.05
  ) +
  stat_summary(fun.data = mean_cl_normal,
    fun.args = list(conf.int = .90),
    geom = "linerange",
    size = 1.5
  ) +
  stat_summary(fun = mean,
    geom = "point",
    size = 3,
    shape = 21,
    fill = "white"
  ) +
  coord_flip() +
  theme_minimal() +
  labs(x = "Target Gender", y = "Sensitivity")

Effect Sizes for One Sample t

Effect size

Artwork by @allison_horst

Review: Why are effect sizes important?

  • Strength of evidence, not a dichotomous decision
  • Can’t hide a tiny effect behind a low p value
  • Standardized effect sizes are comparable between studies & effects

Effect Sizes for One-sample t

Most commonly reported effect sizes for a one-sample t test are:

  • Raw mean.
  • Cohen’s d/Cohen’s dz

Raw Mean

The effect expressed in the original units of measurement: here, simply the sample mean (relative to the chance value of 0). For example,

“sensitivity scores (M = .15, SD = .98) averaged across targets and participants were slightly better than chance (i.e., 0), t(401) = 3.053, p = .002, 95% CI [.05, .25], d = .30.”

Pros & Cons of Raw Effect Size

Pros

  • Maintains original measurement units.
  • More meaningful interpretation than standardized ES.
  • Helps establish more meaningful measures.

Cons

  • Not comparable across measures.
  • Requires understanding of the measure to interpret.

Should I Report Raw Mean Effect Size?

Yes. Basically always report:

“…per-cell sample sizes, observed cell means, […] and cell standard deviations…”

Remember to interpret the raw effect size as well.

Calculating Raw Mean ES in R

mean(dta$dprime_tot)
## [1] 0.1490299
sd(dta$dprime_tot)
## [1] 0.9783241

Cohen’s d

Formula for Cohen’s d

\(d = \frac{M_1 - M_2}{\sigma}\)

Where \(\sigma\) is the pooled standard deviation. With equal group sizes, that is:

\(\sigma = \sqrt{\frac{SD_1^2 + SD_2^2}{2}}\)

Effect size in the units of pooled standard deviations.

Pros & Cons of Cohen’s d

Pros

  • Extremely widely used.
  • Compare between multiple analyses, studies, measures.

Cons

  • Pooled SD is not an easily interpretable metric.
  • Over-reliance on “rules of thumb” for interpreting ES.
  • The substantive interpretation of, e.g., d = 0.3, is not constant.
  • Ignores design characteristics of within-subjects designs.

Cohen’s d for One-sample t

Let’s look at the formula again:

\(d = \frac{M_1 - M_2}{\sqrt{\frac{1}{2}(SD_1^2 + SD_2^2)}}\)

For a one-sample t test, what is:

  • M1? The sample mean, M.
  • M2? The population mean, μ.
  • SD1? The sample SD.
  • SD2? The population SD, which we assume equals SD1.

Knowing this, we can simplify the formula for the one-sample case to…

Cohen’s dz

\(d_z = \frac{M - \mu}{SD}\)

So for one sample, assuming \(SD_1 = SD_2 = SD\), the pooled SD reduces to \(SD\) (since \(\sqrt{\tfrac{1}{2}(SD^2 + SD^2)} = SD\)), and Cohen’s d and Cohen’s dz are the same.

tRy it! Cohen’s dz in R

Since μ = 0, dz is simply \(M/SD\).

mean(dta$dprime_tot) / sd(dta$dprime_tot)
## [1] 0.1523318

Does this match what Lloyd et al. reported? No. We’ll show why later.

Cohen’s dz from t

We can convert the t value to a d value using t2d() from the psych package. t2d() computes d in one of three ways, depending on which of the arguments n, n1, and n2 are supplied.

  • n = the total sample size; computes Cohen’s d assuming two balanced groups.
  • n1 = the sample size of group 1; if n2 is NULL, computes Cohen’s dz.
  • n2 = the sample size of group 2; if both n1 and n2 are supplied, computes Cohen’s d.

tRy it! Compute Cohen’s dz Using t2d()

  1. Add a call to library(psych) at the top of your script.
  2. Compute Cohen’s dz.
psych::t2d(t = 3.0542, n1 = 402)
## [1] 0.1523297
psych::t2d(t = dprime_ost$statistic,
  n1 = 402)
##         t 
## 0.1523318

What Did Lloyd et al. Report?

Cohen’s d computed as if the 402 participants formed two independent, equally sized groups.

psych::t2d(t = dprime_ost$statistic, n = 402)
##         t 
## 0.3046636
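Why exactly double? Judging from the outputs above, t2d() appears to use conversions that differ by a factor of two for these two cases; a quick check:

t_val <- 3.0542
n <- 402
t_val / sqrt(n)      # dz, what we computed by hand: ~0.152
2 * t_val / sqrt(n)  # d assuming two balanced groups, what Lloyd et al. report: ~0.305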

Power Analysis for One-sample t

Power Analysis in R

library(pwr) # This goes at the top of your script!
pwr.t.test(n = NULL,
  d = 0.5,
  sig.level = 0.05,
  power = 0.95,
  type = "one.sample",
  alternative = "two.sided"
)
## 
##      One-sample t test power calculation 
## 
##               n = 53.94061
##               d = 0.5
##       sig.level = 0.05
##           power = 0.95
##     alternative = two.sided

Sensitivity Analysis in R

pwr.t.test(n = 402,
  d = NULL,
  sig.level = 0.05,
  power = 0.95,
  type = "one.sample",
  alternative = "two.sided"
)
## 
##      One-sample t test power calculation 
## 
##               n = 402
##               d = 0.1802227
##       sig.level = 0.05
##           power = 0.95
##     alternative = two.sided
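We can also turn the question around and ask how much power the study had to detect the effect it actually observed (dz ≈ 0.15); a sketch using the same function:

pwr.t.test(n = 402,
  d = 0.15,  # approximately the observed dz
  sig.level = 0.05,
  power = NULL,
  type = "one.sample",
  alternative = "two.sided"
)

Bear in mind that this kind of “observed power” is a direct function of the observed effect size, so interpret it with care.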

Reporting Results in APA Style

One-sample t-test: Reporting

The authors do this very well!

“Similar to previous findings in the deception detection literature, sensitivity scores (M = .15, SD = .98) averaged across targets and participants were slightly better than chance (i.e., 0), t(401) = 3.053, p = .002, 95% CI [.05, .25], d = .30.”

Assignment

George A. Miller

“The Magical Number Seven, Plus or Minus Two” (Miller, 1956)

We All Know the Feeling…

“My problem is that I have been persecuted by an integer. For seven years this number has followed me around, has intruded in my most private data, and has assaulted me from the pages of our most public journals. […] There is, to quote a famous senator, a design behind it, some pattern governing its appearances. Either there really is something unusual about the number or else I am suffering from delusions of persecution.”

Seems Pretty Magical to Me

“[Miller] made the one and only hole-in-one of his life at the age of 77, on the seventh green. He made it with a seven iron. He loved that.”

The Magical Number Seven

Miller’s Law

  • On average, people can hold about seven objects in working memory.

Assignment: Your Digit Span

Make a prediction.

My digit span is…

  1. not equal to…
  2. greater than…
  3. less than…
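Once you have collected your digit-span scores, the analysis mirrors what we did above. A minimal sketch (my_spans is made-up example data; replace it with your own scores, and set alternative to match your prediction):

# Hypothetical example data: one digit-span score per trial
my_spans <- c(6, 8, 7, 5, 9, 7, 6, 8, 7, 6)

# Test against Miller's magical number seven
t.test(my_spans, mu = 7, alternative = "two.sided")

# Effect size (dz), as above
(mean(my_spans) - 7) / sd(my_spans)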

See You Next Time!