Chi-square

Oct. 1

The Chi-square Test

Bridge In

Learning Outcomes

Upon completing this module, students will be able to:

  • Conduct \(\chi^2\) tests in R.
  • Create appropriate data visualizations for comparisons of categorical data.
  • Report results of \(\chi^2\) tests in APA style.
  • Accurately interpret the results of \(\chi^2\) tests.
  • Evaluate the strength of evidence in studies that use \(\chi^2\) tests of independence for categorical data.

The Study: Reminders Through Association

Summary

Someone briefly remind us of the design of the study (focus on study 5).

What We’re Reproducing

Results from two \(\chi^2\) tests from study 5.

We’ll Do Together

“…participants are more likely to follow through when they are assigned a cue-based reminder (in the forced-reminder-through-association condition, 87%) than when no cue-based reminder is available (none condition, 59%), \(\chi^2\)(1, N = 305) = 30.22, p < .001.”

On Your Own

“…those in the costly-reminder-through-association condition were not only more likely to earn the bonus (74%) than those in the none condition (59%), \(\chi^2\)(1, N = 297) = 7.23, p = .007,…”

Let’s Get Started

tRy it! Setup

Complete the steps in the “Setup” portion of the lab activity.

  1. Download “RTA_study5.csv” and “codebook database.xlsx” from Canvas.
  2. Import “RTA_study5.csv” into R.
  3. Convert the following variables to factors: condition, choice, and correct. Consult “codebook database.xlsx” to identify the appropriate factor labels.

Import the Data

Import “RTA_study5.csv” into R.

dta <- read.csv("data/RTA_study5.csv")
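
After importing, it is worth a quick look at what R actually read. The sketch below is self-contained: it writes a tiny made-up CSV to a temporary file (the column names match the lab's variables, but the values are invented for illustration) and inspects it. The same three calls can be run on dta.

```r
# Hypothetical stand-in for "RTA_study5.csv": invented values, not the study data.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "condition,choice,correct",
  "1,1,1",
  "3,1,0",
  "2,0,1",
  "4,0,1"
), tmp)

toy <- read.csv(tmp)

str(toy)    # column names and types (the numeric codes arrive as integers)
head(toy)   # first few rows
nrow(toy)   # number of rows read
```

If the column types or row count look wrong at this point, it is easier to fix the import now than to trace the problem back later.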

Convert Variables to Factors

Convert the following variables to factors: condition, choice, and correct. Consult “codebook database.xlsx” to identify the appropriate factor labels.

Convert condition to Factor

The codebook tells us the levels/labels for condition.

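# Map the numeric codes 1-4 to the condition names given in the codebook.
dta$condition <- factor(dta$condition,
  levels = 1:4,
  labels = c("Free", "None", "Costly", "All")
)
head(dta$condition, 10)  # preview the first 10 values
##  [1] Free   Costly Costly None   Free   Free   All    All    None   All
## Levels: Free None Costly All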
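
To see how levels and labels pair up, here is a minimal sketch on an invented vector of codes (not the study data). Each value in levels is matched, in order, to the label in the same position. Any code that is not listed in levels becomes NA, which is one reason to check the codebook carefully.

```r
# Invented codes for illustration; 9 is deliberately not a valid level.
codes <- c(1, 3, 2, 4, 9)

f <- factor(codes,
  levels = 1:4,
  labels = c("Free", "None", "Costly", "All")
)

f               # 1 -> Free, 2 -> None, 3 -> Costly, 4 -> All, 9 -> NA
levels(f)       # labels in the order given
sum(is.na(f))   # number of codes that matched no level
```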

Convert choice to Factor

What does choice tell us?

# 0 = did not take the reminder, 1 = took the reminder.
dta$choice <- factor(dta$choice,
  levels = c(0, 1),
  labels = c("did not take reminder", "took reminder")
)
head(dta$choice, 10)  # preview the first 10 values
##  [1] took reminder         took reminder         did not take reminder
##  [4] did not take reminder did not take reminder did not take reminder
##  [7] did not take reminder did not take reminder did not take reminder
## [10] took reminder
## Levels: did not take reminder took reminder

Convert correct to Factor

What does correct tell us?

# 0 = incorrect, 1 = correct.
dta$correct <- factor(dta$correct,
  levels = c(0, 1),
  labels = c("incorrect", "correct")
)
head(dta$correct, 10)  # preview the first 10 values
##  [1] correct   incorrect incorrect correct   correct   correct   incorrect
##  [8] correct   correct   correct
## Levels: incorrect correct

Reproduce Results

Result 1

“…participants are more likely to follow through when they are assigned a cue-based reminder (in the forced-reminder-through-association condition, 87%) than when no cue-based reminder is available (none condition, 59%), \(\chi^2\)(1, N = 305) = 30.22, p < .001.”

tRy it! Subset & Drop Levels

  1. Create a subset of your data.frame that includes only the relevant levels of condition.
  2. Use droplevels() to drop the extra levels of condition.

Hint: You can use either | or %in% to subset with one line of code. Otherwise, you could do it in two steps.

Subset

Option 1: Multiple Steps

dta1 <- dta
dta1 <- subset(dta1, condition != "Free")
dta1 <- subset(dta1, condition != "Costly")

Subset (2)

Option 2: Using |

Remember that | means “or”.

dta1 <- subset(dta, condition == "All" | condition == "None")

Can you imagine a situation where this approach might be unwieldy?

Subset (3)

Option 3: Using %in%

dta1 <- subset(dta, condition %in% c("All", "None"))
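
The two one-line options select the same rows. A small sketch on invented data (the toy data frame is hypothetical) confirms this. It also shows why %in% scales better when many levels are kept: the levels to keep are just a vector.

```r
# Toy data, invented for illustration.
toy <- data.frame(
  condition = factor(c("Free", "None", "Costly", "All", "None", "All"),
    levels = c("Free", "None", "Costly", "All")
  ),
  id = 1:6
)

with_or <- subset(toy, condition == "All" | condition == "None")
with_in <- subset(toy, condition %in% c("All", "None"))

identical(with_or, with_in)  # do both approaches keep the same rows?

# With %in%, the levels to keep can live in one vector that is easy to extend.
keep <- c("All", "None")
with_keep <- subset(toy, condition %in% keep)
```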

Check Your Work (an aside)

Why should you check your work as you go?

  • Just because R didn’t return an error doesn’t mean your code did what you wanted.
  • Sometimes errors later in the code are the result of an unrecognized mistake earlier.
  • Code that does the wrong thing but doesn’t return an error is harder to catch.
  • It saves time “debugging” down the line.

Did Our Subsetting Work?

You don’t need to include this check in the code you submit.

We can check with the function all(), which returns:

  • TRUE if all values in the vector are TRUE.
  • FALSE if any values in the vector are FALSE.
  • NA if no values are FALSE but at least one is NA.
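
A few one-line cases illustrate these return values. The logical vectors are made up for the example.

```r
all(c(TRUE, TRUE, TRUE))        # TRUE: every value is TRUE
all(c(TRUE, FALSE, NA))         # FALSE: one FALSE is enough, even with an NA present
all(c(TRUE, NA))                # NA: no FALSE, but the missing value leaves the answer unknown
all(c(TRUE, NA), na.rm = TRUE)  # TRUE: missing values are ignored
```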

Did Our Subsetting Work? (2)

all(dta1$condition %in% c("All", "None"))
## [1] TRUE

all() can be a useful tool for testing your code.
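
One way to turn such a check into an automatic guard is to wrap it in stopifnot(), which halts with an error unless the condition is TRUE. This is a sketch on invented data, not part of the lab instructions.

```r
# Toy subset, invented for illustration.
toy1 <- data.frame(condition = c("All", "None", "None", "All"))

# Passes silently because every value is one of the two expected conditions.
stopifnot(all(toy1$condition %in% c("All", "None")))

# A failing check raises an error; try() lets the script continue so we can inspect it.
bad <- data.frame(condition = c("All", "Free"))
result <- try(stopifnot(all(bad$condition %in% c("All", "None"))), silent = TRUE)
inherits(result, "try-error")  # was an error raised?
```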

Did Our Subsetting Work? (3)

Alternatively, we can use summary(), which will count the number of times each factor level occurs.

summary(dta1$condition)
##   Free   None Costly    All 
##      0    153      0    152
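
These counts also give a second check: the two remaining conditions should add up to the N reported for Result 1. A short sketch, with the counts copied by hand from the summary() output above:

```r
# Counts copied from the summary() output for dta1$condition.
counts <- c(Free = 0, None = 153, Costly = 0, All = 152)

sum(counts)         # total participants in the subset
sum(counts) == 305  # compare with the N reported for Result 1
```

With the real data loaded, nrow(dta1) gives the same total without typing the counts by hand.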

Drop Extra Levels

Why do we need to do this?

levels(dta1$condition)
## [1] "Free"   "None"   "Costly" "All"

Drop Extra Levels (2)

dta1$condition <- droplevels(dta1$condition)
levels(dta1$condition)
## [1] "None" "All"
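
One likely reason to drop unused levels is that they still show up as all-zero rows in a contingency table, which changes the table's dimensions and therefore the degrees of freedom of a \(\chi^2\) test. The sketch below uses invented counts (not the study data) and assumes the test is run with chisq.test() on a table of condition by correct.

```r
# Toy data, invented for illustration: only "None" and "All" occur.
toy <- data.frame(
  condition = factor(
    rep(c("None", "All"), times = c(50, 50)),
    levels = c("Free", "None", "Costly", "All")
  ),
  correct = factor(
    c(rep(c("incorrect", "correct"), times = c(20, 30)),   # None
      rep(c("incorrect", "correct"), times = c(10, 40))),  # All
    levels = c("incorrect", "correct")
  )
)

# Unused levels still appear as all-zero rows, so the table is 4 x 2.
tab_full <- table(toy$condition, toy$correct)
dim(tab_full)

# After droplevels(), the table is the intended 2 x 2.
toy$condition <- droplevels(toy$condition)
tab_drop <- table(toy$condition, toy$correct)
dim(tab_drop)

# Degrees of freedom follow the table's dimensions: (rows - 1) * (cols - 1).
res_full <- suppressWarnings(chisq.test(tab_full))
res_drop <- chisq.test(tab_drop)
res_full$parameter  # df based on 4 rows
res_drop$parameter  # df based on 2 rows
```

Only the 2 x 2 version has the 1 degree of freedom reported in the results we are reproducing.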

Mosaic Plots

A Mosaic Plot