Data analysis and statistics

rich-ramsey.github.io/talks/sbs-retreat-25/

Richard Ramsey
www.rich-ramsey.com

Aim

Provide the lab with a general context for thinking about data analysis and statistics.

Warning

I’m not a statistician. I’m not a statistician. I’m not a statistician…

Overview

Background.
Caution.
Regression.
Summary.

Background

Sampling from a larger population

Image from: https://danawanzer.github.io/stats-with-jamovi/

My background

Cognitive neuroscience / experimental psychology.
Social perception and cognition.
I am not a statistician. (Did I say that already?)
More recently: open science, methods and meta-science.

My background

My undergrad and postgrad stats classes looked like this:

Each week described a different statistical test.
Your job was to choose the right test for a given type of data and run the test (usually via point-and-click in SPSS).
Then you interpret the p-value.
Job done.
Easy, right?

My background

In the wake of the reproducibility crisis, I felt the need to become more statistically literate.
Enter:

Richard McElreath’s excellent textbook (McElreath 2020)
Solomon Kurz’s brilliant translation into tidyverse principles (Kurz 2023).
Various papers and books by Andrew Gelman.

Caution

Champagne inference on a beer budget

A quote from Andrew Gelman (Gelman 2024):

once the data have been collected, the most important decisions have already been done

Champagne inference on a beer budget

All statistical models are fundamentally limited and need to be framed within the wider scientific context (McElreath 2020), such as:

The importance of theory
Open data and materials
Pre-registration
Meta-analyses

Computational modelling
Data science
Experimental design
And many more besides

Champagne inference on a beer budget

Before we make inferences and draw conclusions, we should spend more time (Scheel et al. 2021):

Forming concepts.
Developing valid measures.
Identifying boundary conditions and auxiliary assumptions.
And so on…

Champagne inference on a beer budget

Measurement schmeasurement

We demonstrate that psychology is plagued by a measurement schmeasurement attitude: questionable measurement practices are common, hide a stunning source of researcher degrees of freedom, pose a serious threat to cumulative psychological science, but are largely ignored.

(Flake and Fried 2020)

Caution (!) is required

Statistical inference is not magical.
Inferences rest on many assumptions and data quality.
Inferences are likely to be fragile/tentative/suggestive.
So, be cautious!
And try to create research designs that do not rely too heavily on one particular part of your inferential model.

Towards statistical thinking

Develop a statistical philosophy rather than rely on historical statistical rituals (Gigerenzer 2018).
There are many different approaches to the same question.
Have a sense of how your approach fits into the mix of options, in terms of pros and cons.
Be able to defend and justify your choices explicitly.

Regression

Single-level linear regression

\[\color{red}{Y_i} = \color{orange}{\beta_0} + \color{green}{\beta_1} \color{blue}{X_i} + \color{violet}{\varepsilon_i}\]

\(\color{red}{Y_i}\): The outcome/response variable for observation \(i\)
\(\color{orange}{\beta_0}\): The intercept (expected value of \(Y\) when \(X = 0\))
\(\color{green}{\beta_1}\): The slope (expected change in \(Y\) for a one-unit increase in \(X\))
\(\color{blue}{X_i}\): The predictor variable for observation \(i\)
\(\color{violet}{\varepsilon_i}\): The error term (residual) for observation \(i\)
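
As a minimal sketch of the equation above, the R code below simulates data from a single-level model and then recovers the parameters with `lm()`. The sample size, parameter values and variable names (`x`, `y`, `beta_0`, `beta_1`) are illustrative assumptions, not values from the talk.

```r
# Simulate data from Y_i = beta_0 + beta_1 * X_i + epsilon_i
# (all numeric values here are illustrative assumptions)
set.seed(1)
n      <- 200
beta_0 <- 2     # intercept: expected y when x = 0
beta_1 <- 0.5   # slope: expected change in y per one-unit increase in x

x   <- rnorm(n, mean = 0, sd = 1)   # predictor X_i
eps <- rnorm(n, mean = 0, sd = 1)   # error term epsilon_i
y   <- beta_0 + beta_1 * x + eps    # outcome Y_i

# Fit the single-level regression; "1" makes the intercept explicit
fit <- lm(y ~ 1 + x)

coef(fit)          # estimates of beta_0 and beta_1
head(resid(fit))   # residuals: the estimated error terms
```

The estimates should land close to the simulated values, but not exactly on them, because every sample carries noise.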

Common statistical tests are forms of linear regression

https://lindeloev.github.io/tests-as-linear/
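
One hedged illustration of this point: an independent-samples t-test (assuming equal variances) gives the same t statistic and p-value as a linear regression with a two-level categorical predictor. The data and the names `d`, `group` and `dv` are simulated for illustration only.

```r
# Simulated two-group data (values are illustrative assumptions)
set.seed(2)
d <- data.frame(
  group = factor(rep(c("a", "b"), each = 30)),
  dv    = c(rnorm(30, mean = 0, sd = 1), rnorm(30, mean = 0.5, sd = 1))
)

# Classic Student's t-test, assuming equal variances
tt <- t.test(dv ~ group, data = d, var.equal = TRUE)

# The same comparison as a regression: the slope for "groupb" is the
# difference between the group means (b minus a)
fit  <- lm(dv ~ 1 + group, data = d)
t_lm <- summary(fit)$coefficients["groupb", "t value"]
p_lm <- summary(fit)$coefficients["groupb", "Pr(>|t|)"]

# The t statistics match up to sign (t.test reports a minus b)
c(t_test = unname(tt$statistic), regression = t_lm)
c(t_test = tt$p.value, regression = p_lm)
```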

Homework

Go away, get some data and run single-level regressions in R.
Use different types of data: continuous predictors, categorical predictors, etc.
I bet there are a million tutorials online.
Aim: understand the basics of single-level regression. The rest builds on this core foundation.

Multi-level linear regression

\[\color{red}{Y_{ij}} = (\color{orange}{\gamma_{00}} + \color{yellow}{u_{0j}}) + (\color{green}{\gamma_{10}} + \color{cyan}{u_{1j}})\color{blue}{X_{ij}} + \color{violet}{\varepsilon_{ij}}\]

\(\color{red}{Y_{ij}}\): The outcome variable for observation \(i\) in group \(j\)
\(\color{orange}{\gamma_{00}}\): The fixed effect intercept (population average)
\(\color{yellow}{u_{0j}}\): The random/varying intercept for group \(j\)
\(\color{green}{\gamma_{10}}\): The fixed effect slope (population average)
\(\color{cyan}{u_{1j}}\): The random/varying slope for group \(j\)
\(\color{blue}{X_{ij}}\): The predictor variable for observation \(i\) in group \(j\)
\(\color{violet}{\varepsilon_{ij}}\): The error term for observation \(i\) in group \(j\)
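
To make the notation concrete, here is a minimal sketch in base R that simulates data from the multi-level equation above. All values are illustrative assumptions. For simplicity the varying intercepts and slopes are drawn independently, whereas multi-level models usually also estimate their correlation.

```r
# Simulate Y_ij = (gamma_00 + u_0j) + (gamma_10 + u_1j) * X_ij + epsilon_ij
# (all numeric values are illustrative assumptions)
set.seed(3)
n_groups <- 20    # number of groups j (e.g., participants)
n_obs    <- 30    # observations i per group
gamma_00 <- 1.0   # population-average intercept
gamma_10 <- 0.5   # population-average slope

u_0 <- rnorm(n_groups, mean = 0, sd = 0.8)   # group deviations in intercept
u_1 <- rnorm(n_groups, mean = 0, sd = 0.3)   # group deviations in slope

d <- data.frame(
  group = rep(seq_len(n_groups), each = n_obs),
  x     = rnorm(n_groups * n_obs)
)
d$y <- (gamma_00 + u_0[d$group]) +
  (gamma_10 + u_1[d$group]) * d$x +
  rnorm(nrow(d), mean = 0, sd = 1)           # epsilon_ij

# A separate single-level regression per group shows how much
# intercepts and slopes vary around the population averages
per_group <- t(sapply(split(d, d$group),
                      function(g) coef(lm(y ~ 1 + x, data = g))))
head(per_group)
colMeans(per_group)
```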

Multi-level regression: why bother?

One general and versatile way to approach data analysis.
It avoids having to pick the “right” statistical test.
There is an active community of users and lots of resources.
It suits most data in psychology and human neuroscience, which typically have a nested (multi-level) structure, e.g., trials nested within participants, or sub-groups within a bigger group.
It takes advantage of partial pooling / shrinkage.

Summary vs trial-level data

Partial pooling or shrinkage

Let’s build an intuition.
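
One way to build that intuition is to compute shrinkage by hand. In a simple varying-intercept setting, the partially pooled estimate for a group is a weighted average of that group's own mean (no pooling) and the grand mean (complete pooling), with more weight on the group's own mean when it has more data. The sketch below treats the between-group and within-group standard deviations as known, which is an assumption made for illustration. In practice a multi-level model estimates them from the data. All values and names are hypothetical.

```r
# Partial pooling by hand (variances treated as known: an assumption)
set.seed(4)
n_j         <- c(2, 2, 5, 5, 10, 10, 50, 50)   # unequal group sizes
n_groups    <- length(n_j)
sigma_alpha <- 1   # sd of true group means (between groups)
sigma_y     <- 2   # sd of observations within a group

alpha <- rnorm(n_groups, mean = 0, sd = sigma_alpha)   # true group means
group <- rep(seq_len(n_groups), times = n_j)
y     <- rnorm(length(group), mean = alpha[group], sd = sigma_y)

no_pool       <- as.vector(tapply(y, group, mean))   # each group on its own
complete_pool <- mean(y)                             # ignore groups entirely

# Weight on the group's own mean grows with its sample size
w <- (n_j / sigma_y^2) / (n_j / sigma_y^2 + 1 / sigma_alpha^2)
partial_pool <- w * no_pool + (1 - w) * complete_pool

round(data.frame(n_j, weight = w, no_pool, partial_pool), 2)
```

Small groups are pulled strongly towards the grand mean, while large groups barely move. That pull is the shrinkage.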

Partial pooling or shrinkage

Let’s plot it.

Fixed and varying effects

# specify the model formula with only fixed or population-level terms
formula = dv ~ 1 + condition 


# and now add varying intercepts per participant
formula = dv ~ 1 + condition + 
  (1 | participant)


# and now add varying intercepts and slopes per participant
formula = dv ~ 1 + condition + 
  (1 + condition | participant)
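
As a hedged sketch of how such a formula might be used, the code below simulates trial-level data and fits the varying intercepts and slopes model with `lme4::lmer()`. Using lme4 is an assumption made to keep the example quick to run. The same formula syntax is also accepted by `brms::brm()` for Bayesian estimation. The data, effect sizes and the name `f_slopes` are hypothetical, and the fit is skipped if lme4 is not installed.

```r
# Simulated trial-level data (all values are illustrative assumptions)
set.seed(5)
n_participants <- 30
n_trials       <- 20   # trials per participant per condition

d <- expand.grid(
  trial       = seq_len(n_trials),
  condition   = c(0, 1),
  participant = seq_len(n_participants)
)
u0 <- rnorm(n_participants, mean = 0, sd = 0.5)   # varying intercepts
u1 <- rnorm(n_participants, mean = 0, sd = 0.3)   # varying slopes
d$dv <- (1 + u0[d$participant]) +
  (0.4 + u1[d$participant]) * d$condition +
  rnorm(nrow(d), mean = 0, sd = 1)

# Varying intercepts and slopes per participant
f_slopes <- dv ~ 1 + condition + (1 + condition | participant)

if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::lmer(f_slopes, data = d)
  print(lme4::fixef(fit))              # population-level intercept and slope
  print(head(coef(fit)$participant))   # partially pooled per-participant estimates
}
```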

Estimation approaches - basics

Adapted from Kruschke and Liddell (2018)

Estimation approaches - parameters

Estimation approaches - wider reading

Summary

Science needs more David Bowie

Statistical reform is important.
But science is not a one-trick pony.
We need more David Bowie.

https://vocal.media/beat/reinventing-david-bowie

Thanks for your attention

And thanks to these fine folks:

John Bartlett for his tutorial on reproducible presentations in R (this is solid gold).
Lisa DeBruine for sharing lots of example presentations.

Resources:

Unless otherwise specified, icons were used under license from The Noun Project
Slides were created with Quarto and RevealJS

And here’s my stuff

Slides: www.rich-ramsey.github.io/talks/sbs-retreat-25/
Code: https://github.com/rich-ramsey/talks
Website: www.rich-ramsey.com
Github: https://github.com/rich-ramsey

References

Flake, Jessica Kay, and Eiko I. Fried. 2020. “Measurement Schmeasurement: Questionable Measurement Practices and How to Avoid Them.” Advances in Methods and Practices in Psychological Science 3: 456–65. https://doi.org/10.1177/2515245920952393.

Gigerenzer, Gerd. 2018. “Statistical Rituals: The Replication Delusion and How We Got There.” Advances in Methods and Practices in Psychological Science 1 (2): 198–218. https://doi.org/10.1177/2515245918771329.

Kruschke, John K., and Torrin M. Liddell. 2018. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. https://doi.org/10.3758/s13423-016-1221-4.

Kruschke, John. 2014. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press. https://books.google.com?id=FzvLAwAAQBAJ.

Kurz, A. Solomon. 2023. Statistical Rethinking with Brms, Ggplot2, and the Tidyverse: Second Edition. Version 0.4.0. https://bookdown.org/content/4857/.

McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

Scheel, Anne M., Leonid Tiokhin, Peder M. Isager, and Daniël Lakens. 2021. “Why Hypothesis Testers Should Spend Less Time Testing Hypotheses.” Perspectives on Psychological Science 16: 744–55. https://doi.org/10.1177/1745691620966795.

Winter, Bodo. 2019. Statistics for Linguists: An Introduction Using R. Routledge. https://books.google.com?id=8cbADwAAQBAJ.