We should all learn how to create fake data

rich-ramsey.github.io/talks/ncm-pitch-25/

Richard Ramsey
www.rich-ramsey.com

Conceptual context

There is no means of testing which decision is better, because there is no basis for comparison. We live everything as it comes, without warning, like an actor going on cold. And what can life be worth if the first rehearsal for life is life itself?

Milan Kundera, The Unbearable Lightness of Being

How?

Use R's built-in random-number functions, such as rnorm(), which draws values from a normal distribution:

rand_norm <- rnorm(n = 1000, mean = 0, sd = 1)
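
As a slightly fuller sketch of the same idea, the draws can be arranged into a small fake dataset with a known effect built in, and then analysed to see whether the analysis recovers that effect. The sample size, the effect size, and the names fake_data, n_per_group and true_effect are illustrative assumptions, not taken from the talk.

```r
set.seed(123)  # make the simulated draws reproducible

n_per_group <- 50   # assumed sample size per group (illustrative)
true_effect <- 0.5  # assumed difference between group means (illustrative)

# Build a two-group dataset where the true difference is known in advance
fake_data <- data.frame(
  group = rep(c("control", "treatment"), each = n_per_group),
  score = c(
    rnorm(n_per_group, mean = 0, sd = 1),
    rnorm(n_per_group, mean = true_effect, sd = 1)
  )
)

# Fit the planned model; the "grouptreatment" coefficient estimates true_effect
fit <- lm(score ~ group, data = fake_data)
coef(fit)
```

Because the true effect was set by hand, the estimate can be compared against a known answer, which is not possible with real data.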

Benefits

Lisa DeBruine outlines these benefits:

  1. Develop an understanding of data and statistical concepts.
  2. Plan your pre-registration.
  3. Estimate power/precision for a future study.
  4. Deal with potential confidentiality issues.
  5. Create demo data for teaching and tutorials.
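
Benefit 3 can be sketched in a few lines of base R: simulate the planned study many times and count how often the planned test detects the effect. This is a minimal illustration under assumed values (a two-group design, 50 participants per group, a true difference of 0.5 SD, and alpha of 0.05); the helper simulate_p_value is hypothetical and not from the talk.

```r
set.seed(2025)  # make the simulation reproducible

# Hypothetical helper: simulate one two-group study and return its p-value
simulate_p_value <- function(n_per_group, effect, sd = 1) {
  control   <- rnorm(n_per_group, mean = 0, sd = sd)
  treatment <- rnorm(n_per_group, mean = effect, sd = sd)
  t.test(treatment, control)$p.value
}

n_sims <- 1000  # number of simulated studies (assumed)
p_values <- replicate(n_sims, simulate_p_value(n_per_group = 50, effect = 0.5))

# Estimated power: the proportion of simulated studies with p < 0.05
power_estimate <- mean(p_values < 0.05)
power_estimate
```

Changing n_per_group or effect and re-running shows how power responds to design choices before any real data are collected.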

Resources

  • A list of resources from Andrew Gelman that justify the use of fake data, here.

  • The faux R package and some slides by Lisa DeBruine.

  • Blog posts by Solomon Kurz here.

  • R code showing demo examples from my lab here.

  • A journal paper on understanding multi-level regression models via data simulation.