Prediction-Based Inference Short Course

Methods & Applications

Author

Tyler H. McCormick

Published

Sunday, March 15, 2026 | 1:00 pm PDT

Workshop Goals and Objectives

Learning Goals:

  • Understand the limitations of using predicted data for inference.
  • Learn about methods that correct for bias and recover valid uncertainty estimates.
  • Gain practical skills using the ipd R package.

Learning Objectives:

  • Explore data with AI/ML-predicted outcomes and diagnose bias/variance in predictions.
  • Apply ipd::ipd() to continuous and binary outcomes.
  • Interpret prediction-based (PB) inference outputs and visualize model results.
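Diagnosing bias and variance in predictions is easiest on the labeled subset, where both the observed outcome Y and the prediction f are available. A few simple summaries can reveal systematic offset and shrinkage there. The sketch below is in Python for illustration only, since the course itself works in R with ipd. The function name prediction_diagnostics and the choice of summaries are our own assumptions and are not part of the ipd package.

```python
import numpy as np

def prediction_diagnostics(y, f):
    """Hypothetical helper: summarize how predictions f deviate from
    observed outcomes y on the labeled subset (both must be observed)."""
    y = np.asarray(y, dtype=float)
    f = np.asarray(f, dtype=float)
    resid = y - f
    return {
        # Mean of (Y - f): positive means predictions run low on average.
        "bias": resid.mean(),
        # Sample variance of the prediction errors.
        "resid_var": resid.var(ddof=1),
        # Var(f) / Var(Y): values well below 1 suggest predictions are
        # shrunk toward the mean relative to the real outcomes.
        "var_ratio": f.var(ddof=1) / y.var(ddof=1),
        # Linear association between predictions and outcomes.
        "corr": np.corrcoef(y, f)[0, 1],
    }

# Toy usage with made-up values: predictions are compressed toward the center.
print(prediction_diagnostics([1, 2, 3, 4], [1.5, 2.0, 2.5, 3.0]))
```

A var_ratio well below 1 is typical of regression-style ML predictions. If such predictions are treated as observed outcomes, downstream analyses tend to understate variability even when the correlation is high.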

Time Outline (4 hours)

Activity                              Time
Overview & Introductions              40 m
Short Break                           5 m
Getting Started                       30 m
Break                                 20 m
Module 1: Measuring Adiposity         30 m
Short Break                           5 m
Module 2: Proteomics with AlphaFold   30 m
Break                                 20 m
Module 3: Rashomon Quartet            40 m
Wrap-Up, Conclusions, and Questions   20 m

Quick Start

The companion website for this workshop is available at:

https://thmccormick.github.io/ipd-short-course

To run the workshop Docker image:

docker run -e PASSWORD=<choose_a_password_for_rstudio> -p 8787:8787 ghcr.io/thmccormick/ipd-short-course:latest

Once the container is running, navigate to http://localhost:8787/ and log in with the username rstudio and the password you chose above.

Then begin!

Short Course Overview

In this short course, we explore the consequences of conducting inference on predicted data across several applications and present a suite of prediction-based (PB) inference methods that adjust for prediction-related uncertainty to improve inference validity and efficiency. We also introduce ipd, a user-friendly R package that implements the PB inference methods through a unified interface. The package supports modular integration into existing workflows and includes tidy methods for model inspection and diagnostics.
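For intuition about how a PB correction works, consider estimating a population mean E[Y] from two sources. The first is a small labeled set, where both Y and the prediction f are observed. The second is a large unlabeled set, where only f is observed. Prediction-powered inference (PPI) is one well-known PB approach. It starts from the mean of the predictions on the unlabeled data and adds the average prediction error measured on the labeled data. Its standard error combines both sources of variability. The Python sketch below is illustrative only and is not the ipd implementation. The names ppi_mean, naive_mean, and simulate, as well as the simulated data, are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def ppi_mean(y_lab, f_lab, f_unlab, alpha=0.05):
    """Prediction-powered estimate of E[Y] with a normal-approximation CI.

    y_lab, f_lab : observed outcomes and predictions on the labeled set
    f_unlab      : predictions on the unlabeled set
    """
    y_lab, f_lab, f_unlab = (np.asarray(a, dtype=float) for a in (y_lab, f_lab, f_unlab))
    n, N = len(y_lab), len(f_unlab)
    rectifier = y_lab - f_lab                   # prediction errors on labeled data
    est = f_unlab.mean() + rectifier.mean()     # naive mean plus bias correction
    # Variance from the unlabeled predictions plus variance of the correction term.
    se = np.sqrt(f_unlab.var(ddof=1) / N + rectifier.var(ddof=1) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)

def naive_mean(f_unlab, alpha=0.05):
    """Treats predictions as if they were observed outcomes (no correction)."""
    f_unlab = np.asarray(f_unlab, dtype=float)
    est = f_unlab.mean()
    se = f_unlab.std(ddof=1) / np.sqrt(len(f_unlab))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return est, (est - z * se, est + z * se)

# Hypothetical simulation: true mean 2.0, predictions shifted up by 0.5.
rng = np.random.default_rng(0)

def simulate(m):
    y = rng.normal(loc=2.0, scale=1.0, size=m)
    f = y + 0.5 + rng.normal(scale=0.5, size=m)   # biased, noisy predictions
    return y, f

y_lab, f_lab = simulate(200)      # small labeled set
_, f_unlab = simulate(5000)       # large unlabeled set (outcomes discarded)
print("naive:", naive_mean(f_unlab))
print("PPI:  ", ppi_mean(y_lab, f_lab, f_unlab))
```

In this simulation, the naive estimate centers near 2.5. Because it is both biased and narrow, its interval excludes the true value of 2.0. The PPI estimate removes the shift, and its interval is wider because it reflects the limited number of labels.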

Modules

This short course covers four modules, each illustrated with the ipd package:[1]

Supplemental Module

We have also included an additional module for self-guided exploration:

Participation

This short course uses a blended format of instruction and hands-on coding exercises. Participants should:

  • Follow along in the virtual RStudio environment (see Quick Start above).
  • Attempt the brief exercises, or run the provided solution code, in real time.
  • Engage in Q&A at module boundaries to troubleshoot and discuss concepts.

Prerequisites

  • A computer with internet access.
  • Familiarity with base R and tidyverse syntax (e.g., dplyr, broom).
  • Basic understanding of predictive modeling (e.g., randomForest) and regression modeling (e.g., lm, glm).
  • Optional: Exposure to Bioconductor’s ExpressionSet, AnnotationDbi, and MLInterfaces is helpful for the supplemental module.

Instructor and Acknowledgement

Presenter: Tyler H. McCormick

Acknowledgement: This short course adapts and extends Stephen Salerno’s original IPD workshop website and materials.

Original Workshop Contributors (Alphabetical Order): Awan Afiaz, David Cheng, Jianhui Gao, Jesse Gronsbell, Kentaro Hoffman, Jeff Leek, Qiongshi Lu, Tyler McCormick, Jiacheng Miao, Anna Neufeld, Stephen Salerno

Footnotes

  1. Module card cover images were generated by GPT-5.2.