The Promise—and the Problem
The last decade has witnessed an extraordinary shift in how we generate medical evidence. Real-world data (RWD)—drawn from electronic health records, insurance claims, patient registries, and platforms like the All of Us Research Program—now covers hundreds of millions of patients, far exceeding the reach of any randomized controlled trial.
Regulatory agencies have taken notice. The FDA's Real-World Evidence Program, formalized under the 21st Century Cures Act, explicitly endorses using RWE for regulatory decisions. The European Medicines Agency has followed with its own framework. The question is no longer whether we will use real-world data—it's whether we will use it correctly.
The problem is deceptively simple: observational data cannot, by default, answer causal questions. Patients who receive a drug differ systematically from those who don't. Doctors prescribe based on severity, comorbidities, and intuition—all of which confound the treatment-outcome relationship. Without explicit causal methodology, even sophisticated machine learning applied to millions of records will produce associations that look like effects but aren't.
“The plural of anecdote is not data. And the plural of correlation is not causation—no matter how large the database.”
The Causal Gap in Current RWE Practice
Most published RWE studies still rely on regression adjustment or propensity score matching—methods that handle baseline confounding but fail catastrophically in the presence of time-varying confounders affected by prior treatment. In cardiovascular disease prevention, for instance, a patient's LDL cholesterol at month six is both a confounder for future treatment decisions and a mediator of earlier treatment effects. Standard regression cannot disentangle this.
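A causal DAG makes the trap concrete. Below is a minimal sketch of the LDL example in Python with networkx; the variable names and the unmeasured node U are illustrative, not taken from any specific study.

```python
# Minimal DAG for the time-varying confounding example above.
# A0 = baseline treatment, A1 = treatment at month 6,
# LDL_m6 = LDL at month 6, Y = outcome, U = unmeasured severity.
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("A0", "LDL_m6"),  # earlier treatment lowers month-6 LDL
    ("LDL_m6", "A1"),  # month-6 LDL drives the next treatment decision
    ("LDL_m6", "Y"),   # month-6 LDL affects the outcome
    ("A0", "Y"),       # direct effect of earlier treatment
    ("A1", "Y"),       # effect of later treatment
    ("U", "LDL_m6"),   # unmeasured severity affects LDL ...
    ("U", "Y"),        # ... and the outcome
])
assert nx.is_directed_acyclic_graph(dag)

# LDL_m6 is a confounder of A1 -> Y (must be adjusted for), a mediator
# of A0 -> Y (must not be adjusted for), and a collider on the path
# A0 -> LDL_m6 <- U -> Y (adjusting for it opens this path). No single
# regression adjustment set handles all three roles at once.
```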
Immortal time bias inflates treatment benefits by misclassifying follow-up time. Confounding by indication occurs when sicker patients receive treatment and appear to do worse. Selection bias emerges when analyses condition on post-treatment variables. Each is invisible to standard statistical tests—and each requires causal reasoning to address.
The consequences are not theoretical. Observational studies using naive methods have repeatedly produced results that contradicted subsequent RCTs—overstating benefits of hormone replacement therapy for cardiovascular protection, understating risks of COX-2 inhibitors, and generating spurious associations between statins and cancer. These failures weren't caused by bad data. They were caused by the absence of causal structure.
The Causal Toolkit for RWE
What does it mean to apply causal inference to real-world evidence? It means moving beyond prediction and association to explicitly model the data-generating process—the mechanisms by which treatments are assigned, confounders evolve, and outcomes occur. Three methodological families form the core of this toolkit:
1. Target Trial Emulation
Developed by Hernán, Robins, and colleagues, target trial emulation asks: what randomized trial would we ideally conduct to answer this question? Then it specifies how to approximate that trial using observational data—including eligibility criteria, treatment strategies, causal contrasts, and time zero alignment. This framework has been shown to eliminate many common biases in observational research, including immortal time bias and prevalent user bias.
```
# My dissertation design (All of Us, n = 24,788)
Eligibility:      Primary prevention, no prior CVD, age ≥ 40
Treatment A:      Polypill (antihypertensive + statin + aspirin)
Treatment B:      Dual therapy (antihypertensive + statin only)
Time zero:        Date of first qualifying prescription
Outcome:          First MACE event (MI, stroke, CV death)
Follow-up:        5 years, intention-to-treat analogue
Causal contrast:  E[Y(polypill)] − E[Y(dual therapy)]
Reference:        Muñoz et al., NEJM 2019
```
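Given a protocol like this, the intention-to-treat analogue is often estimated with a pooled logistic regression, which approximates a Cox model when events are rare within each interval. The sketch below is illustrative: the file and column names (person-level records expanded to person-months, a binary `strategy` indicator, an `event` flag) are assumptions, and a real All of Us analysis would build this table from OMOP-formatted records.

```python
# Hedged sketch of a discrete-time ITT analysis for the emulated trial.
import pandas as pd
import statsmodels.formula.api as smf

# One row per person-month, with month 0 at the first qualifying
# prescription (the protocol's time zero).
cohort = pd.read_csv("emulated_trial_person_months.csv")
cohort = cohort[cohort["month"].between(0, 59)]  # 5 years of follow-up

# Discrete-time hazard model: flexible time trend plus baseline
# covariates; 'strategy' is 1 for polypill, 0 for dual therapy.
fit = smf.logit(
    "event ~ strategy + month + I(month**2) + age + sex + baseline_ldl",
    data=cohort,
).fit()
print(fit.summary())
```

Laying the data out this way forces the time-zero alignment to be explicit, which is precisely how the emulation avoids immortal time bias.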
2. The Parametric G-Formula
When confounders change over time and are affected by prior treatment—as they almost always are in chronic disease—standard methods produce biased estimates regardless of sample size. The parametric G-formula handles this by modeling the joint distribution of time-varying confounders and outcomes under each treatment strategy, then simulating counterfactual outcomes via Monte Carlo methods.
The G-formula answers a fundamentally different question than regression: not "what is the association between treatment and outcome adjusting for covariates?" but rather "what would happen if we intervened to give everyone treatment A versus treatment B, accounting for how confounders evolve over time?"
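The simulation step is easy to sketch. In a real analysis, the covariate and outcome models below would be fitted to the observed data; here their coefficients are invented so the example runs standalone, with a single time-varying covariate standing in for the full confounder set.

```python
# Toy Monte Carlo g-formula: simulate cumulative 5-year risk under
# "always treat" vs. "never treat". All coefficients are made up.
import numpy as np

rng = np.random.default_rng(0)

def simulate_strategy(treat_all: bool, n: int = 100_000, horizon: int = 60) -> float:
    ldl = rng.normal(130, 25, n)            # baseline LDL (mg/dL)
    alive = np.ones(n, dtype=bool)
    events = np.zeros(n, dtype=bool)
    for _ in range(horizon):
        a = 1.0 if treat_all else 0.0
        # Covariate model: LDL drifts toward 130 untreated, 90 treated.
        ldl = 0.9 * ldl + 0.1 * (130 - 40 * a) + rng.normal(0, 5, n)
        # Outcome model: monthly hazard rises with LDL, falls with treatment.
        hazard = 1 / (1 + np.exp(-(-8.5 + 0.015 * ldl - 0.3 * a)))
        new_events = alive & (rng.random(n) < hazard)
        events |= new_events
        alive &= ~new_events
    return events.mean()

rd = simulate_strategy(True) - simulate_strategy(False)
print(f"Simulated 5-year risk difference: {rd:+.4f}")
```

Because the confounder is re-simulated under each strategy at every time step, its dual role as confounder and mediator is handled correctly, which is exactly what a single regression adjustment cannot do.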
3. Causal Bayesian Networks
Dynamic Bayesian Networks (DBNs) encode the causal structure of a system as a directed acyclic graph with temporal dynamics. Unlike the G-formula, which requires correct specification of parametric models for every time-varying variable, DBNs learn the conditional dependence structure from data and can represent complex feedback loops between treatment, biomarkers, and outcomes.
The key advantage is transparency: the causal graph itself is an explicit, inspectable hypothesis about the world. When two causal methods produce different estimates on the same data, the discrepancy often reveals misspecification that would have been invisible using either method alone.
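Structure learning can be sketched with pgmpy, assuming it is installed and the variables have been discretized. Encoding each variable at each time slice as its own column unrolls the DBN into an ordinary DAG that score-based search can recover; the column names and the temporal blacklist below are assumptions for illustration.

```python
# Hedged sketch: score-based structure learning over time-indexed columns.
import pandas as pd
from pgmpy.estimators import BicScore, HillClimbSearch

# Columns like A_t0, LDL_t1, A_t1, Y, one row per person (discretized).
data = pd.read_csv("discretized_time_indexed.csv")

search = HillClimbSearch(data)
dag = search.estimate(
    scoring_method=BicScore(data),
    # Background knowledge: forbid edges that point backwards in time.
    black_list=[("Y", "A_t0"), ("Y", "LDL_t1"), ("Y", "A_t1"),
                ("LDL_t1", "A_t0"), ("A_t1", "A_t0"), ("A_t1", "LDL_t1")],
)
print(sorted(dag.edges()))
```

The learned edge list is the inspectable hypothesis: a reviewer can disagree with a specific arrow rather than with an opaque coefficient.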
Why Triangulation Matters
No single causal method is assumption-free. The G-formula requires correct parametric specification. Target trial emulation assumes no unmeasured confounding given the measured covariates. Bayesian networks assume faithfulness between the graph and the data distribution. Each method has blind spots.
The solution is methodological triangulation: applying multiple causal approaches to the same data and question, then examining whether they converge or diverge. When they converge, confidence increases. When they diverge, the discrepancy itself becomes informative—pointing to model misspecification, unmeasured confounding, or violations of positivity.
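In practice this can be as simple as laying the estimates side by side and flagging non-overlap. The numbers below are placeholders, not results from any real analysis:

```python
# Toy convergence check across methods (risk differences, placeholder values).
estimates = {
    "g-formula":           (-0.031, (-0.052, -0.010)),
    "clone-censor-weight": (-0.028, (-0.049, -0.007)),
    "dynamic BN":          (-0.008, (-0.030, 0.014)),
}

reference = estimates["g-formula"][0]
for method, (point, (lo, hi)) in estimates.items():
    flag = "" if lo <= reference <= hi else "  <-- divergent: inspect assumptions"
    print(f"{method:<21} RD={point:+.3f}  95% CI ({lo:+.3f}, {hi:+.3f}){flag}")
```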
| Method | Handles Time-Varying Confounding | Assumption Transparency | Data Requirements |
|---|---|---|---|
| Propensity Score Matching | Baseline only | Low (opaque matching) | Moderate |
| Target Trial Emulation + IPTW | Yes (with MSMs) | High (explicit protocol) | Moderate–High |
| Parametric G-Formula | Yes (full joint model) | Medium (parametric models) | High |
| Clone-Censor-Weight | Yes (sustained strategies) | High (explicit cloning) | High |
| Dynamic Bayesian Networks | Yes (structural learning) | Very High (visible graph) | Very High |
The Regulatory Imperative
The FDA's March 2024 guidance on non-interventional studies for drug and biological products makes this explicit: regulatory submissions using RWE must articulate a causal framework, identify potential confounders, and demonstrate that the study design addresses them. Simply adjusting for baseline covariates no longer suffices.
This isn't bureaucratic formalism. Regulatory agencies have been burned by observational studies that produced misleading results because they lacked causal rigor. The new standard demands what the Duke-Margolis Center for Health Policy calls a "Causal Roadmap"—a prespecified protocol that mirrors the target trial framework, complete with sensitivity analyses for unmeasured confounding.
In practice, this means:

- A clearly defined causal estimand (not just an "association")
- A target trial protocol
- Identification and measurement of time-varying confounders
- Sensitivity analyses for unmeasured confounding
- Transparent reporting of assumptions

Together, these requirements represent a fundamental shift from correlational RWE to causal RWE.
From Correlation at Scale to Causation at Scale
The opportunity is enormous. Platforms like the All of Us Research Program provide linked genomic, clinical, and survey data on hundreds of thousands of participants—the kind of granular, longitudinal data that causal methods were designed for. When combined with target trial emulation and methods like the G-formula, this data can answer questions that would take years and hundreds of millions of dollars to address through RCTs.
My dissertation applies exactly this approach: using the All of Us platform (n = 24,788) to emulate a target trial comparing polypill versus dual therapy for primary cardiovascular prevention, with triangulation across the parametric G-formula, Clone-Censor-Weight methodology, and Dynamic Bayesian Networks.
But the ambition here extends beyond any single study. What's emerging is a new paradigm for evidence generation—one where causal inference provides the methodological backbone that transforms real-world data from a source of hypotheses into a source of actionable, reliable evidence for clinical and regulatory decisions.
“The companies and institutions that master causal inference for RWE first will have an enormous competitive advantage—faster evidence generation, more comprehensive patient coverage, and regulatory differentiation.”
What Needs to Happen Next
For researchers: Pre-specify your causal estimand and target trial protocol before touching the data. Use DAGs to make your causal assumptions explicit and debatable. Triangulate across methods. Report negative controls and sensitivity analyses; one such analysis is sketched after this list.
For pharma and regulatory science: Invest in causal inference infrastructure, not just bigger databases. The bottleneck is no longer data volume—it's methodological rigor. Companies that build internal causal inference capabilities will generate regulatory-grade evidence faster and at lower cost than those relying on traditional approaches.
For the field: Make causal reasoning teachable. Most clinical researchers receive extensive training in regression and survival analysis but minimal exposure to DAGs, do-calculus, or the G-formula. Closing this gap is prerequisite to the responsible use of RWE.
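One widely used sensitivity analysis is the E-value of VanderWeele and Ding: the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed effect. A minimal sketch, with a made-up risk ratio:

```python
# E-value for a point estimate (VanderWeele & Ding, 2017).
from math import sqrt

def e_value(rr: float) -> float:
    rr = 1 / rr if rr < 1 else rr  # protective estimates are inverted first
    return rr + sqrt(rr * (rr - 1))

print(round(e_value(0.80), 2))  # hypothetical RR of 0.80 -> E-value 1.81
```

An E-value near 1 means even weak unmeasured confounding could explain the result; the larger it is, the harder the estimate is to dismiss.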
Real-world evidence is not the future of drug development—it's the present. The question is whether we'll use it with the causal rigor it demands, or whether we'll repeat the mistakes of correlational epidemiology at unprecedented scale.
The infrastructure exists. The methods are mature. The regulatory path is open. What's missing is the will to build on causal foundations rather than correlational convenience.
References
- Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2020.
- Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–764.
- Muñoz D, et al. Polypill for cardiovascular disease prevention in an underserved population. N Engl J Med. 2019;381:1114–1123.
- Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Modelling. 1986;7:1393–1512.
- Food and Drug Administration. Real-World Evidence: Considerations Regarding Non-Interventional Studies for Drug and Biological Products. Guidance for Industry, March 2024.
- Petersen ML, et al. A causal roadmap for generating high-quality real-world evidence. J Clin Transl Sci. 2023;7(1):e211.
- Berger ML, et al. Real-world evidence, causal inference, and machine learning. Value Health. 2019;22(5):587–592.
- Mariani S, et al. Clone-censor-weight approach for causal inference in sustained treatment strategies. Stat Med. 2022;41(22):4370–4390.
- Scutari M, Denis JB. Bayesian Networks with Examples in R. 2nd ed. CRC Press, 2021.
- All of Us Research Program. The All of Us Research Program. N Engl J Med. 2019;381:668–676.