Researchers outline bias in epidemic research -- and offer new simulation tool to guide future work: Analysis yields way to improve data collection, c

1 Apr 2022

Science Daily


A team of researchers unpacks a series of biases in epidemic research, ranging from clinical trials to data collection, and offers a game-theory approach to address them, in a new analysis. The work sheds new light on the pitfalls associated with technology development and deployment in combating global crises like COVID-19, with a look toward future pandemic scenarios.

"Even today, empirical methods used by epidemic researchers suffer from defects in design and execution," explains Bud Mishra, a professor at New York University's Courant Institute of Mathematical Sciences and the senior author of the paper, which appears in the journal Technology & Innovation. "In our work, we illuminate common, but remarkably oft-overlooked, pitfalls that plague research methodologies -- and introduce a simulation tool that we think can improve methodological decision-making."

Even in an era when vaccines can be successfully developed in a matter of months, combatting afflictions in ways not imaginable in previous centuries, scientists may still be unwittingly hindered by flaws in their methods.

In the paper, Mishra and his co-authors, Inavamsi Enaganti and Nivedita Ganesh, NYU graduate students in computer science, explore some standard paradoxes, fallacies, and biases in the context of hypothesizing and show how they are relevant to work aimed at addressing epidemics. These include the Grue Paradox, Simpson's Paradox, and confirmation bias, among others:

The Grue Paradox

The authors note that research has often been hampered by errors linked to inductive reasoning, falling under what is known as the Grue Paradox. For example, if all emeralds observed during a given period are green, induction suggests that all emeralds are green. However, if we define "grue" as the property of being green up to a certain point in time and blue thereafter, the same observations support the conclusion that all emeralds are "grue" just as well as the conclusion that all emeralds are green, preventing one from reaching a definitive conclusion on the color of emeralds.
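The emerald example can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the cutoff year, the observation window, and the function names are all hypothetical. It shows two hypotheses that fit every past observation equally well yet disagree about the future.

```python
# Hypothetical cutoff: "grue" means green before this year and blue afterwards.
CUTOFF_YEAR = 2030

def predicts_green(year):
    """Hypothesis 1: emeralds are always green."""
    return "green"

def predicts_grue(year):
    """Hypothesis 2: emeralds are green before the cutoff, blue afterwards."""
    return "green" if year < CUTOFF_YEAR else "blue"

# Hypothetical observations: every emerald seen from 2000 to 2024 was green.
observations = [(year, "green") for year in range(2000, 2025)]

def fits(hypothesis, data):
    """Return True if the hypothesis agrees with every observation."""
    return all(hypothesis(year) == color for year, color in data)

print("green fits the data:", fits(predicts_green, observations))
print("grue fits the data: ", fits(predicts_grue, observations))
print("predictions for 2035:", predicts_green(2035), "vs", predicts_grue(2035))
```

Because both hypotheses pass the same check on past data, the observations alone cannot separate them. Only the time dependence built into "grue" distinguishes the two.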

"While constructing and comparing hypotheses in the context of epidemics, it is vital to identify the temporal dependence of the predicate," the authors write. These include hypotheses on the mutation of a virus, inducement of herd immunity, or recurring waves of infection.

Simpson's Paradox

"Simpson's Paradox is a phenomenon where trends that are observed in data when stratified into different groups are reversed when combined," the authors write. "This effect has widespread presence in academic literature and notoriously perverts the truth."

For instance, if in a clinical trial 100 subjects undergo Treatment 1 and 100 subjects undergo Treatment 2 with success rates of 40 percent and 37 percent, respectively, one would assume Treatment 1 is more effective. However, if you split these data by genetic markers -- say, Genetic Marker A and Genetic Marker B -- the picture may change. For example, Treatment 1 may look superior in the aggregated population, yet its advantage may shrink or even reverse within certain subgroups.
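A small worked example makes the reversal concrete. The subgroup counts below are hypothetical. They were chosen only so that the totals match the 40 percent and 37 percent figures above, and they do not come from the paper.

```python
# Hypothetical counts: (successes, subjects) per treatment and genetic marker.
# Totals match the article: Treatment 1 = 40/100, Treatment 2 = 37/100.
trial = {
    "Treatment 1": {"Marker A": (36, 60), "Marker B": (4, 40)},
    "Treatment 2": {"Marker A": (14, 20), "Marker B": (23, 80)},
}

def success_rate(successes, subjects):
    """Fraction of subjects for whom the treatment succeeded."""
    return successes / subjects

def aggregate_rate(groups):
    """Success rate over all subgroups combined."""
    total_successes = sum(s for s, _ in groups.values())
    total_subjects = sum(n for _, n in groups.values())
    return success_rate(total_successes, total_subjects)

for treatment, groups in trial.items():
    print(f"{treatment}: overall {aggregate_rate(groups):.1%}")
    for marker, (s, n) in groups.items():
        print(f"  {marker}: {success_rate(s, n):.1%}")
```

With these counts, Treatment 1 wins overall, yet Treatment 2 has the higher success rate within both Marker A and Marker B. The reversal arises because Treatment 1 was given mostly to Marker A subjects, who respond well to either treatment, while Treatment 2 was given mostly to Marker B subjects, who respond poorly.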

Confirmation Bias

The widely known Confirmation Bias, or the tendency to seek out, recall, and give greater weight to data that support a researcher's hypothesis, also plagues epidemic research, the authors note.

"This phenomenon can already be seen in the COVID-19 context in the selective marshaling of data to paint a picture that supports popular belief," they write. "For instance, evidence that supports countries practicing strict lockdown and social distancing improves public health has been given more weight than evidence suggesting countries relaxing their measures have a similar reduction in their caseloads. Additionally, other variables that could be as influential as lockdown, but are contextual and varied for different geographies, might have been ignored, such as population density or history of vaccinations."

In addressing these methodological challenges, the team created an open-source Epidemic Simulation platform (Episimmer) that seeks to provide decision support to help answer users' questions related to policies and restrictions during an epidemic.

Episimmer, which the researchers tested under several simulated public-health emergencies, performs "counterfactual" analyses, measuring what would have happened to an ecosystem in the absence of interventions and policies. This helps users discover and hone the opportunities and optimizations they could make to their COVID-19 strategies. These could inform decisions such as "Which days to be remote or in-person" for schools and workplaces as well as "Which vaccination routine is more efficient given the local interaction patterns?" (Note: The platform's Python package is available on this page: https://pypi.org/project/episimmer/ ).
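The idea of a counterfactual comparison can be illustrated with a toy model. The sketch below does not use Episimmer or its API. It is a minimal, self-contained SIR (susceptible-infected-recovered) simulation, and every parameter value is an assumption chosen purely for illustration. It runs the same epidemic twice, once with no intervention and once with a policy that halves the transmission rate from day 30 onward.

```python
def simulate_sir(beta_by_day, gamma=0.1, population=10_000, initial_infected=10):
    """Run a simple discrete-time SIR model, one step per day.

    beta_by_day: transmission rate for each simulated day (hypothetical values).
    gamma: daily recovery rate (hypothetical value).
    Returns the daily infected counts and the total number ever infected.
    """
    susceptible = population - initial_infected
    infected = float(initial_infected)
    infected_curve = []
    for beta in beta_by_day:
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        infected_curve.append(infected)
    return infected_curve, population - susceptible

DAYS = 120
# Counterfactual scenario: no intervention, transmission rate stays constant.
no_policy_beta = [0.3] * DAYS
# Policy scenario: the transmission rate is halved from day 30 onward.
policy_beta = [0.3 if day < 30 else 0.15 for day in range(DAYS)]

no_policy_curve, no_policy_total = simulate_sir(no_policy_beta)
policy_curve, policy_total = simulate_sir(policy_beta)

print(f"Peak infected without policy: {max(no_policy_curve):.0f}")
print(f"Peak infected with policy:    {max(policy_curve):.0f}")
print(f"Infections averted by policy: {no_policy_total - policy_total:.0f}")
```

The difference between the two runs is the counterfactual estimate of what the policy achieved. A tool that models individual agents and their interaction patterns would be far more detailed, but the comparison between "with" and "without" scenarios follows the same logic.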

"Faced with a rapidly evolving virus, inventors must experiment, iterate, and deploy both creative and effective solutions while avoiding pitfalls that plague clinical trials and related work," says Enaganti.

The team carried out its research as part of a larger, self-assembled, multidisciplinary international research group, dubbed RxCovea, and enabled the deployment of its tools in India as part of the Campus-Rakshak program.