Framework for Minimizing Bias in Data Science

Project summary

  • Astronomy studies celestial objects and phenomena;
  • Epidemiology studies the frequency and causes of disease in humans.

Project leaders Daniela Huppenkothen and Konrad H. Stopsack, together with team members Marcel S. Pawlowski and Denzil Correa, want to compare the typical analytical approaches of astronomy and epidemiology.

At first glance, these two disciplines could not be more different. What unites them is a reliance on observational data. Experimental data come from researchers actively manipulating a key experimental condition. Observational data are collected without that kind of intervention. Astronomers cannot change the temperature of a star billions of kilometers away to study the effect of temperature on brightness, and epidemiologists cannot randomly assign people to smoke or not smoke when studying lung cancer.

To draw causal inferences from observational data nonetheless, epidemiology has developed, over the past three decades, a powerful theoretical framework for minimizing selection bias, information bias, and confounding. As a result, epidemiologists can make causal statements (for example, that smoking does cause lung cancer) rather than mere predictions (for example, that smokers are more likely to develop lung cancer, whether because of smoking itself or because of some associated characteristic).

The project's objective is to build a set of tutorials and teaching materials and to co-author a scientific paper that introduces these key principles into their fields. Further progress on this project will be presented in early 2020.