Framework for Minimizing Bias in Data Science
- Astronomy studies celestial objects and phenomena
- Epidemiology studies frequency and causes of disease in humans
The two project leaders, Daniela Huppenkothen and Konrad H. Stopsack, and their team members, Marcel S. Pawlowski and Denzil Correa, want to compare the typical approaches taken in astronomy and in epidemiology.
At first glance, these two disciplines could not be more different. What unites them is a reliance on observational data. In an experiment, researchers actively manipulate a key condition. Astronomers, by contrast, cannot change the temperature of a star light-years away in order to study the effect of temperature on brightness. Likewise, epidemiologists cannot randomly assign people to smoke or not to smoke when studying lung cancer.
To draw causal inferences from observational data nonetheless, epidemiology has developed a powerful theoretical framework over the past three decades to minimize selection bias, information bias, and confounding. Consequently, epidemiologists can make causal statements (that smoking indeed causes lung cancer), not just predictions (that smokers are more likely to get lung cancer, whether because of smoking or because of some associated characteristic).
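As an illustration of confounding, the following minimal Python sketch uses made-up counts, not data from the project. It assumes a hypothetical binary characteristic (`z0`/`z1`) that is more common among exposed people and that raises the risk of the outcome, while the exposure itself has no effect. The function names and the numbers are illustrative assumptions. The crude comparison suggests an effect, whereas a comparison standardized over the strata of the characteristic does not.

```python
# Hypothetical counts: (stratum, exposed?) -> (cases, group size).
# Within each stratum the risk is identical for exposed and unexposed,
# so the exposure has no effect by construction.
counts = {
    ("z1", True): (240, 800),   # risk 0.30
    ("z1", False): (60, 200),   # risk 0.30
    ("z0", True): (20, 200),    # risk 0.10
    ("z0", False): (80, 800),   # risk 0.10
}


def crude_risk_difference(counts):
    """Risk in exposed minus risk in unexposed, ignoring strata."""
    totals = {True: [0, 0], False: [0, 0]}
    for (_, exposed), (cases, size) in counts.items():
        totals[exposed][0] += cases
        totals[exposed][1] += size
    risk_exposed = totals[True][0] / totals[True][1]
    risk_unexposed = totals[False][0] / totals[False][1]
    return risk_exposed - risk_unexposed


def standardized_risk_difference(counts):
    """Stratum-specific risk differences, weighted by stratum size."""
    strata = sorted({stratum for stratum, _ in counts})
    grand_total = sum(size for _, size in counts.values())
    result = 0.0
    for stratum in strata:
        cases_exp, size_exp = counts[(stratum, True)]
        cases_unexp, size_unexp = counts[(stratum, False)]
        weight = (size_exp + size_unexp) / grand_total
        result += weight * (cases_exp / size_exp - cases_unexp / size_unexp)
    return result


print(f"crude risk difference:        {crude_risk_difference(counts):.2f}")
print(f"standardized risk difference: {standardized_risk_difference(counts):.2f}")
```

With these counts the crude risk difference is 0.12, while the standardized risk difference is 0.00. The apparent effect comes entirely from the uneven distribution of the associated characteristic between the exposed and unexposed groups.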
The project’s objective is to build a set of tutorials and teaching materials and to co-author a scientific paper that introduces these key principles to their fields. The project has so far generated extensive notes, which form the basis of a future publication as well as teaching materials. Because all project members live in different locations, all meetings so far have been held online. A planned multi-day face-to-face meeting to finalize a number of project deliverables had to be cancelled due to the ongoing COVID-19 pandemic.
In a second phase, outside the scope of the current project, the project leaders will seek to develop teaching materials and to obtain feedback on them through AlumNode.