New Study Examines When Correlation Can Be Causation

Since 1897, statisticians have maintained that correlation is not causation. Statisticians such as Karl Pearson effectively removed discussion of causation from scientific studies. In recent years, however, statisticians have discussed methods of detecting cause and effect and have identified four criteria that must be met. These four criteria have led to the creation of network models of causation.

In a new study, Farrokh Alemi, professor in the Department of Health Administration and Policy, along with Manaf Zargoush of the DeGroote School of Business at McMaster University and Jee Vang, a doctoral student at George Mason University, explains how the sequence of events can be combined with associations among events to improve the detection of causal relationships. The authors also show how network models can be improved if sequence is taken into account more rigorously. The study is published in Health Care Management Science.

The authors used longitudinal data to establish sequence and then analyzed cross-sectional data to discover relationships among the variables, while constraining the data to reflect the observed sequence. The procedure can be applied to the massive volumes of data in electronic health records to identify new relationships, and it might help lead to new scientific discoveries.
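
To make the two-step idea concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' procedure: the function names, the majority-vote ordering rule, the phi coefficient as the measure of association, and the 0.3 threshold are all assumptions chosen for simplicity, and the data are made-up toy values with placeholder event names A, B and C.

```python
from itertools import combinations
from math import sqrt


def typical_order(longitudinal):
    """Return the set of (earlier, later) event pairs.

    `longitudinal` maps each patient to a dict of event -> time first observed.
    A pair (a, b) is kept when a preceded b in more patients than the reverse
    (an assumed majority-vote rule; ties are dropped).
    """
    before = {}
    for times in longitudinal.values():
        for a, b in combinations(sorted(times), 2):
            if times[a] == times[b]:
                continue  # simultaneous events carry no order information
            pair = (a, b) if times[a] < times[b] else (b, a)
            before[pair] = before.get(pair, 0) + 1
    return {(a, b) for (a, b), n in before.items() if n > before.get((b, a), 0)}


def phi(rows, a, b):
    """Phi coefficient between two binary variables in cross-sectional rows."""
    n11 = sum(1 for r in rows if r[a] and r[b])
    n10 = sum(1 for r in rows if r[a] and not r[b])
    n01 = sum(1 for r in rows if not r[a] and r[b])
    n00 = sum(1 for r in rows if not r[a] and not r[b])
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


def directed_edges(longitudinal, rows, threshold=0.3):
    """Keep an edge a -> b only if a typically precedes b AND the two are associated."""
    return sorted(
        (a, b)
        for a, b in typical_order(longitudinal)
        if abs(phi(rows, a, b)) >= threshold
    )


# Toy longitudinal data: patient -> {event: time of first occurrence}
longitudinal = {
    "p1": {"A": 1, "B": 2, "C": 3},
    "p2": {"A": 1, "B": 3, "C": 2},
    "p3": {"B": 1, "A": 2},
}

# Toy cross-sectional data: one row of binary indicators per patient
rows = [
    {"A": 1, "B": 1, "C": 0},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
]

edges = directed_edges(longitudinal, rows)
print(edges)
```

In this toy example A usually precedes both B and C, but only A and B are associated in the cross-sectional rows, so the sequence constraint and the association test together leave a single candidate edge from A to B. An edge that survives both checks is still only a candidate for a causal relationship, not proof of one.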

“Many of the current methods of data analysis do not take into account the sequence of events,” Alemi said. “By including sequence into these methods, the results lead to stronger causal interpretations. For example, we are studying whether prolonged use of antibiotics affects the human biome and, in the long run, causes diabetes. Investigation of a medication’s side effects would be strengthened with causal interpretation of the data. We would know what side effects are caused by the medication and which ones are just a spurious correlation in the data.”

This project was funded by appropriation #3620160 from the VA Office of Geriatrics and Extended Care to Cari Levy.