Air Pollution, COVID-19, and Race: Data Science Challenges and Opportunities

Thursday, October 22, 2020, 2:30-3:50pm, virtual access only

Francesca Dominici, PhD
Clarence James Gamble Professor of Biostatistics, Population and Data Science
Harvard T.H. Chan School of Public Health

Title: Air Pollution, COVID-19, and Race: Data Science Challenges and Opportunities

Abstract: The coronavirus will likely kill thousands of Americans. But what if I told you about another serious threat to American national security? That emergency comes from climate change and air pollution.

To help address this threat, we have developed an artificial neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the continental U.S. at a resolution of 1 square kilometer. We have paired this information with health data from Medicare claims records covering the last 12 years, which include 97% of Americans ages 65 or older. We have also developed statistical methods for causal inference and computationally efficient algorithms for the analysis of over 550 million health records.
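A minimal sketch of the exposure-modeling idea described above: fit a model where ground monitors provide PM2.5 readings, using satellite-derived features as predictors, then predict pollution for every 1-km grid cell. The file names, column names, and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the authors' actual pipeline:

```python
# Sketch: learn PM2.5 from satellite features at monitored locations,
# then estimate it everywhere on a 1-km grid. Hypothetical inputs.
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical files: monitor readings joined to satellite features by
# grid cell and date, plus satellite features for every 1-km cell-day.
monitors = pd.read_csv("monitor_days.csv")   # columns: aod, temp, humidity, pm25
grid = pd.read_csv("grid_cell_days.csv")     # columns: cell_id, date, aod, temp, humidity

features = ["aod", "temp", "humidity"]       # aerosol optical depth, meteorology
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
)
model.fit(monitors[features], monitors["pm25"])

# Daily PM2.5 estimates for every 1-km cell, ready to link with health records.
grid["pm25_hat"] = model.predict(grid[features])
```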

The result? This data science platform is telling us that federal limits on the nation’s most widespread air pollutants are not stringent enough. Our research shows that short- and long-term exposure to air pollution is killing thousands of senior citizens each year.

This work illustrates the critical new role of data science in public health and the methodological challenges that come with it. For example, with enormous amounts of data, the threat of unmeasured confounding bias is amplified, and causality is even harder to assess in observational studies. We will discuss these and other challenges.
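A small simulation of that point: when a confounder is unmeasured, a naive regression estimate of the exposure effect stays biased no matter how large the sample gets; more data only makes us more confident in the wrong answer. All numbers here are illustrative:

```python
# Unmeasured confounding does not wash out with sample size.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.0  # exposure has NO effect on the outcome

for n in (1_000, 100_000, 10_000_000):
    u = rng.normal(size=n)                 # unmeasured confounder
    exposure = u + rng.normal(size=n)      # confounder drives exposure...
    outcome = true_effect * exposure + u + rng.normal(size=n)  # ...and outcome
    # OLS slope of outcome on exposure, ignoring U:
    beta_hat = np.cov(exposure, outcome)[0, 1] / np.var(exposure)
    print(f"n={n:>10,}  estimated effect = {beta_hat:.3f} (truth = 0)")

# The estimate converges to ~0.5, not 0: bias, not noise, dominates.
```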

Prediction, Estimation, and Attribution

Thursday, October 1, 2020, 2:30-3:50pm, virtual access only

Bradley Efron, PhD
Max H. Stein Professor and Professor of Statistics and of Biomedical Data Science
Stanford University

Title: Prediction, Estimation, and Attribution

Abstract: The scientific needs and computational limitations of the Twentieth Century fashioned classical statistical methodology. Both the needs and limitations have changed in the Twenty-First, and so has the methodology. Large-scale prediction algorithms – neural nets, deep learning, boosting, support vector machines, random forests – have achieved star status in the popular press. They are recognizable as heirs to the regression tradition, but ones carried out at enormous scale and on titanic data sets. How do these algorithms compare with standard regression techniques such as Ordinary Least Squares or logistic regression? Several key discrepancies will be examined, centering on the differences between prediction and estimation or prediction and attribution (that is, significance testing). Most of the discussion is carried out through small numerical examples. The talk does not assume familiarity with prediction algorithms.
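A toy contrast of the kind the abstract gestures at, on simulated data: OLS answers the attribution question (which coefficients are significant?), while a random forest answers the prediction question (how well do we forecast held-out cases?). This is an illustrative sketch, not an example from the talk:

```python
# Attribution via OLS t-tests vs. prediction via a random forest.
import numpy as np
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + np.sin(3 * X[:, 1]) + rng.normal(size=n)  # one linear, one nonlinear signal

# Attribution: fit OLS and read off coefficients and significance tests.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.summary())  # per-coefficient t-tests: the estimation/attribution view

# Prediction: fit a random forest and score held-out accuracy.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("random forest held-out R^2:", rf.score(X_te, y_te))
# The forest can predict well while saying nothing about significance:
# the tension between prediction and attribution the talk examines.
```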

Statistical challenges in the analysis of single-cell RNA-seq from brain cells

Thursday, March 18, 2021, 2:30-3:50pm, virtual access only

Kathryn Roeder, PhD
UPMC Professor of Statistics and Life Sciences, Departments of Statistics & Data Science and Computational Biology
Carnegie Mellon University

Title: Statistical challenges in the analysis of single-cell RNA-seq from brain cells

Abstract: Quantification of gene expression using single-cell RNA sequencing of brain tissues can be a critical step in understanding cell development and the differences between cells sampled from case and control subjects. We describe statistical challenges encountered in analyzing the expression of brain cells in the context of two projects. First, over-correction is one of the main concerns with data integration methods, which risk removing genuine biological distinctions and thereby harming cell-type identification. Here, we present a simple yet surprisingly effective transfer learning model, named cFIT, for removing batch effects across experiments, technologies, subjects, and even species. Second, gene co-expression networks yield critical insights into biological processes, and single-cell RNA sequencing provides an opportunity to target such inquiries at the cellular level. However, due to the sparsity and heterogeneity of transcript counts, it is challenging to construct accurate gene co-expression networks. We develop an alternative approach that estimates a cell-specific network for each single cell. We use this method to identify differential network genes in a comparison of cells from the brains of individuals with autism spectrum disorder and those without.
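To make the sparsity challenge concrete, here is a naive population-level baseline for the co-expression problem: normalize a mostly-zero count matrix, compute gene-gene rank correlations, and threshold them into a network. This is a simplified sketch for intuition only; it is not cFIT or the cell-specific network method from the talk, and all sizes and thresholds are made up:

```python
# Naive co-expression baseline on synthetic sparse scRNA-seq-like counts.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
counts = rng.poisson(0.3, size=(1_000, 50))   # cells x genes; mostly zeros

# Library-size normalization and log transform, a common preprocessing step.
libsize = counts.sum(axis=1, keepdims=True)
logcpm = np.log1p(1e4 * counts / np.maximum(libsize, 1))

# Gene-gene rank correlations; the many zeros make these estimates noisy,
# which is exactly the sparsity challenge the abstract points to.
corr, _ = spearmanr(logcpm)                   # 50 x 50 gene correlation matrix
network = (np.abs(corr) > 0.2) & ~np.eye(corr.shape[0], dtype=bool)
print("edges in thresholded co-expression network:", int(network.sum()) // 2)
```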

Biomedical AI: Its Roots, Evolution, and Early Days at Stanford

Thursday, February 4, 2021, 2:30-3:50pm, virtual access only

Edward H. (Ted) Shortliffe, MD, PhD

Chair Emeritus and Adjunct Professor, Department of Biomedical Informatics, Columbia University
President and CEO Emeritus, American Medical Informatics Association (AMIA)
Senior Executive Consultant, IBM Watson Health
Former Professor of Medicine and Computer Science, Stanford University (1979-2000)
Founding Director, Biomedical Informatics Training Program @ Stanford

Seminar Title: Biomedical AI: Its Roots, Evolution, and Early Days at Stanford

Abstract: Five decades have passed in the evolution of Artificial Intelligence in Medicine (AIM), a field that has evolved substantially while tracking the corresponding changes in computer science, hardware technology, communications, and biomedicine. Emerging from medical schools and computer science departments in its early years, the AIM field is now more visible and influential than ever before, paralleling the enthusiasm and accomplishments of AI more generally. This talk will briefly summarize some of AIM's history, providing an update on the status of the field as we enter our second half-century. My remarks on this subject will emphasize the role that Stanford played in the emergence of the field. They will also offer the perspective of an informatics journal editor-in-chief who has seen many state-of-the-art AIM papers and thereby recognizes the tension between applying existing methods to new problems and developing new science that advances the field in a generalizable way. In addition, the inherent complexity of medicine and of clinical care necessitates that we address not only decision-making performance but also issues of usability, workflow, transparency, safety, and the pursuit of persuasive results from formal clinical trials. These requirements contribute to an ongoing investigative agenda that means fundamental AIM research will continue to be crucial and will define our accomplishments in the decades ahead.