DBDSfest 2024 Faculty Lightning Talks – Stanford – Department of Biomedical Data Science

Zihuai He, Ph.D.
Assistant Professor
Stanford ADRC, Data Management & Statistics Associate Core Leader
Quantitative Sciences Unit
Department of Neurology and Neurological Sciences

“Beyond guilty by association at scale: searching for causal variants on the basis of genome-wide summary statistics”

Abstract: Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Here, we present a novel framework for genome-wide detection of sets of variants that carry non-redundant information on the phenotypes and are therefore more likely to be causal in a biological sense. Crucially, our framework requires only summary statistics obtained from standard genome-wide marginal association testing. The described approach, implemented in open-source software, is also computationally efficient, requiring less than 15 minutes on a single CPU to perform genome-wide analysis. Through extensive genome-wide simulation studies, we show that the method can substantially outperform the usual two-stage marginal association testing and fine-mapping procedures in precision and recall. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer’s disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments.

Biography: Dr. He received his PhD from the University of Michigan in 2016. Following postdoctoral training in biostatistics at Columbia University, he joined Stanford University in 2018. His research is concentrated in the areas of biostatistics, statistical genetics, and integrative analysis of omics data. His current methodological work focuses on statistical inference in high-dimensional and large-scale testing problems, incorporating rigorous feature selection into machine learning methods, and translating data-driven discoveries into mechanistic insights and drug targets.

Olivier Gevaert, Ph.D.
Associate Professor of Medicine (Biomedical Informatics) and of Biomedical Data Science

“Cross-modal modeling: from data imputation to synthetic data”

Abstract: Missing data presents a persistent challenge in biomedical research. Data imputation techniques have evolved from single-modality approaches to multi-modal approaches, which show great promise for imputing one modality based on the availability of another. Recent advancements in large, pre-trained artificial intelligence (AI) models, known as foundation models, offer even more powerful solutions for data imputation. We introduce the concept of cross-modal data modeling, a methodology harnessing foundation models to impute missing data and also generate realistic synthetic samples. Cross-modal modeling empowers researchers to model complex interactions among diverse biomedical data types, including omics and imaging. This approach can illuminate how one modality influences another, facilitating in-silico exploration of disease mechanisms without the need for extensive and costly real-world data collection. We highlight ongoing efforts in cross-modal modeling and anticipate its substantial contributions to understanding disease biology and enhancing healthcare practices.