Alex Derry

Alex Derry’s Dissertation Defense 12/8

Friday, December 8th, 2023

10:00 am PST

Location: Y2E2 111

Deep learning on local sites for protein structure and function analysis

Understanding how the three-dimensional structure of a protein leads to its function is important for determining disease mechanisms, developing targeted therapeutics, and engineering new proteins with desired functional characteristics. The expansion of protein structure databases due to experimental and computational advances provides an unprecedented opportunity to learn structure-function relationships in a data-driven manner. Deep learning methods that operate on protein structures have shown promise for specific tasks, but their utility for functional analysis has been limited by inconsistencies in model training and evaluation, a lack of labeled protein-function data, and an inability to reconcile global predictions with local biochemical mechanisms. In this dissertation, I explore these challenges and propose a framework for protein analysis based on learning on local sites rather than the entire protein structure. First, to establish standards for model development and evaluation, I present work on (1) developing a suite of benchmark datasets, processing tools, and baseline models, and (2) quantifying the effect of differing structure compositions in the training data. I then describe a self-supervised learning method that leverages evolutionary relationships to learn general-purpose representations of local structural sites, and show how these representations improve performance on downstream tasks involving classification, search, and annotation of functional sites. By clustering millions of sites, I propose a framework for protein analysis based on conserved structural motifs that enables the discovery of functional relationships across protein classes. Finally, I present a method for explainable function annotation that predicts the overall function of a protein as well as the individual residues responsible for it.


(PW: 271506)

Kyle Daniels

Weekly Seminar: Kyle Daniels, 12/7/23


Speaker: Kyle Daniels, Assistant Professor of Genetics, Stanford University

Title: Decoding the language of signaling domains to control cell function

Abstract: Cell therapies are powerful technologies in which human cells are reprogrammed for therapeutic applications such as killing cancer cells or replacing defective cells. The technologies underlying cell therapies are increasingly complex, making rational engineering of cell therapies more difficult. Creating the next generation of cell therapies will require improved experimental approaches and predictive models. Artificial intelligence (AI) and machine learning (ML) methods have revolutionized several fields in biology, including genome annotation, protein structure prediction, and enzyme design. Combining experimental library screens with AI to create predictive models, design rules, and improved designs could accelerate the development of cell therapies. Chimeric antigen receptor (CAR) costimulatory domains derived from native immune receptors steer the phenotypic output of therapeutic T cells. We constructed a library of CARs containing ~2,300 synthetic costimulatory domains, built from combinations of 13 signaling motifs. These CARs promoted diverse cell fates, which were sensitive to motif combinations and configurations. Neural networks trained to decode the combinatorial grammar of CAR signaling motifs allowed extraction of key design rules. For example, non-native combinations of motifs that bind tumor necrosis factor receptor-associated factors (TRAFs) and phospholipase C gamma 1 (PLCg1) enhanced cytotoxicity and stemness associated with effective tumor killing. Thus, libraries built from minimal building blocks of signaling, combined with machine learning, can efficiently guide engineering of receptors with desired phenotypes.


Manuel Rivas

Weekly Seminar: Manny Rivas, “Prediction and inference from population scale datasets” 11/16

Prediction and inference from population scale datasets

Thursday, 11/16, 1:30–3:00 pm, MSOB X303

Population biobanks are a valuable resource for identifying genetic and environmental factors that contribute to disease. Recent advances in statistical methods and computational power have enabled the analysis of large-scale datasets from these biobanks, leading to the discovery of novel therapeutic targets and pathways. This seminar will present the use of population biobank-scale datasets for the analysis of renal, liver, and sex hormone biomarkers. In addition, I will discuss the path from statistical methodological development, to target identification for glaucoma, to therapeutic development using monoclonal antibodies that mimic the effects of protective mutations in humans. Finally, I will present approaches for disease risk prediction using genetics, metabolomics, and proteomics data. Together, the methods and applications presented in this talk demonstrate the value of population-scale cohorts for advancing our understanding of disease and developing new treatments.

Suggested reading:

Rare protein-altering variants in ANGPTL7 lower intraocular pressure and protect against glaucoma.

Genetics of 35 blood and urine biomarkers.

Tanigawa Y, Qian J, Venkataraman G, Justesen JM, Li R, Tibshirani R, et al. (2022) Significant sparse polygenic risk scores across 813 traits in UK Biobank. PLoS Genet 18(3): e1010105.

Venkataraman GR, DeBoever C, Tanigawa Y… (2021) Bayesian model comparison for rare-variant association studies. The American Journal of Human Genetics.

Julia Carrasco-Zanini, et al.

This weekly seminar (running during Fall, Winter and Spring quarters) doubles as a class “Workshops in Biostatistics (BIODS/STATS 260).”

Data Studio: Clinical Trial Design for Glaucoma Treatment Using Humphrey Visual Field as Primary Outcome

TITLE: Clinical Trial Design for Glaucoma Treatment Using Humphrey Visual Field as Primary Outcome
A Guide for the Statistically Perplexed
DATE: Wednesday, 4 October 2023
TIME: 3:00–4:30 PM
LOCATION: Conference Room X399, Medical School Office Building, 1265 Welch Road, Stanford, CA


Laurel Stell, Biomedical Data Science
Jeffrey Goldberg, Ophthalmology
Gala Beykin, Ophthalmology


Ying Lu
Chiara Sabatti
Lu Tian
Balasubramanian Narasimhan (Naras)
Brad Efron
Mei-Chiung Shih
John S. Tamaresis



The Data Studio Workshop brings together a biomedical investigator with a group of experts for an in-depth session to solicit advice about statistical and study design issues that arise while planning or conducting a research project. This week, the investigator(s) will discuss the following project with the group.


Glaucoma treatments are typically assessed by whether they control intraocular pressure (IOP), but the disease often continues to progress despite reduction in IOP. The Humphrey Visual Field (HVF) exam, which measures the retina’s sensitivity to light, is widely used to diagnose glaucoma and monitor its progression, but its measurement error can be large in comparison to the rate of progression. Consequently, estimating the rate of decrease in HVF measurements by linear regression generally requires regular exams over 10 years or more, and even then the slope is often not statistically significant. Finally, treatments are unlikely to reverse damage; at best they slow or delay neurodegeneration. All of these factors can result in prohibitively large sample sizes or long trial times when HVF is used as the primary outcome in a clinical trial.
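To illustrate why large measurement error forces long follow-up, a minimal sketch (not the investigators' analysis) can use the textbook standard error of an ordinary least-squares slope, SE = sigma / sqrt(sum((t_i - t_bar)^2)), assuming equally spaced exams and independent errors with a common standard deviation sigma. The function name `slope_standard_error` and the values of sigma and the progression rate below are hypothetical illustrations, not figures from the HVF data sets.

```python
import math


def slope_standard_error(sigma_db, years, exams_per_year):
    """Standard error of an OLS slope (dB/year) for equally spaced exams.

    Assumes independent measurement errors with a common SD `sigma_db`;
    real HVF errors may be correlated, so this is only illustrative.
    """
    n = int(round(years * exams_per_year)) + 1  # exams at t = 0 ... years
    if n < 3:
        raise ValueError("need at least 3 exams to estimate a slope and its error")
    times = [years * i / (n - 1) for i in range(n)]
    t_bar = sum(times) / n
    sxx = sum((t - t_bar) ** 2 for t in times)  # spread of exam times
    return sigma_db / math.sqrt(sxx)


if __name__ == "__main__":
    sigma = 1.5  # hypothetical per-exam measurement SD (dB), not from the study
    rate = -0.5  # hypothetical progression rate (dB/year), not from the study
    for years in (1, 2, 5, 10):
        se = slope_standard_error(sigma, years, exams_per_year=2)
        print(f"{years:>2} years: SE = {se:.3f} dB/yr, |rate|/SE = {abs(rate) / se:.2f}")
```

With a fixed exam frequency f, sum((t_i - t_bar)^2) grows roughly as f * T^3 / 12 for follow-up duration T, so the slope's standard error shrinks quickly only as follow-up lengthens; adding more exams within a short window helps much less.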

Hypothesis & Aim

We have performed exploratory analyses of HVF exams. We hope to leverage these data to improve clinical trial inclusion criteria and statistical tests for treatment effect.


We have HVF data from a variety of sources: (a) thirty glaucomatous eyes in a test-retest study that performed weekly exams for three months (Artes et al., 2014), (b) data from Phase 1b trials including six or fewer exams over a year or two from about 150 eyes (Goldberg et al., 2022), and (c) the public UW-HVF data set of thousands of eyes, including 450 with at least nine exams over 10 years or more, but without clinical information such as diagnosis, progression, or treatment.

Statistical Models

The HVF exam measures sensitivity at an array of 52 points on the retina. We will discuss properties of the measurements at individual locations, averaged over the whole retina, and averaged over each of six regions identified by mapping neurons in the retina. We are seeking advice on statistical models for testing treatment effects.


  1. Do we have sufficient pilot data?
  2. If not, what do we need?
  3. How should we estimate power for candidate outcome measures?


Join from PC, Mac, Linux, iOS or Android:

    Password: 842586

Or iPhone one-tap (US Toll): +18333021536,,96972699747# or +16507249799,,96972699747#

Or Telephone:

    Dial: +1 650 724 9799 (US, Canada, Caribbean Toll) or +1 833 302 1536 (US, Canada, Caribbean Toll Free)

    Meeting ID: 969 7269 9747

    Password: 842586
