Friday, December 8th, 2023
10:00 am PST
Location: Y2E2 111
Deep learning on local sites for protein structure and function analysis
Understanding how the three-dimensional structure of a protein leads to its function is important for determining disease mechanisms, developing targeted therapeutics, and engineering new proteins with desired functional characteristics. The expansion of protein structure databases due to experimental and computational advances provides an unprecedented opportunity to learn structure-function relationships in a data-driven manner. Deep learning methods that operate on protein structures have shown promise for specific tasks, but their utility for functional analysis has been limited by inconsistencies in model training and evaluation, a lack of labeled protein function data, and an inability to reconcile global predictions with local biochemical mechanisms. In this dissertation, I explore these challenges and propose a framework for protein analysis based on learning on local sites rather than the entire protein structure. First, to establish standards for model development and evaluation, I present work on (1) developing a suite of benchmark datasets, processing tools, and baseline models, and (2) quantifying the effect of differing structure compositions in the training data. I then describe a self-supervised learning method that leverages evolutionary relationships to learn general-purpose representations of local structural sites, and I show how these representations enable improved performance on downstream tasks involving classification, search, and annotation of functional sites. By clustering millions of sites, I propose a framework for protein analysis based on conserved structural motifs, which enables the discovery of functional relationships across protein classes. Finally, I present a method for explainable function annotation that predicts the overall function of a protein as well as the individual residues that are responsible for it.