Workshop on GenAI in Healthcare

OCT 25, 2023 | 3:00 – 6:00 PM

CHEM-H BUILDING | ROOM E241
3:00-4:45 PM: Red Team Challenge
4:45-6:00 PM: Happy Hour

Privacy, Data Privacy, and Differential Privacy

BIOMEDICAL DATA SCIENCE PRESENTS:
BIODS 260C
6/1/23 1:30PM-2:50PM
MSOB X303 (ZOOM LINK BELOW)
Xiao-Li Meng
Founding Editor-in-Chief of Harvard Data Science Review
Whipple V. N. Jones Professor of Statistics, Harvard University

Title: Privacy, Data Privacy, and Differential Privacy

Abstract: This talk invites curious minds to contemplate the notion of data privacy, especially at the individual level. It first traces the evasive concept of privacy to a legal right, apparently derived from the frustration of the husband of a socialite attracting tabloids when yellow journalism and printing photography in newspapers became popular in the 1890s. More than a century later, the rise of digital technologies and data science has made the issue of data privacy a central concern for essentially all enterprises, from medical research to business applications, and to census operations. Differential privacy (DP), a theoretically elegant and methodologically impactful framework developed in cryptography, is a major milestone in dealing with the thorny issue of properly balancing data privacy and data utility. However, the popularity of DP has brought both hype and scrutiny, revealing several misunderstandings and subtleties that have created confusion even among specialists. The technical part of this talk is therefore devoted to explicating such issues from a statistical framework, built upon the prior-to-posterior semantics of DP and a multi-resolution perspective. This framework yields an intuitive statistical interpretation of DP, although it does not correspond in general to the commonly perceived and desired data privacy protection. Ultimately, the talk aims to highlight the challenges and research opportunities in quantifying data privacy, what DP does and does not protect, and the need to properly analyze DP data. (This talk is based on joint work with James Bailie and Ruobin Gong.)

Suggested readings:

1) Harvard Data Science Review: Differential Privacy for 2020 Census, https://hdsr.mitpress.mit.edu/specialissue2
and, in the same issue, the editorial.

2) Oberski, D. L., & Kreuter, F. (2020). Differential Privacy and Social Science: An Urgent Puzzle. Harvard Data Science Review, 2(1). https://doi.org/10.1162/99608f92.63a22079

3) Groshen, E. L., & Goroff, D. (2022). Disclosure Avoidance and the 2020 Census: What Do Researchers Need to Know? Harvard Data Science Review, (Special Issue 2). https://doi.org/10.1162/99608f92.aed7f34f

Bio:

Xiao-Li Meng, the Founding Editor-in-Chief of Harvard Data Science Review and the Whipple V. N. Jones Professor of Statistics, was named the best statistician under the age of 40 by the Committee of Presidents of Statistical Societies (COPSS) in 2001, and he is the recipient of numerous awards and honors for his more than 150 publications in at least a dozen theoretical and methodological areas, as well as in areas of pedagogy and professional development. In 2020, he was elected to the American Academy of Arts and Sciences. Meng received his BS in mathematics from Fudan University in 1982 and his PhD in statistics from Harvard in 1990. He was on the faculty of the University of Chicago from 1991 to 2001 before returning to Harvard, where he served as the Chair of the Department of Statistics (2004–2012) and the Dean of Graduate School of Arts and Sciences (2012–2017).

Zoom link: https://stanford.zoom.us/j/92124459914?pwd=cFpJYXVLOExUVjMzZkNsYXA0b0RxUT09&from=addon
Meeting ID: 943 2440 5118
Password: 36643

PDF Flier

Inference from single cell lineage tracing data generated via genome editing and a novel test for phylogenetic association.

BIOMEDICAL DATA SCIENCE PRESENTS:
BIODS 260C
5/25/23 1:30PM-2:50PM
MSOB X303 (ZOOM LINK BELOW)
Julia Adela Palacios
Assistant Professor of Statistics
Assistant Professor of Biomedical Data Science
Stanford University

Title: Inference from single cell lineage tracing data generated via genome editing and a novel test for phylogenetic association.

Abstract: Single cell lineage tracing data obtained via genome editing with CRISPR/Cas9 technology enables us to better understand important developmental processes at an unprecedented resolution. In the first part of the seminar, I will present a model that allows us to infer cell lineage phylogenies and lineage population size trajectories in a maximum likelihood or Bayesian framework. We assume an efficient coalescent model on cell phylogenies and propose a mutation model that describes how synthetic CRISPR target arrays generate observed variation after many cell divisions. We apply our method to two different CRISPR technologies. In the second part of the seminar, I will present a model for trait evolution inspired by the Chinese restaurant process. We use this model to derive a test for phylogenetic binary trait association and apply it to test several hypotheses in phylogenetics, infectious diseases, and cancer.

References:

1. Zhang J, Preising GA, Schumer M, Palacios JA. CRP-Tree: A phylogenetic association test for binary traits.
2. Yang et al., Lineage tracing reveals the phylodynamics, plasticity, and paths of tumor evolution, Cell 2022

Bio:

In her research, Professor Palacios seeks to provide statistically rigorous answers to concrete, data-driven questions in population genetics, epidemiology, and comparative genomics, often involving probabilistic modeling of evolutionary forces and the development of computationally tractable methods that are applicable to big data problems. Past and current research relies heavily on the theory of stochastic processes and recent developments in machine learning and statistical theory for big data; future research plans are aimed at incorporating the effects of selection and population structure in Bayesian inference of evolutionary parameters such as effective population size and recombination rates, and at developing more realistic and computationally efficient phylodynamic methods for infectious diseases.

Zoom link: https://stanford.zoom.us/j/92124459914?pwd=cFpJYXVLOExUVjMzZkNsYXA0b0RxUT09&from=addon
Meeting ID: 943 2440 5118
Password: 366430

PDF Flier

Probing differential expression patterns efficiently and robustly through adaptive linear multi-rank two-sample tests

BIOMEDICAL DATA SCIENCE PRESENTS:
BIODS 260C
5/18/23 1:30PM-2:50PM
MSOB X303 (ZOOM LINK BELOW)
Dan Daniel Erdmann-Pham
Stein Fellow in the Statistics Department at
Stanford University

Title: Probing differential expression patterns efficiently and robustly through adaptive linear multi-rank two-sample tests

Abstract: Two- and K-sample tests are commonly used tools to extract scientific discoveries from data. Naturally, the precise choice of test ought to depend on the specifics of the mechanisms generating the data: strong parametric assumptions allow for efficient likelihood-based testing, while non-parametric approaches like Mann-Whitney and Kolmogorov-Smirnov-type tests are popular when such prior knowledge is absent. As this talk will argue, practitioners often find themselves with neither full knowledge of all involved distributions nor full ignorance of them, and are therefore in need of tests that span the spectrum of possible prior knowledge gracefully. It proposes so-called adaptive linear multi-rank statistics as promising candidates for this task, and illustrates their general utility, flexibility (including applications to multiple testing and testing under nuisance alternatives), and computational feasibility on examples from population genetics and single-cell differential expression analysis.

Bio:

I am a statistician working on the rigorous, interpretable, and scalable analysis of data with a specific focus on data arising in biology. Data underpins much modern scientific discovery, which has motivated the development of a rich set of tools to aid its analysis. The field of machine learning in particular has supplied an inventory of quantitative methods ranging from hypothesis testing to function approximation that are available off-the-shelf. However, choosing the most suitable algorithm for a given data set, or indeed whether an algorithm delivering satisfactory performance exists, is often obscured by tacit theoretical assumptions not readily accessible to the user, or a lack of clarity regarding method-specific capabilities and limitations. The broad theme of my work is to bridge such gaps by providing transparent data-analysis schemes for which provable optimality guarantees exist.

Zoom link: https://stanford.zoom.us/j/92124459914?pwd=cFpJYXVLOExUVjMzZkNsYXA0b0RxUT09&from=addon
Meeting ID: 943 2440 5118
Password: 366430

PDF Flier
