MIT Clinical Machine Learning Group

About the Lab

Led by David Sontag, the Clinical Machine Learning Group works on advancing machine learning and artificial intelligence, and on using these techniques to improve health care.

Broadly, we have two goals:

Clinical: To truly make a difference in health care, we need to create algorithms that are useful for solving real clinical problems.
Machine learning: We need rigorous solutions that can pave the way for safe deployment of machine learning in high-stakes settings like health care.

News

December 2022: Monica presented “Large Language Models are Few-Shot Clinical Information Extractors” at EMNLP 2022
November 2022: We have three papers at NeurIPS 2022, with
- Michael presenting “Evaluating Robustness to Dataset Shift via Parametric Robustness Sets”
- Zeshan presenting “Falsification before Extrapolation in Causal Effect Estimation”
- Hunter presenting “Training Subset Selection for Weak Supervision”
July 2022: We have two papers at ICML 2022, with
- Hunter and Monica presenting “Co-training improves prompt-based learning for large language models”
- Hussein presenting “Sample Efficient Learning of Predictors that Complement Humans”
March 2022: Members of our lab presented two papers at AISTATS 2022
- “Leveraging Time Irreversibility with Order-Contrastive Pre-training”
- “Using Time-Series Privileged Information for Provably Efficient Learning of Prediction Models”
December 2021: We have two papers at AAAI 2022, with
- Hussein presenting “Teaching Humans When To Defer to a Classifier via Exemplars”
- Irene presenting “Clustering Interval-Censored Time-Series for Disease Phenotyping”
October 2021: Luke presented “MedKnowts: Unified Documentation and Information Retrieval for Electronic Health Records” at UIST 2021. Explore the project page for a demo and more!
August 2021: Jason presented “Directing Human Attention in Event Localization for Clinical Timeline Creation” at MLHC 2021
July 2021: We have three papers at ICML 2021, with
- Michael presenting “Regularizing towards Causal Invariance: Linear Models with Proxies”
- Hunter presenting “Graph cuts always find a global optimum for Potts models (with a catch)”
- Zeshan presenting “Neural Pharmacodynamic State Space Modeling”

Team

David Sontag

Professor of EECS

Rebecca (Peyser) Boiarsky

PhD Student

Christina X Ji

PhD Student

Chandler Squires

PhD Student

Hussein Mozannar

PhD Student

Hunter Lang

PhD Student

Shannon Shen

PhD Student

Ilker Demirel

PhD Student

Barbara Lam

Clinical Collaborator

Sama Setty

Undergraduate Researcher

Featured Publications

Effective Human-AI Teams via Learned Natural Language Rules and Onboarding

People are starting to rely on AI agents to assist them with various tasks and thus forming human-AI teams. The human must know when to …

Hussein Mozannar, Jimin J Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag

2023 Advances in Neural Information Processing Systems (NeurIPS)

Code

Who Should Predict? Exact Algorithms For Learning to Defer to Humans

Algorithmic predictors should be able to defer the prediction to a human decision maker to ensure accurate predictions. In this work, …

Hussein Mozannar, Hunter Lang, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag

2023 Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS)

PDF Code

Evaluating Robustness to Dataset Shift via Parametric Robustness Sets

We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model …

Nikolaj Thams, Michael Oberst, David Sontag

2022 Advances in Neural Information Processing Systems (NeurIPS)

PDF Code

Falsification before Extrapolation in Causal Effect Estimation

Randomized Controlled Trials (RCTs) represent a gold standard when developing policy guidelines. However, RCTs are often narrow, and …

Zeshan Hussain, Michael Oberst, Ming-Chieh Shih, David Sontag

2022 Advances in Neural Information Processing Systems (NeurIPS)

PDF

Co-training Improves Prompt-based Learning for Large Language Models

We demonstrate that co-training (Blum & Mitchell, 1998) can improve the performance of prompt-based learning by using unlabeled …

Hunter Lang, Monica Agrawal, Yoon Kim, David Sontag

2022 Proceedings of the Thirty-Ninth International Conference on Machine Learning (ICML)

PDF

ETAB: A Benchmark Suite for Visual Representation Learning in Echocardiography

Echocardiography is one of the most commonly used diagnostic imaging modalities in cardiology. Application of deep learning models to …

Ahmed Alaa, Anthony Philippakis, David Sontag

2022 Advances in Neural Information Processing Systems Datasets and Benchmarks Track

PDF

Neural Pharmacodynamic State Space Modeling

Modeling the time-series of high-dimensional, longitudinal data is important for predicting patient disease progression. However, …

Zeshan Hussain, Rahul G Krishnan, David Sontag

2021 Proceedings of the Thirty-Eighth International Conference on Machine Learning (ICML)

PDF Code

Regularizing towards Causal Invariance: Linear Models with Proxies

We propose a method for learning linear models whose predictive performance is robust to causal interventions on unobserved variables, …

Michael Oberst, Nikolaj Thams, Jonas Peters, David Sontag

2021 Proceedings of the Thirty-Eighth International Conference on Machine Learning (ICML)

PDF Code

Deep Contextual Clinical Prediction with Reverse Distillation

Healthcare providers are increasingly using machine learning to predict patient outcomes to make meaningful interventions. However, …

Rohan Kodialam, Rebecca Boiarsky, Justin Lim, Neil Dixit, Aditya Sai, David Sontag

2021 Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence

PDF

A decision algorithm to promote outpatient antimicrobial stewardship for uncomplicated urinary tract infection

Antibiotic resistance is a major cause of treatment failure and leads to increased use of broad-spectrum agents, which begets further …

Sanjat Kanjilal, Michael Oberst, Sooraj Boominathan, Helen Zhou, David C. Hooper, David Sontag

2020 Science Translational Medicine

PDF Code DOI

Characterization of Overlap in Observational Studies

Overlap between treatment groups is required for non-parametric estimation of causal effects. If a subgroup of subjects always receives …

Michael Oberst, Fredrik D. Johansson, Dennis Wei, Tian Gao, Gabriel Brat, David Sontag, Kush R. Varshney

2020 Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics (AISTATS)

PDF Code

Fast, Structured Clinical Documentation via Contextual Autocomplete

We present a system that uses a learned autocompletion mechanism to facilitate rapid creation of semi-structured clinical …

Divya Gopinath, Monica Agrawal, Luke Murray, Steven Horng, David Karger, David Sontag

2020 Proceedings of the Machine Learning for Healthcare Conference

PDF

Recent Publications

Towards Verifiable Text Generation with Symbolic References

Large language models (LLMs) have demonstrated an impressive ability to synthesize plausible and fluent text. However they remain …

Lucas Torroba Hennigen, Shannon Shen, Aniruddha Nrusimha, Bernhard Gapp, David Sontag, Yoon Kim

2023 arXiv preprint arXiv:2311.09188

PDF Code

Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes

The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a …

Sharon Jiang, Shannon Shen, Monica Agrawal, Barbara Lam, Nicholas Kurtzman, Steven Horng, David Karger, David Sontag

2023 Machine Learning for Healthcare Conference (MLHC)

PDF

Large-Scale Study of Temporal Shift in Health Insurance Claims

Most machine learning models for predicting clinical outcomes are developed using historical data. Yet, even if these models are …

Christina X Ji, Ahmed M Alaa, David Sontag

2023 Conference on Health, Inference, and Learning

PDF Code

A Deep Dive into Single-Cell RNA Sequencing Foundation Models

Large-scale foundation models, which are pre-trained on massive, unlabeled datasets and subsequently fine-tuned on specific tasks, have …

Rebecca Boiarsky, Nalini M. Singh, Alejandro Buendia, Gad Getz, David Sontag

2023 bioRxiv

DOI URL

Conformalized Unconditional Quantile Regression

We develop a predictive inference procedure that combines conformal prediction (CP) with unconditional quantile regression (QR)—a …

Ahmed Alaa, Zeshan Hussain, David Sontag

2023 Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS)
