George H. Chen
Associate Professor of Information Systems,
Heinz College
Affiliated Faculty,
Machine Learning Department
Carnegie Mellon University
Email: georgechen [at symbol] cmu.edu
Office: HBH 2216 (the west wing of Hamburg Hall, second floor)
About
I primarily work on building trustworthy machine learning models for
time-to-event prediction (survival analysis) and for time series
analysis. I often use nonparametric prediction models that work
well under very few assumptions on the data. My main application area
is in healthcare. I am supported in part by an NSF CAREER award.
Survival analysis:
Much of what I work on is survival analysis. I have an upcoming
monograph (to appear in Foundations and Trends in Machine Learning)
that aims to be a reasonably self-contained introduction to deep survival
models for time-to-event prediction, targeted toward a machine learning
audience. A draft is available
here (September 2024).
Previously, I taught a survival analysis tutorial at CHIL 2020 and at
SIGMETRICS 2021, and I co-organized a survival analysis symposium
(part of the 2023 AAAI Fall Symposium Series).
CoolCrop:
I occasionally also work on machine learning for the developing world.
I co-founded and am now an advisor for
CoolCrop, an AgriTech startup based
in India that works on providing farmers with cold storage units
(such as a refrigerator shared by a village) and market forecasts.
We currently serve over 9000 farmers across 7 states in India
at over 40 sites.
Pre-historic:
I obtained my Ph.D. in Electrical Engineering and Computer Science at
MIT. My thesis was on nonparametric machine learning methods. At MIT,
I also worked on satellite image analysis to help bring electricity to
rural India, and taught twice in Jerusalem for MEET, a program that
brings together Israeli and Palestinian high school students to learn
computer science and entrepreneurship. I completed my B.S. at UC
Berkeley, majoring in Electrical Engineering and Computer Sciences, and
Engineering Mathematics and Statistics.
My CV can be found here.
Some News
Neural Information Processing Systems (Dec 2024): I'm serving as an area chair.
[website]
Conference on Health, Inference, and Learning (June 2024): I co-organized research roundtables.
[website]
MEET Summer 2023:
I returned to Jerusalem to teach computer science to Israeli and Palestinian high school students as part of MIT's Middle East Entrepreneurs of Tomorrow (MEET) program. I previously taught for this program in the summers of 2015 and 2016.
Teaching (Fall 2024, mini 2)
95-865 "Unstructured Data Analytics" (Sections A2/B2/C2)
Research Supervision
I've had the fortune of working with many wonderful students over the years (listed below). If you're interested in working with me and you are already a CMU student, then feel free to shoot me an email telling me what you're particularly excited about working on, why it overlaps with my research interests, and what skills you've already cultivated.
Current PhD student collaborators:
Current master's student collaborators:
- Mingzhu Liu
- Shaopeng Zhang
Past students and where they went after graduating:
- Helen S. Zeng (PhD 2024), Assistant Professor at UC Davis Graduate School of Management
- Yue Zhao (PhD 2023), Assistant Professor at USC Department of Computer Science
- Emaad Manzoor (PhD 2021), Assistant Professor at Cornell University SC Johnson Graduate School of Management
- Mi Zhou (PhD 2020), Assistant Professor at UBC Sauder School of Business
- ♣ Wei Ma (master's in ML 2018/PhD 2019), Assistant Professor at Hong Kong Polytechnic University in the Department of Civil and Environmental Engineering
- ♣ Lynn H. Kaack (master's in ML 2018/PhD 2019), Assistant Professor at the Hertie School
- Thomas Tam (MSPPM 2023), Sunstella Foundation/Jewish Healthcare Foundation
- Brenda Palma (MISM 2022), Markaaz
- Xiaotong (Maggie) Lu (MISM 2020), Uber
- Runtong (Fred) Yang (MISM 2019), Indeed
- Ren Zuo (MISM 2018), Cornerstone Research
- Linhong (Lexie) Li (B.S. 2020), McKinsey
- Junyan Pu (B.S. 2020), CMU master's degree program in CS
♣ indicates a PhD student who worked with me on a secondary master's in ML (I was their master's research advisor but not their PhD research advisor)
Past postdoc:
- Shu Hu (postdoc from Fall 2022 to Summer 2023), Assistant Professor at Purdue University in Indianapolis, Department of Computer and Information Technology
Papers
You can also find my papers listed on
Google Scholar.
Some Working Papers
-
"Can Platform Accountability Reduce Sex Trafficking? Evidence from the Price Effect"
Helen S. Zeng, George H. Chen, Brett Danaher, Michael D. Smith
-
"Multi-stage Readmission and Mortality Prediction along Patient Care Pathway"
Xinyu Yao, Rema Padman, George H. Chen, Karmel S. Shehadeh, Arman Kilic
Under review
2024
-
"Generalized Prompt Tuning: Adapting Frozen Pre-Trained Univariate Time Series Foundation Models for Multivariate Healthcare Time Series"
Mingzhu Liu, Angela H. Chen, George H. Chen
Accepted at Machine Learning for Health (ML4H), December 2024
(A short version of this paper not specific to healthcare is slated to be presented at the NeurIPS Workshop on Time Series in the Age of Large Models in December 2024)
[paper draft coming soon!]
-
"An Introduction to Deep Survival Analysis Models for Predicting Time-to-Event Outcomes"
George H. Chen
To appear in Foundations and Trends in Machine Learning
[arXiv] [preprint] [code]
-
"Fairness in Survival Analysis with Distributionally Robust Optimization"
Shu Hu*, George H. Chen*
(* = equal contribution)
Journal of Machine Learning Research (JMLR), August 2024
[arXiv] [publisher's link] [code]
Note: Journal paper version of our ML4H 2022 paper—generalizes the DRO approach from our earlier paper to a wide class of survival models (such as but not limited to Cox, DeepHit, and SODEN models), adds theoretical analysis for our split DRO approach, and derives an exact DRO Cox method without sample splitting
-
"Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate"
George H. Chen*, Linhong Li*, Ren Zuo, Amanda Coston, Jeremy C. Weiss
(* = equal contribution)
Artificial Intelligence in Medicine, June 2024
[arXiv] [publisher's link] [code]
Note: Journal paper version of our AIME 2020 paper—fixes some model presentation glitches, includes more combinations of topic models with survival models, and has much more thorough discussion
-
"Survival Kernets: Scalable and Interpretable Deep Kernel Survival Analysis with an Accuracy Guarantee"
George H. Chen
Journal of Machine Learning Research (JMLR), February 2024
[arXiv] [publisher's link] [code]
(Presented at the International Conference on Machine Learning (ICML) in July 2024 as part of the journal-to-conference track)
Best paper finalist (applied track) at the INFORMS Data Mining and Decision Analytics Workshop 2022
Note: This paper is about learning a flexible survival model that is in some sense interpretable and that simultaneously learns a "kernel function" (a measure of the similarity between any two data points). This paper is the third in a trilogy of papers I've written on kernel survival analysis. It aims to combine insights from my ICML 2019 paper (on theory for nearest neighbor and kernel Kaplan-Meier estimators) and my MLHC 2020 paper (on how to automatically learn kernel functions for kernel Kaplan-Meier estimators) so as to obtain a class of scalable, interpretable, and accurate kernel Kaplan-Meier estimators, where a special case of this class of estimators has a theoretical guarantee.
-
"Improving Fairness in Deepfake Detection"
Yan Ju*, Shu Hu*, Shan Jia, George H. Chen, Siwei Lyu
(* = equal contribution)
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2024
[arXiv] [publisher's link]
2023
-
"Temporal Supervised Contrastive Learning for Modeling Patient Risk Progression"
Shahriar Noroozizadeh, Jeremy C. Weiss, George H. Chen
Machine Learning for Health (ML4H), December 2023
[arXiv] [publisher's link] [code]
(Preliminary version presented at the AAAI 2023 Workshop on Representation Learning for Responsible Human-Centric AI)
-
"Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks"
Xiaobin Shen, Jonathan Elmer, George H. Chen
Machine Learning for Healthcare (MLHC), August 2023
[arXiv (includes minor corrections)] [publisher's link] [code]
-
"A General Framework for Visualizing Embedding Spaces of Neural Survival Analysis Models Based on Angular Information"
George H. Chen
Conference on Health, Inference, and Learning (CHIL), June 2023
[arXiv] [publisher's link] [code]
-
"Influence via Ethos: On the Persuasive Power of Reputation in Deliberation Online"
Emaad Manzoor, George H. Chen, Dokyun Lee, Michael D. Smith
Management Science, May 2023
[arXiv] [publisher's link] [Cornell news]
Best paper at the AAAI Workshop on AI for Behavior Change 2021
2022
-
"BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs"
Kay Liu*, Yingtong Dou*, Yue Zhao*, Xueying Ding, Xiyang Hu, Ruitong Zhang, Kaize Ding, Canyu Chen, Hao Peng, Kai Shu, Lichao Sun, Jundong Li, George H. Chen, Zhihao Jia, Philip S. Yu
(* = equal contribution)
Neural Information Processing Systems (NeurIPS) (Datasets and Benchmarks track), November-December 2022
[arXiv] [code]
-
"Distributionally Robust Survival Analysis: A Novel Fairness Loss Without Demographics"
Shu Hu*, George H. Chen*
(* = equal contribution)
Machine Learning for Health (ML4H), November 2022
[arXiv] [publisher's link] [code]
Note:
In this original conference version of the paper, we only considered the Cox model. For a substantial extension to a much wider class of survival models, an exact Cox DRO method, and theory on our split DRO approach, please see our JMLR 2024 journal paper extension.
-
"TOD: Tensor-Based Outlier Detection, a General GPU-Accelerated Framework"
Yue Zhao, George H. Chen, Zhihao Jia
Proceedings of the VLDB Endowment, Vol. 16, No. 3, November 2022
[arXiv] [publisher's link] [code]
(Presented at the International Conference on Very Large Data Bases, August-September 2023)
-
"ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions"
Zheng Li*, Yue Zhao*, Xiyang Hu, Nicola Botta, Cezar Ionescu, George H. Chen
(* = equal contribution)
IEEE Transactions on Knowledge and Data Engineering, March 2022
[arXiv] [publisher's link] [code]
2021
-
"Consumer Behavior in the Online Classroom: Using Video Analytics and Machine Learning to Understand the Consumption of Video Courseware"
Mi Zhou, George H. Chen, Pedro Ferreira, Michael D. Smith
Journal of Marketing Research, December 2021
[SSRN] [publisher's link]
2020
-
"Neural Topic Models with Survival Supervision: Jointly Predicting Time-to-Event Outcomes and Learning How Clinical Features Relate"
Linhong Li, Ren Zuo, Amanda Coston, Jeremy C. Weiss, George H. Chen
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv (journal version; fixes various bugs in the conference paper version)] [code] [talk slides]
-
"Predicting Mortality Risk in Viral and Unspecified Pneumonia to Assist Clinicians with COVID-19 ECMO Planning"
Helen Zhou*, Cheng Cheng*, Zachary C. Lipton, George H. Chen, Jeremy C. Weiss
(* = equal contribution)
International Conference on Artificial Intelligence in Medicine (AIME), August 2020
[arXiv] [code]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Machine Learning for Global Health, July 2020)
-
"Deep Kernel Survival Analysis and Subject-Specific Survival Time Prediction Intervals"
George H. Chen
Machine Learning for Healthcare (MLHC), August 2020
[arXiv] [publisher's link] [code] [poster]
Note:
This paper is essentially a sequel to my theory paper on nearest neighbor and kernel survival analysis (ICML 2019), where an open problem encountered is how to automatically learn kernel functions for survival analysis aside from using random survival forests. In my follow-up JMLR 2024 paper, I show how to scale deep kernel survival analysis up to large datasets and how to establish an accuracy guarantee for a special case of the resulting estimator.
2019
-
"Missing Not at Random in Matrix Completion: The Effectiveness of Estimating Missingness Probabilities Under a Low Nuclear Norm Assumption"
Wei Ma*, George H. Chen*
(* = equal contribution)
Neural Information Processing Systems (NeurIPS), December 2019
[arXiv] [code] [poster] [talk slides]
Best paper (theoretical track) at the INFORMS Data Mining and Decision Analytics Workshop 2019
-
"Truck Traffic Monitoring with Satellite Images"
Lynn H. Kaack, George H. Chen, M. Granger Morgan
ACM Conference on Computing and Sustainable Societies (COMPASS), July 2019
[arXiv]
(Also presented at the International Conference on Machine Learning (ICML) Workshop on Climate Change, June 2019)
-
"Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates"
George H. Chen
International Conference on Machine Learning (ICML), June 2019
[arXiv (includes minor corrections)] [publisher's link] [code] [talk slides] [poster]
Note:
I wrote two follow-up papers; see the notes for my MLHC 2020 paper and my JMLR 2024 paper.
-
"An Interpretable Produce Price Forecasting System for Small Farmers in India using Collaborative Filtering and Adaptive Nearest Neighbors"
Wei Ma, Kendall Nowocin, Niraj Marathe, George H. Chen
Information and Communication Technologies and Development (ICTD), January 2019
[arXiv]
2018
-
"Explaining the Success of Nearest Neighbor Methods in Prediction"
George H. Chen, Devavrat Shah
Foundations and Trends in Machine Learning, May 2018
[publisher's link]
2017
-
"Survival-Supervised Topic Modeling with Anchor Words: Characterizing Pancreatitis Outcomes"
George H. Chen, Jeremy C. Weiss
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning for Health (ML4H), December 2017
[arXiv (short workshop version)]
(Also presented at the Society for Medical Decision Making North American Meeting, October 2017)
-
"Toward Reducing Crop Spoilage and Increasing Small Farmer Profits in India: a Simultaneous Hardware and Software Solution"
George H. Chen, Kendall Nowocin, Niraj Marathe
Information and Communication Technologies and Development, November 2017
[arXiv]
2015
-
"A Latent Source Model for Patch-Based Image Segmentation"
George H. Chen, Devavrat Shah, Polina Golland
Medical Image Computing and Computer-Assisted Intervention (MICCAI), October 2015
[arXiv] [paper] [poster]
Note:
For a more comprehensive exposition of this paper, consider reading Chapter 5 of my Ph.D. thesis.
-
"Latent Source Models for Nonparametric Inference"
George H. Chen
Ph.D. thesis, MIT, May 2015
[paper]
Received the George M. Sprowls award for best Ph.D. thesis in Computer Science at MIT
-
"Targeting Villages for Rural Development Using Satellite Image Analysis"
Kush R. Varshney, George H. Chen, Brian Abelson, Kendall Nowocin, Vivek Sakhrani, Ling Xu, Brian L. Spatocco
Big Data, March 2015
[paper]
2014
-
"A Latent Source Model for Online Collaborative Filtering"
(alphabetical author ordering)
Guy Bresler, George H. Chen, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2014
[arXiv - longer version] [paper - short conference version] [poster]
Selected as a spotlight (one of 62/1678 submissions)
Note:
An expanded version including intuition for how collaborative filtering relates to an MAP item recommender and derivations for the examples is in Chapter 4 of my Ph.D. thesis; the notation has also been changed to be more similar to the other two papers that went toward my thesis.
2013
-
"A Latent Source Model for Nonparametric Time Series Classification"
(alphabetical author ordering)
George H. Chen, Stanislav Nikolov, Devavrat Shah
Neural Information Processing Systems (NeurIPS), December 2013
[arXiv - longer version] [paper - short conference version] [poster]
Note:
An expanded version with a lower bound on the misclassification rate and further discussion is in Chapter 3 of my Ph.D. thesis.
-
"Sparse Projections of Medical Images onto Manifolds"
George H. Chen, Christian Wachinger, Polina Golland
Information Processing in Medical Imaging (IPMI), June-July 2013
[arXiv] [paper] [poster]
2012
-
"Deformation-Invariant Sparse Coding"
George H. Chen
Master's thesis, MIT, May 2012
[paper] [poster]
2011
-
"Deformation-Invariant Sparse Coding for Modeling Spatial Variability of Functional Patterns in the Brain"
George H. Chen, Evelina G. Fedorenko, Nancy G. Kanwisher, Polina Golland
Neural Information Processing Systems (NeurIPS) Workshop on Machine Learning and Interpretation in Neuroimaging, December 2011
[paper] [talk slides]
2010
-
"Indoor Localization and Visualization Using a Human-Operated Backpack System"
Timothy Liu, Matthew Carlberg, George Chen, Jacky Chen, John Kua, Avideh Zakhor
International Conference on Indoor Positioning and Indoor Navigation (IPIN), September 2010
[paper]
-
"Indoor Localization Algorithms for a Human-Operated Backpack System"
George Chen, John Kua, Stephen Shum, Nikhil Naikal, Matthew Carlberg, Avideh Zakhor
International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), May 2010
[paper]
2009
-
"Classifying Urban Landscape in Aerial LIDAR Using 3D Shape Analysis"
Matthew Carlberg, Peiran Gao, George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"2D Tree Detection in Large Urban Landscapes Using Aerial LIDAR Data"
George Chen, Avideh Zakhor
International Conference on Image Processing (ICIP), November 2009
[paper]
-
"Image Augmented Laser Scan Matching for Indoor Dead Reckoning"
Nikhil Naikal, John Kua, George Chen, Avideh Zakhor
International Conference on Intelligent Robots and Systems (IROS), October 2009
[paper]
Last updated 11/11/2024.