Michael Y. Li

Hi! I'm a Computer Science PhD student at Stanford, advised by Noah Goodman and Emily Fox.

I received my undergraduate degree from Princeton (summa cum laude, Phi Beta Kappa), where I was fortunate to work with Tom Griffiths and Ryan Adams. I’ve also spent time doing research in tech and quant finance (Microsoft Research and Two Sigma).

Email / GitHub / Google Scholar / LinkedIn

Research

These days, I work primarily on enhancing the reasoning capabilities of large language models. Before that, I worked on various topics in statistical machine learning (e.g., Sequential Monte Carlo, variational inference, Gaussian processes, point processes) and ways to integrate black-box LLMs into data-science workflows. If you are a Stanford student interested in doing research with me, feel free to reach out at firstname.middle_initial.lastname@stanford.edu!

Automated Hypothesis Validation with Agentic Sequential Falsifications

Kexin Huang*, Ying Jin*, Ryan Li*, Michael Y. Li, Emmanuel Candès, Jure Leskovec
ICML, 2025
paper

LLMs + sequential hypothesis tests with Type-I error control

BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

Kanishk Gandhi*, Michael Y. Li*, Lyle Goodyear, Louise Li, Aditi Bhaskar, Mohammed Zaman, Noah D. Goodman
NeurIPS Workshop on Scaling Environments for Agents, 2025
paper

A benchmark for LLM-driven experimental design and model discovery

CriticAL: Model Criticism Automation with Language Models

Michael Y. Li, Noah D. Goodman, Emily B. Fox
NeurIPS Statistical Foundations of LLMs and Foundation Models Workshop, 2024
paper

Automated Bayesian model criticism.

What Should Embeddings Embed? Transformers Represent Latent Generating Distributions

Liyi Zhang, Michael Y. Li, Thomas L. Griffiths
preprint, 2024
paper

We study the embeddings of transformers through the lens of predictive sufficient statistics.

Automated Statistical Model Discovery with Language Models

Michael Y. Li, Emily B. Fox, Noah D. Goodman
ICML, 2024
paper

We propose a language-model-driven system for automated statistical model discovery.

NAS-X: Neural Adaptive Smoothing via Twisting

Dieterich Lawson*, Michael Y. Li*, Scott W. Linderman
NeurIPS, 2023
Advances in Approximate Bayesian Inference, 2023 [Oral Presentation]
paper website

Twisted Sequential Monte Carlo for inference in latent variable models.

Why think step-by-step? Reasoning emerges from the locality of experience

Ben Prystawski, Michael Y. Li, Noah D. Goodman
NeurIPS, 2023 [Oral Presentation, top 0.5%]
paper

We empirically and theoretically study when chain-of-thought reasoning emerges in large language models.

Gaussian Process Surrogate Models for Neural Networks

Michael Y. Li, Erin Grant, Thomas L. Griffiths
UAI, 2023
paper

Using Gaussian processes to approximate neural networks.

Learning to Learn Functions

Michael Y. Li, Fred Callaway, William D. Thompson, Ryan P. Adams, Thomas L. Griffiths
Cognitive Science, 2023
paper

We propose hierarchical Bayesian models of how people learn to learn functions and validate these models in behavioral experiments.

Design and source code from Jon Barron's website