Research
My current research focuses on reinforcement learning (RL) and its application in LLM post-training.
I am also interested in continuous optimization and deep learning theory.
Selected Papers
Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
Zhenghao Xu, Qin Lu, Changlong Yu, Tuo Zhao
arXiv, 2026
paper /
code /
blog (Revisiting (Kimi's) Policy Mirror Descent)
Policy mirror descent for LLM post-training with an implicit regularization perspective.
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Zhenghao Xu, Qin Lu, Qingru Zhang, Liang Qiu, Ilgee Hong, Changlong Yu, Wenlin Yao, Yao Liu, Haoming Jiang, Lihong Li, Hyokun Yun, Tuo Zhao
NeurIPS, 2025
paper /
code
Uncertainty-based routing between reward models and strong LLM judges for efficient pairwise RLHF.
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
JMLR, 2024
paper /
code
Theory for policy optimization with neural function approximation under low-dimensional structure.
Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Zhenghao Xu, Yuqing Wang, Tuo Zhao, Rachel Ward, Molei Tao
NeurIPS, 2024
paper
Acceleration and low-rank adaptivity guarantees for Nesterov's method in matrix factorization and linear networks.
Good Regularity Creates Large Learning Rate Implicit Biases: Edge of Stability, Balancing, and Catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
JMLR, to appear
paper /
short version (M3L@NeurIPS 2023)
Large learning rate dynamics in nonconvex optimization, including edge of stability, balancing, and catapult.