Research
My current research focuses on reinforcement learning (RL) and its application in LLM post-training.
I am also interested in continuous optimization and deep learning theory.
Selected Papers
Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
Zhenghao Xu, Qin Lu, Changlong Yu, Tuo Zhao
arXiv, 2026
paper /
code /
blog (Revisiting (Kimi's) Policy Mirror Descent)
Policy mirror descent for LLM post-training with an implicit regularization perspective.
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Zhenghao Xu, Qin Lu, Qingru Zhang, Liang Qiu, Ilgee Hong, Changlong Yu, Wenlin Yao, Yao Liu, Haoming Jiang, Lihong Li, Hyokun Yun, Tuo Zhao
NeurIPS, 2025
paper /
code
Uncertainty-based routing between reward models and strong LLM judges for efficient pairwise RLHF.
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Zhenghao Xu, Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao
JMLR, 2024
paper /
code
Theory for policy optimization with neural function approximation under low-dimensional structure.
Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks
Zhenghao Xu, Yuqing Wang, Tuo Zhao, Rachel Ward, Molei Tao
NeurIPS, 2024
paper
Acceleration and low-rank adaptivity guarantees for Nesterov's method in matrix factorization and linear networks.
Good Regularity Creates Large Learning Rate Implicit Biases: Edge of Stability, Balancing, and Catapult
Yuqing Wang, Zhenghao Xu, Tuo Zhao, Molei Tao
JMLR, to appear
paper /
short version (M3L@NeurIPS 2023)
Large learning rate dynamics in nonconvex optimization, including edge of stability, balancing, and catapult.