Han Zhong
Verified email at stu.pku.edu.cn - Homepage
Title
Cited by
Year
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond
H Zhong, W Xiong, S Zheng, L Wang, Z Wang, Z Yang, T Zhang
arXiv preprint arXiv:2211.01962, 2022
Cited by 50*, 2022
Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game
W Xiong, H Zhong, C Shi, C Shen, L Wang, T Zhang
arXiv preprint arXiv:2205.15512, 2022
Cited by 44, 2022
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
H Zhong, Z Yang, Z Wang, MI Jordan
Journal of Machine Learning Research 24 (35), 1-52, 2023
Cited by 38*, 2023
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
X Chen, H Zhong, Z Yang, Z Wang, L Wang
International Conference on Machine Learning, 3773-3793, 2022
Cited by 38, 2022
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
H Zhong, W Xiong, J Tan, L Wang, T Zhang, Z Wang, Z Yang
International Conference on Machine Learning, 27117-27142, 2022
Cited by 36, 2022
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation …, 2023
Cited by 33*, 2023
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Z Liu, M Lu, W Xiong, H Zhong, H Hu, S Zhang, S Zheng, Z Yang, Z Wang
Thirty-seventh Conference on Neural Information Processing Systems, 2023
Cited by 28*, 2023
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
W Xiong, H Zhong, C Shi, C Shen, T Zhang
International Conference on Machine Learning, 24496-24523, 2022
Cited by 25, 2022
Why robust generalization in deep learning is difficult: Perspective of expressive power
B Li, J Jin, H Zhong, J Hopcroft, L Wang
Advances in Neural Information Processing Systems 35, 4370-4384, 2022
Cited by 24, 2022
A theoretical analysis of optimistic proximal policy optimization in linear markov decision processes
H Zhong, T Zhang
Advances in Neural Information Processing Systems 36, 2024
Cited by 18, 2024
Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage
J Blanchet, M Lu, T Zhang, H Zhong
Advances in Neural Information Processing Systems 36, 2024
Cited by 17, 2024
Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs
H Zhong, Z Yang, Z Wang, C Szepesvári
arXiv preprint arXiv:2110.08984, 2021
Cited by 17, 2021
Nearly optimal policy optimization with stable at any time guarantee
T Wu, Y Yang, H Zhong, L Wang, S Du, J Jiao
International Conference on Machine Learning, 24243-24265, 2022
Cited by 13, 2022
DPO Meets PPO: Reinforced Token Optimization for RLHF
H Zhong, G Feng, W Xiong, L Zhao, D He, J Bian, L Wang
arXiv preprint arXiv:2404.18922, 2024
Cited by 6, 2024
Tackling heavy-tailed rewards in reinforcement learning with function approximation: Minimax optimal and instance-dependent regret bounds
J Huang, H Zhong, L Wang, L Yang
Advances in Neural Information Processing Systems 36, 2024
Cited by 6, 2024
Provable Sim-to-real Transfer in Continuous Domain with Partial Observations
J Hu, H Zhong, C Jin, L Wang
arXiv preprint arXiv:2210.15598, 2022
Cited by 6, 2022
A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning
Y Yang, T Wu, H Zhong, E Garcelon, M Pirotta, A Lazaric, L Wang, SS Du
International Conference on Learning Representations, 2021/9/29, 2021
Cited by 6*, 2021
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
R Yang, X Pan, F Luo, S Qiu, H Zhong, D Yu, J Chen
arXiv preprint arXiv:2402.10207, 2024
Cited by 5, 2024
Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs
H Zhong, J Huang, L Yang, L Wang
Advances in Neural Information Processing Systems 34, 2021
Cited by 5, 2021
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy
H Zhong, X Deng, EX Fang, Z Yang, Z Wang, R Li
arXiv preprint arXiv:2012.14098, 2020
Cited by 4, 2020
Articles 1–20