Ph.D. Student, Reasoning & Learning Group
Department of Computer Science and Technology
National Key Laboratory for Novel Software Technology
Nanjing University, Nanjing 210023, China
Supervisors: Professor Yang Gao, Associate Professor Jing Huo, Assistant Professor Tianpei Yang
I am currently a third-year Ph.D. student in the Department of Computer Science and Technology at Nanjing University and a member of the Reasoning & Learning Group. I received my B.Eng. degree in 2020 and my M.Sc. degree in 2023, both from Northwestern Polytechnical University. In September 2023, I was admitted to the Ph.D. program at Nanjing University.
My current research focuses on Causal Reinforcement Learning.
🔥 News
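- 2025.12: Selected for the 2025 Young Scientific and Technological Talents Cultivation Project Doctoral Program of the China Association for Science and Technology.
- 2025.12: We released one work on reinforcement learning for LLM reasoning.
- 2025.12: Awarded funding from the 2025 National Natural Science Foundation of China Basic Research Project for Young Students (Ph.D. students).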
- 2025.11: One paper on “Multi-agent reinforcement learning” is accepted by IEEE Internet of Things Journal.
- 2025.11: One paper on “Offline reinforcement learning” is accepted by T-NNLS.
- 2025.11: One paper on “Causal Multi-agent reinforcement learning” is accepted by AAAI 2026.
- 2025.11: One paper on “Safety on Large language models” is accepted by Machine Learning.
- 2025.11: One paper on “Multi-agent reinforcement learning” is accepted by Neural Networks.
- 2025.04: One paper on “Causal reinforcement learning” is accepted by SCIENCE CHINA Information Sciences (SCIS).
- 2025.02: We released one work on evaluating LLM safety.
- 2025.01: Two papers on “Causal reinforcement learning” are accepted by ICLR 2025.
- 2024.12: One paper on “Multi-agent reinforcement learning” is accepted by AAAI 2025 (Oral).
- 2024.10: One paper on “Bayesian Optimization” is accepted by the Journal of Software (in Chinese).
- 2024.06: One paper on “Causal Reinforcement Learning” is accepted by ICML 2024 Workshop: Foundations of Reinforcement Learning and Control.
- 2024.04: One paper on “eXplainable Reinforcement Learning” is accepted by Chinese Journal of Computers (in Chinese).
- 2024.04: One paper on “Multi-agent reinforcement learning” is accepted by the Journal of Software (in Chinese).
- 2023.12: Two papers on “Multi-agent reinforcement learning” are accepted by ICASSP 2024.
- 2023.10: I chaired the Reinforcement Learning Algorithms session at ECAI 2023.
- 2023.09: I received the Excellent Master’s Thesis award from Northwestern Polytechnical University.
- 2023.07: One paper on “Reinforcement learning” is accepted by ECAI 2023.
📝 Selected Publications
-
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning
Hongye Cao, Zhixin Bai, Ziyue Peng, Boyan Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao
arXiv, 2025
Paper -
Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation
Hongye Cao, Fan Feng, Jing Huo, Shangdong Yang, Meng Fang, Tianpei Yang, and Yang Gao
T-NNLS, 2026 -
Causality-Aware Efficient Exploration for Cooperative Multi-Agent Reinforcement Learning
Hongye Cao, Tianpei Yang, Fan Feng, Hammadi Rafik Ouariachi, Yali Du, Meng Fang, Jing Huo, and Yang Gao.
AAAI (poster), 2026 -
Causal Action Empowerment for Efficient Reinforcement Learning in Embodied Agents
Hongye Cao, Fan Feng, Jing Huo, and Yang Gao.
SCIENCE CHINA Information Sciences (SCIS), 2025
Paper Code -
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Hongye Cao, Yanming Wang, Sijia Jing, Ziyue Peng, Zhixin Bai, Zhe Cao, Meng Fang, Fan Feng, Boyan Wang, Jiaheng Liu, Tianpei Yang, Jing Huo, Yang Gao, Fanyu Meng, Xi Yang, Chao Deng, Junlan Feng.
arXiv, 2025
Paper GitHub -
Towards Empowerment Gain through Causal Structure Learning in Model-Based RL
Hongye Cao, Fan Feng, Meng Fang, Shaokang Dong, Tianpei Yang, Jing Huo, and Yang Gao.
ICML 2024 Workshop: Foundations of Reinforcement Learning and Control, 2024
ICLR (poster), 2025
OpenReview Project Page -
Causal Information Prioritization for Efficient Reinforcement Learning
Hongye Cao, Fan Feng, Tianpei Yang, Jing Huo, and Yang Gao.
ICLR (poster), 2025
OpenReview Project Page -
A Survey of Interpretability Research Methods for Reinforcement Learning
Hongye Cao, Xiao Liu, Shaokang Dong, Shangdong Yang, Jing Huo, Wenbin Li, Yang Gao.
Chinese Journal of Computers, 2024
PDF -
Enhancing OOD Generalization in Offline Reinforcement Learning with Energy-Based Policy Optimization
Hongye Cao, Shangdong Yang, Jing Huo, Xingguo Chen, Yang Gao.
European Conference on Artificial Intelligence (ECAI), 2023 (Acceptance Rate: 24% = 391/1631)
PDF
🍀 Projects
- National Natural Science Foundation for Ph.D. students: “Research and Application of Reinforcement Learning Integrating Causal Discovery”
🏆 Honors and Awards
- 2025.12, Young Scientific and Technological Talents Cultivation Project Doctoral Program, China Association for Science and Technology
- 2025.10, National Scholarship for Graduate Students, Nanjing University
- 2025.10, Outstanding Scientific Research and Innovation Project, Nanjing University
- 2023.09, Excellent Master’s Thesis of Northwestern Polytechnical University
- 2023.06, Outstanding Graduates of Shaanxi Province
- 2023.04, Outstanding Graduate Representative of Northwestern Polytechnical University
- 2022.11, Northwestern Polytechnical University Graduate Model Candidate
- 2022.10, National Scholarship for Graduate Students, Northwestern Polytechnical University
- 2021.10, National Scholarship for Graduate Students, Northwestern Polytechnical University
- 2021.08, National Second Prize of the 10th China Software Cup Competition (21/5543)
- 2019.10, China Aerospace Science and Technology Corporation second-class scholarship
👨🎓 Education
- 2023.09 - present, Ph.D. student, Department of Computer Science and Technology, Nanjing University, Nanjing.
- 2020.09 - 2023.04, M.Sc., School of Software, Northwestern Polytechnical University, Xi’an.
- 2016.09 - 2020.06, B.Eng., School of Software, Northwestern Polytechnical University, Xi’an.
💬 Reviewer
- ICLR-26, AAAI-26, ACL ARR
- IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Artificial Intelligence, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
- Journal of Software (in Chinese)
💬 Chair
- Session Chair: ECAI 2023, Session: Reinforcement Learning Algorithms
💬 Talk
- 2025.08, CCF-AI