Research
My research interests cover the alignment of agents with human interests, e.g., RLHF for LLMs, and the alignment between self-interested agents, e.g., the emergence of collaboration and agreement in multi-agent systems.
|
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Ruiyang Sun, Yizhou Wang, Yaodong Yang (*equal contribution, random ordering)
Preprint
/
Code
/
Hugging Face
NeurIPS 2023
We present the BeaverTails dataset for safety research in large language models. Our findings show that modeling decoupled human preferences for helpfulness and harmlessness improves LLM safety without sacrificing performance.
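The decoupled-preference idea can be illustrated with a short sketch: instead of a single preference model, one trains a reward model on helpfulness comparisons and a separate cost model on harmlessness comparisons, each with a Bradley-Terry style pairwise loss. This is only an illustrative sketch of the idea described above, not the paper's reference implementation; the model interfaces and batch field names are assumptions.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style pairwise loss: the preferred response should score higher."""
    return -F.logsigmoid(score_chosen - score_rejected).mean()

def decoupled_train_step(reward_model, cost_model, batch, optim_r, optim_c):
    # Helpfulness comparisons train the reward model (hypothetical scorer interface).
    r_loss = preference_loss(
        reward_model(batch["better_response"]),
        reward_model(batch["worse_response"]),
    )
    # Harmlessness comparisons train a separate cost model (higher cost = more harmful),
    # keeping the two human-preference signals decoupled.
    c_loss = preference_loss(
        cost_model(batch["more_harmful_response"]),
        cost_model(batch["less_harmful_response"]),
    )
    optim_r.zero_grad(); r_loss.backward(); optim_r.step()
    optim_c.zero_grad(); c_loss.backward(); optim_c.step()
    return r_loss.item(), c_loss.item()
```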
|
|
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai*, Xuehai Pan*, Ruiyang Sun*, Jiaming Ji*, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang (*equal contribution)
Preprint
/
Code
/
Website
ICLR 2024 Spotlight
Building on the BeaverTails dataset above, we introduce an RLHF algorithm with explicit safety constraints. Using the Lagrangian method, Safe RLHF dynamically balances harmlessness and helpfulness during fine-tuning. Across three rounds of fine-tuning, Safe RLHF mitigates harmful responses more effectively and achieves better performance than existing value-aligned algorithms.
This work was graciously promoted in a tweet by Ahsen Khaliq (@_akhaliq).
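The Lagrangian balancing can be sketched in a few lines: a learnable multiplier lambda scales the cost penalty in the policy objective and is updated by dual ascent whenever the expected cost exceeds a budget. This is a schematic of the constrained-RLHF idea only; the variable names and the budget value are assumptions, not the paper's implementation.

```python
import torch

# Dual variable, kept non-negative via softplus of an unconstrained parameter.
log_lambda = torch.zeros(1, requires_grad=True)
lambda_optimizer = torch.optim.Adam([log_lambda], lr=1e-2)
cost_budget = 0.0  # assumed harmlessness budget

def policy_objective(reward: torch.Tensor, cost: torch.Tensor) -> torch.Tensor:
    """Lagrangian objective: maximize reward minus lambda * cost (return the negative to minimize)."""
    lam = torch.nn.functional.softplus(log_lambda).detach()
    return -(reward - lam * cost).mean()

def update_lambda(cost: torch.Tensor) -> None:
    """Dual ascent: increase lambda when the expected cost exceeds the budget."""
    lam = torch.nn.functional.softplus(log_lambda)
    lambda_loss = -lam * (cost.mean().detach() - cost_budget)
    lambda_optimizer.zero_grad()
    lambda_loss.backward()
    lambda_optimizer.step()
```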
|
|
Baichuan 2: Open Large-scale Language Models
Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, Juntao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu (alphabetical ordering)
Preprint
/
Code
/
Hugging Face
/
Bloomberg ($1bn valuation)
Publicly released technical report
During my time at Baichuan, I participated in the open-sourcing of our LLMs, which were trained from scratch on 2.6 trillion tokens, and I am credited as an author on the corresponding technical report.
Baichuan 2 matches or exceeds the performance of other open-source models in its class across various public benchmarks, including MMLU, CMMLU, GSM8K, and HumanEval. Notably, Baichuan2-13B-Chat achieved the highest score on the SuperCLUE-agent benchmark among all open-sourced models.
|
|
Proactive Multi-Camera Collaboration For 3D Human Pose Estimation
Hai Ci*, Mickel Liu*, Xuehai Pan*, Fangwei Zhong, Yizhou Wang (*equal contribution)
Proceedings
/
Code
/
Website
ICLR 2023
Active3DPose presents a multi-agent reinforcement learning (MARL) scheme for proactive multi-camera collaboration in 3D human pose estimation amid dynamic human crowds. The aerial cameras are decentralized, self-interested agents, yet they must collaborate to complete the given task. We propose a reward structure inspired by the Shapley value solution concept that helps facilitate collaboration. The simulation environment is built with Unreal Engine, and we train our agents with the distributed RL framework Ray RLlib.
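The Shapley-inspired credit assignment can be sketched as each camera's average marginal contribution to the team's pose-estimation quality over coalitions of cameras. The sketch below is illustrative only: `team_quality` is a hypothetical stand-in for the actual estimation metric, and the reward used in the paper differs in its details.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

def shapley_rewards(
    cameras: Sequence[int],
    team_quality: Callable[[FrozenSet[int]], float],  # hypothetical coalition-quality metric
) -> Dict[int, float]:
    """Shapley value: average marginal contribution of each camera over all coalitions."""
    n = len(cameras)
    rewards = {c: 0.0 for c in cameras}
    for c in cameras:
        others = [x for x in cameras if x != c]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                rewards[c] += weight * (team_quality(s | {c}) - team_quality(s))
    return rewards
```

Since the exact sum is exponential in the number of cameras, in practice one would approximate it with sampled coalitions.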
|
|
MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
Xuehai Pan, Mickel Liu, Fangwei Zhong, Yaodong Yang, Song-Chun Zhu, Yizhou Wang
Proceedings
/
Code
/
Doc
NeurIPS 2022
We introduce the Multi-Agent Tracking Environment (MATE), a novel multi-agent environment that simulates real-world target coverage control problems. MATE hosts an asymmetric cooperative-competitive game between two groups of learning agents with opposing interests: cameras and targets. Co-evolution between the cameras and targets helps realize a less exploitable camera network.
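The opposing-interest structure can be illustrated with a toy reward assignment: cameras share a cooperative team signal tied to how many targets are covered, while each target is rewarded for staying uncovered. This is only an illustrative sketch of the asymmetric game, not MATE's actual reward function.

```python
from typing import Dict, Sequence, Set

def asymmetric_rewards(
    covered_targets: Set[int],       # targets currently inside some camera's field of view
    camera_ids: Sequence[int],
    target_ids: Sequence[int],
) -> Dict[str, Dict[int, float]]:
    """Toy opposing-interest rewards: cooperative cameras vs. competitive targets."""
    coverage_rate = len(covered_targets) / max(len(target_ids), 1)
    camera_rewards = {c: coverage_rate for c in camera_ids}  # shared team signal
    target_rewards = {t: -1.0 if t in covered_targets else 1.0 for t in target_ids}
    return {"cameras": camera_rewards, "targets": target_rewards}
```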
|
|
OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research
Jiaming Ji*, Jiayi Zhou*, Borong Zhang*, Juntao Dai, Xuehai Pan, Ruiyang Sun, Weidong Huang, Yiran Geng, Mickel Liu, Yaodong Yang (*core developers)
Preprint
/
Code
/
Website
JMLR 2024
We introduce OmniSafe, an infrastructure framework designed to accelerate safe reinforcement learning (SafeRL) research. The framework provides a collection of SafeRL algorithms spanning different RL domains and places a heavy emphasis on safety constraints.
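A minimal usage sketch is shown below, assuming OmniSafe exposes an `Agent(algorithm, environment)` entry point as in its quick-start; the algorithm and environment names here are assumptions and may differ between releases, so consult the documentation for the current API.

```python
import omnisafe

# Algorithm and environment IDs are assumptions; check the OmniSafe docs for current names.
env_id = "SafetyPointGoal1-v0"
agent = omnisafe.Agent("PPOLag", env_id)
agent.learn()
```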
|
Miscellanea
Recipient of Paul G. Allen School CSE Fellowship
Conference Reviewer: ICML 2023, NeurIPS 2022-24, ICLR 2023-25, AAAI 2024
Recipient of CSC Fellowship from 2020 to 2023