Chao Yang

杨超, Research Scientist, Shanghai AI Lab.


I am a Research Scientist at Shanghai AI Lab (上海人工智能实验室), where I lead a research group on Fundamental Large Model Safety & Decision Intelligence.

I recently completed a postdoctoral fellowship under the guidance of Professor Yu Qiao, where my research focused on the safety and security of large-scale models. My work examined the vulnerabilities and defense mechanisms of AI systems, particularly large language models and their applications.

Previously, I received my Ph.D. from the Department of Computer Science and Technology at Tsinghua University in 2022, advised by Prof. Fuchun Sun and Prof. Huaping Liu.

My research interests include Large Language Model Safety, Multimodal Large Models, and Robotic Embodied Intelligence for Trustworthy AGI. Some of my current research keywords are listed below:

  • Large Language Model: LLM Post-training and Safety Alignment, LLM Attack and Defense.
  • Multimodal LLM: Modality Fusion, Multimodal Alignment, VQA.
  • Embodied Robotics: Robotic Manipulation, Reinforcement Learning, Imitation Learning.

For academic cooperation, please feel free to email me at yangchao [at] pjlab [dot] org [dot] cn. For other matters, please contact me at yangchao9264 [at] 126 [dot] com or yangchaoemigmo [at] gmail [dot] com.

news

Jul 15, 2025 🎉 Big Project Release! We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. SafeWork-R1 :rocket: :sparkles:
Jun 04, 2025 We find that VLMs can aggregate scattered training patches from harmful content, enabling them to bypass data moderation and generate dangerous responses when encountering the full image or related text. VLMs Can Aggregate Scattered Training Patches :sparkles:
May 16, 2025 Our paper Adversarial Preference Learning for Robust LLM Alignment is accepted by ACL 2025. arXiv Link :sparkles:
May 02, 2025 Emergent Response Planning in LLMs and C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation are accepted by ICML 2025. :sparkles:
Dec 08, 2024 We propose a new law, the AI-45° Law, toward trustworthy AGI! arXiv Link :sparkles:
Sep 26, 2024 Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models is accepted by NeurIPS 2024. NeurIPS Link :sparkles::sparkles:
Sep 23, 2024 Inference-Time Language Model Alignment via Integrated Value Guidance is accepted by EMNLP 2024. arXiv Link :sparkles::sparkles:
Jul 04, 2024 MM-SafetyBench (A Benchmark for Safety Evaluation of Multimodal Large Language Models) is accepted by ECCV 2024. Project Page. :sparkles::sparkles:
May 16, 2024 Three Papers (Emulated Disalignment, Structured Reasoning, Multi-Objective DPO) are accepted by ACL 2024. :sparkles::sparkles:

selected publications

  1. CVPR2024
    VideoDistill: Language-aware Vision Distillation for Video Question Answering
    Bo Zou*, Chao Yang*, Yu Qiao, and 2 more authors
    arXiv preprint arXiv:2404.00973, 2024
  2. CVPR2024
    LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
    Bo Zou*, Chao Yang*, Yu Qiao, and 2 more authors
    2024
  3. ACL2024 Oral
    Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
    Zhanhui Zhou, Jie Liu, Zhichen Dong, and 4 more authors
    arXiv preprint arXiv:2402.12343, 2024
  4. NAACL2024
    Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
    Zhichen Dong, Zhanhui Zhou, Chao Yang+, and 2 more authors
    arXiv preprint arXiv:2402.09283, 2024
  5. AAAI2024
    Critic-Guided Decision Transformer for Offline Reinforcement Learning
    Yuanfu Wang, Chao Yang, Ying Wen, and 2 more authors
    arXiv preprint arXiv:2312.13716, 2023
  6. ECCV2024
    Safety of Multimodal Large Language Models on Images and Text
    Xin Liu, Yichen Zhu, Yunshi Lan, and 2 more authors
    arXiv preprint arXiv:2402.00357, 2024