Jul 15, 2025 | 🎉 Big Project Release! We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. Arxiv Link |
Jun 04, 2025 | We find that patches extracted from harmful content can bypass data moderation, enabling models to generate dangerous responses when encountering the full image or related text. Arxiv Link |
May 16, 2025 | Adversarial Preference Learning for Robust LLM Alignment is accepted. |
May 02, 2025 | Emergent Response Planning in LLM is accepted by ICML 2025. |
Dec 08, 2024 | We propose the AI 45°-Law toward trustworthy AGI. Arxiv Link |
Sep 26, 2024 | Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models is accepted by NeurIPS 2024. |
Sep 23, 2024 | Inference-Time Language Model Alignment via Integrated Value Guidance (IVG) is accepted by EMNLP 2024. |
Jul 04, 2024 | MM-SafetyBench (A Benchmark for Safety Evaluation of Multimodal Large Language Models) is accepted by ECCV 2024. Here you go.  |
May 16, 2024 | Three papers (Emulated Disalignment, Structured Reasoning, Multi-Objective DPO) are accepted by ACL 2024. More details are coming. |
May 02, 2024 | RoboCodeX is accepted by ICML 2024.  |
Apr 20, 2024 | One paper is accepted by the IJCAI 2024 Survey Track. |
Mar 13, 2024 | One LLM safety survey paper is accepted by NAACL 2024. |
Feb 27, 2024 | Two papers (LLaMA-Excitor, VideoDistill) are accepted by CVPR 2024. |
Dec 09, 2023 | One offline RL paper is accepted by AAAI 2024. |