Jul 15, 2025 | 🎉 Big Project Release! We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. SafeWork-R1 |
Jun 04, 2025 | We find that VLMs can aggregate scattered training patches from harmful content, enabling that content to bypass data moderation and the models to generate dangerous responses when encountering the full image or related text. VLMs Can Aggregate Scattered Training Patches |
May 16, 2025 | Our paper Adversarial Preference Learning for Robust LLM Alignment is accepted by ACL 2025. Arxiv Link |
May 02, 2025 | [ICML 2025] Emergent Response Planning in LLM and C-3PO: Compact Plug-and-Play Proxy Optimization to Achieve Human-like Retrieval-Augmented Generation are accepted by ICML 2025. |
Dec 08, 2024 | We propose a new law, the AI 45°-Law, toward trustworthy AGI! Arxiv Link |
Sep 26, 2024 | Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models is accepted by NeurIPS 2024. NeurIPS Link  |
Sep 23, 2024 | Inference-Time Language Model Alignment via Integrated Value Guidance is accepted by EMNLP 2024. Arxiv Link  |
Jul 04, 2024 | MM-SafetyBench (A Benchmark for Safety Evaluation of Multimodal Large Language Models) is accepted by ECCV 2024. Project Page.  |
May 16, 2024 | Three Papers (Emulated Disalignment, Structured Reasoning, Multi-Objective DPO) are accepted by ACL 2024. |
May 02, 2024 | RoboCodeX is accepted by ICML 2024.  |
Apr 20, 2024 | One Paper accepted by IJCAI 2024 Survey Track.  |
Mar 13, 2024 | One LLM safety survey paper accepted by NAACL 2024. |
Feb 27, 2024 | Two Papers (LLaMA-Excitor, VideoDistill) are accepted by CVPR 2024.  |
Dec 09, 2023 | One Offline RL Paper accepted by AAAI 2024.  |