news

Jul 15, 2025 🎉 Big Project Release! We introduce SafeWork-R1, a cutting-edge multimodal reasoning model that demonstrates the coevolution of capabilities and safety. Arxiv Link :rocket: :sparkles:
Jun 04, 2025 We find that patches cropped from harmful content can bypass data moderation, enabling models to generate dangerous responses when encountering the full image or related text. Arxiv Link :sparkles:
May 16, 2025 Adversarial Preference Learning for Robust LLM Alignment is accepted. :sparkles:
May 02, 2025 Emergent Response Planning in LLMs is accepted by ICML 2025. :sparkles:
Dec 08, 2024 We propose the AI 45°-Law toward trustworthy AGI. Arxiv Link :sparkles:
Sep 26, 2024 Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models is accepted by NeurIPS 2024. :sparkles::sparkles:
Sep 23, 2024 Inference-Time Language Model Alignment via Integrated Value Guidance (IVG) is accepted by EMNLP 2024. :sparkles::sparkles:
Jul 04, 2024 MM-SafetyBench (a benchmark for safety evaluation of multimodal large language models) is accepted by ECCV 2024. :sparkles::sparkles:
May 16, 2024 Three papers (Emulated Disalignment, Structured Reasoning, Multi-Objective DPO) are accepted by ACL 2024. More details are coming. :sparkles::sparkles:
May 02, 2024 RoboCodeX is accepted by ICML 2024. :sparkles::sparkles:
Apr 20, 2024 One Paper accepted by IJCAI 2024 Survey Track. :sparkles::sparkles:
Mar 13, 2024 One LLM safety survey paper accepted by NAACL 2024. :sparkles: :smile:
Feb 27, 2024 Two papers (LLaMA-Excitor, VideoDistill) are accepted by CVPR 2024. :sparkles::sparkles:
Dec 09, 2023 One Offline RL Paper accepted by AAAI 2024. :sparkles::sparkles: