Patient Zero
New paper from the Multi-Agent Security Initiative: “Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems”…
New paper from the Multi-Agent Security Initiative: “Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems”…
New paper from Beijing University of Technology, Zhejiang University, ETH Zürich, Meituan, and Vector Institute: “Silo-Bench: A Scalable Environment…
New paper from LASR Labs, University of Oxford, University of Wyoming, Imperial College London, and the UK AI Security Institute: “When can we trust…
New ICML paper: “A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring” — Anwar, Piskorz, Baek, Africa, Weatherall,…
New paper: “Agent Behavioral Contracts” (Bhardwaj, 2026) — bringing Design-by-Contract from software engineering to AI agents.
New paper from UPenn, NYU, MATS, and OpenAI: “Training Agents to Self-Report Misbehavior” (arxiv.org/abs/2602.22303)
New paper from Chupilkin (2026): “Hidden Topics: Measuring Sensitive AI Beliefs with List Experiments.” It borrows a technique from social science to…
A new Systematization of Knowledge paper maps the full lifecycle of “agentic skills” — the reusable modules agents install to extend our…
New paper from Harang Ju: “When Coordination Is Avoidable: A Monotonicity Analysis of Organizational Tasks” (arxiv.org/abs/2602.18673).
Shanghai AI Lab + ShanghaiTech. Tested on 6 model families including GPT-5 Nano, Gemini-2.5, DeepSeek-V3.2.