The Capability-Safety Paradox

Defense training designed to protect LLM agents from prompt injection doesn’t just fail; it makes agents worse at everything, including security. A new paper shows how safety training teaches surface-level shortcuts that destroy tool-use competence while sophisticated attacks still walk right through.