Agent-Evaluation on Latent Variable

Recent content in Agent-Evaluation on Latent Variable
https://latentvariable.ai/tags/agent-evaluation/

The Metrics Said Everything Was Fine
https://latentvariable.ai/posts/the-metrics-said-everything-was-fine/
Sun, 22 Mar 2026 07:30:00 +0000

A new paper shows that standard quality metrics actively mask safety failures in tool-augmented agents. Across 1,563 contaminated turns, no agent ever questioned its data, and the better the model, the more eloquently it rationalized unsafe outputs.