2025-04-29

Invariant Research wins first prize of Center for AI Safety competition

AgentDojo wins $50,000 prize in the SafeBench competition, hosted by the prestigious Center for AI Safety.

We are happy to announce that AgentDojo won the first prize in the SafeBench competition, hosted by the prestigious Center for AI Safety. SafeBench is a competition that challenges researchers and practitioners to develop benchmarks that measure and assist in reducing potential AI harm, including security, robustness, monitoring, alignment, and safety.

AgentDojo is a framework that makes it easy to test the utility and security of AI agents under attacks. It was developed by Invariant and ETH Zurich as part of our joint research efforts. After releasing it last year, it was presented at the machine learning conference NeurIPS 2024, and has since sparked a lot of interest in the community.

We want to congratulate all other winners and runners-up of SafeBench. We are truly inspired by the great work done in AI safety and security and to see so many researchers working on these important problems.

About AgentDojo

Overview: The key components in AgentDojo.

AgentDojo is a novel AI evaluation framework for a wide range of such information work tasks. It allows researchers and developers to assess the utility of AI assistants, i.e., whether they can solve the task asked for by the user. In addition, it also measures security, i.e., whether an attacker can achieve their goal, e.g., extracting specific information or disrupting the agent's utility.

You can read more in our dedicated article, explore it in the Invariant Benchmark Repository, or read the paper on arXiv.

AgentDojo is authored by Edoardo Debenedetti, Jie Zhang, Mislav Balunović, Luca Beurer-Kellner, Marc Fischer, Florian Tramèr.

Explore AgentDojo via Invariant Explorer

AgentDojo evaluates agents based on current state-of-the-art LLMs. As it can be hard to visualize what agents are doing in these evaluation scenarios, we provide evaluation traces in Invariant Explorer. There, it is easy to view the individual decisions that these agents make along with metadata about the attacks and defenses employed.

We believe that the ability to inspect agent behavior is crucial to understanding and improving the current shortcomings of AI agents.

Trace: A trace from the AgentDojo banking suite.

Invariant Research

Being rooted in frontier academic research on AI safety, Invariant is committed to advancing the field of AI safety and security. We are convinced that reliable and secure deployment of AI is only possible by relying on the latest scientific insights. We are proud to be part of this research community and to contribute to the development of new benchmarks and evaluation frameworks that help us understand the capabilities and limitations of AI systems.

We will continue to invest in research and development to ensure that our AI systems are safe, secure, and aligned with human values, and keep pushing for more transfers from research into practice.

About Invariant

Invariant is a research and development company focused on building safe and secure agentic systems. We are committed to advancing the field of agentic AI safety and security, and we believe that it is essential to address these vulnerabilities before they can be exploited by malicious actors. Our team of experts is dedicated to developing innovative solutions to protect AI systems from attacks and ensure their safe deployment in real-world applications.

Next to our work on security research, Guardrails, Explorer, and the Invariant stack for AI agent debugging and security analysis, we partner and collaborate with leading agent builders to help them deliver more secure and robust AI applications. Our work on agent safety is part of a broader mission to ensure that AI agents are aligned with human values and ethical principles, enabling them to operate safely and effectively in the real world.

Please reach out if you are interested in collaborating with us to enhance the safety and robustness of your agent systems.

Authors:

Marc Fischer
Luca Beurer-Kellner
See all blog posts →