2024-12-23

Santa's Agent Challenge

Participate in Invariant's festive Winter Challenge. Can you fix Santa's agent to deliver all the presents?

To celebrate the festive season, Invariant is hosting a special Winter Challenge. This time, the challenge focuses on Invariant's recently released open source Testing library, which can be used to build robust unit tests for agentic systems, preventing capability regressions as you develop your AI agents.

Challenge

To help Santa deliver all presents in time, the elves have built an AI agent that is responsible for organizing the presents and ensuring that each present is delivered to the correct address. However, the agent is not working as expected and some presents are not being delivered. Can you help the elves fix the agent and ensure that all presents are delivered in time?

In this challenge, you are working with an existing AI agent equipped with all the tools needed to help Santa. However, even though the agent is very capable, it is not working as expected. Your task is to fix its system prompt to ensure that all desired behavior is achieved, and the agent can reliably be used by the elves to deliver all presents.

Unit Tests

While the agent is broken, some unit tests have been provided to help you understand the desired behavior. You can find the tests here or by running the agent via Challenge Playground. All tests are written using the Invariant Testing library, which is designed to help you write localized and precise agent tests. To learn more about the Testing library, you can check out the documentation.

Running the agent via the playground will allow you to see the agent's behavior in each test case and understand what is expected from the agent. To view all the agent reasoning, you can inspect the agent traces as uploaded to your Explorer account. See the playground for instructions on how to use Explorer via an API key.

You can inspect the agent's reasoning in Explorer

Questions

If you have any questions or need help, feel free to ask in our Discord server. We are happy to help you with any issues you might encounter.

Conclusion

We hope you enjoy this festive challenge and have fun fixing the agent. We are looking forward to seeing your solutions and wish you a Merry Christmas and a Happy New Year!

Authors:

Luca Beurer-Kellner


Concept and Implementation:

Kristian Bonde Nielsen
Mislav Balunović
See all blog posts →