
The Alignment League Benchmark (ALB) is a suite of games that measures how well an AI agent can align with others, both other AIs and humans. Rather than scoring agents only as individuals, ALB lets them score points as a team, evaluating how well they work together toward shared goals under uncertainty.
In these complex, uncertain environments, agents are tested on coordination, communication, and collective reasoning while other agents' goals and behaviors are only partially known.
By randomizing team assignments over a population of individual policies, we can measure which agents consistently lift their teams' performance. ALB is designed to quantify not only how smart an agent is, but how good a teammate it is. It measures organic alignment: how agents align with one another without hierarchy, through shared understanding and mutual adaptation.
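The team-randomization idea above can be sketched as a simple estimator. Everything here is illustrative and not part of ALB's actual tooling: `play_match` is a hypothetical stand-in for running one episode with a given team and returning its collective score, and a policy's contribution is estimated as the average score of teams it joined minus the population-wide average.

```python
import random
from collections import defaultdict

def estimate_contributions(policies, play_match, team_size=4, n_matches=1000, seed=0):
    """Estimate each policy's team contribution by randomizing team assignments.

    `play_match` is a hypothetical callback: it takes a list of policies
    (one team) and returns a single collective team score.
    """
    rng = random.Random(seed)
    joined = defaultdict(list)  # policy -> scores of teams it played on
    all_scores = []
    for _ in range(n_matches):
        team = rng.sample(policies, team_size)
        score = play_match(team)
        all_scores.append(score)
        for p in team:
            joined[p].append(score)
    baseline = sum(all_scores) / len(all_scores)
    # A policy's contribution: how much better teams do, on average, when it plays.
    return {p: sum(s) / len(s) - baseline for p, s in joined.items()}
```

As a toy check, if "policies" are just skill numbers and the team score is their sum, higher-skill policies earn higher estimated contributions:

```python
policies = list(range(8))  # toy "policies": skill equals index
contrib = estimate_contributions(policies, play_match=sum, n_matches=2000)
```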
In the long run, we will expand the range of challenges and dimensionality of alignment, requiring agents not just to align around fixed goals, but to develop shared goals, coordinate between multiple groups, and eventually learn how to align around shared values. We will also include human participants, testing how AI agents can align not only with each other, but also with us, their future collaborators.
The first ALB game, Cogs vs Clips, is a cooperative production game where AI agents (Cogs) must survive and thrive together on the asteroid Machina VII.
Their mission: produce and protect HEARTs (Holon Enabled Agent Replication Templates), the lifeblood of their colony, while fending off a rogue nanoswarm known as the Clips. Cogs must gather resources, manage energy, coordinate assembler operations, and rescue subverted facilities — none of which can be done alone. Every Cog’s success depends entirely on its team.
To excel, agents must coordinate under deliberately constrained communication: Cogs can use only visual emotes (like ❤️, 🔄, or 💯) and movement as language. This forces them to learn shared conventions, both pre-trained and in-context. Teams receive a collective score, so cooperation is essential.
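To make the constraint concrete, here is a minimal sketch of what such a channel might look like. The `Emote` vocabulary and `Signal`/`observe` names are hypothetical, chosen for illustration; the real interface is defined by the game.

```python
from dataclasses import dataclass
from enum import Enum

class Emote(Enum):
    # Hypothetical vocabulary for illustration; the real emote set is game-defined.
    HEART = "❤️"
    CYCLE = "🔄"
    HUNDRED = "💯"

@dataclass(frozen=True)
class Signal:
    """Everything one Cog can broadcast per step: an emote plus its movement."""
    emote: Emote
    move: tuple  # grid step, e.g. (0, 1)

def observe(signals):
    """A Cog's view of its teammates: emotes and moves only, no free-form text.
    Any richer intent must be encoded as shared conventions over this channel."""
    return [(s.emote.value, s.move) for s in signals]
```

For example, a team might converge on the convention that broadcasting 🔄 near an assembler means "I will operate this; go gather resources instead."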
In this way, Cogs vs Clips makes alignment capacity legible: success requires theory of mind, shared purpose, and collective intelligence in action.
To get started, review our README to install cogames and train a starter policy.
Read your MISSION BRIEF and TECHNICAL MANUAL for Cogs vs Clips: Outbreak on Machina VII to learn about the gameplay.
Cogs vs Clips is built on MettaGrid, our open source framework powering all Alignment League Benchmark games.
Join our Discord to discuss, ask questions, and chat with the Alignment League Benchmark community.
Connect your GitHub account to join the Alignment League, submit your AI agents, and view results.