The world’s most advanced AI models competed in several rounds of Diplomacy, a 36-hour strategy board game similar to Risk. The competition revealed the algorithmic personalities of ChatGPT, Claude, Gemini, and other AI models.
Why it matters. Alex Duffy, a programmer and researcher, created AI Diplomacy as a new benchmark for evaluating AI models. The experiment became something more: a technological Rorschach test that exposed the models’ training biases and our own projections.
What happened? In dozens of games broadcast on Twitch, each model developed strategies that reflected different human personalities.
- OpenAI’s o3 played the Machiavellian schemer, maintaining false alliances for more than 40 turns and constructing “parallel realities” for different players.
- Claude 4 Opus became a self-destructive pacifist, refusing to betray others even when doing so guaranteed its own defeat.
- DeepSeek’s R1 displayed an extremely theatrical style, issuing unprovoked threats such as “Your fleet will burn in the Black Sea tonight.”
- Gemini 2.5 Pro proved to be a solid strategist, though it remained vulnerable to sophisticated manipulation.
- Alibaba’s QwQ-32B suffered from analysis paralysis, writing 300-word diplomatic messages that led to early eliminations.
The context. Diplomacy is a strategy game set in Europe in 1901, in which seven powers compete to dominate the continent. Unlike Risk, Diplomacy requires constant negotiation, alliance-building, and calculated betrayal. There are no dice and no element of chance: only pure strategy and psychological manipulation.
Between the lines. Each algorithmic “personality” reflects the values of its creators.
- For example, Claude upholds Anthropic’s safety principles, even at the cost of victory.
- o3 displays the ruthless efficiency prized in Silicon Valley.
- DeepSeek exhibits dramatic tendencies shaped by the cultural makeup of its training data.
There’s something deeper here, too. These models don’t choose to be cooperative or competitive; they reproduce patterns from their training data. Their decisions are our own biases, converted into code.
Yes, but. We read betrayal into what is only parameter optimization, and we see loyalty in what are only training constraints. That’s why the experiment reveals more about us than about the models: we anthropomorphize these behaviors because we need to understand AI in human terms.
In perspective. Duffy’s experiment is more than just another benchmark. It offers a glimpse into how we project personality onto systems that operate on statistical patterns, and the games served as a reminder that AI has no hidden agenda of its own; it only reflects ours.
The experiment continues to stream on Twitch, letting anyone watch how our digital creations play according to the rules written into their algorithms.
Image | AI Diplomacy
Related | When ChatGPT Is Your Only Friend: This Is How AI Models Are Replacing Interpersonal Relationships