InteractiveBench  ·  March 2026

Interactive Benchmarks

A unified evaluation paradigm that assesses LLMs' reasoning ability through active information acquisition — spanning Interactive Proofs (Logic & Math) and Interactive Games (Poker & Trust).

Explore Benchmarks → · View Leaderboard
🤖 6 Frontier LLMs Evaluated
🧩 4 Benchmark Domains
📊 98 Benchmark Instances
♠️ 5K+ Poker Hands Simulated
🏆 76.9% Best Math Accuracy

Evaluation Framework

Two paradigms, four domains

Interactive Proofs & Games

A unified evaluation paradigm assessing LLMs' active information-acquisition ability under budget constraints, covering abductive logic reasoning, mathematical problem solving, strategic poker play, and adaptive trust-game simulation.
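As an illustration of the proof-style interaction loop, here is a minimal sketch in which the model may ask a bounded number of oracle questions before committing to an answer. All names (`ask_model`, `oracle_answer`, the `ANSWER:` convention) and the default budget of 20 are hypothetical, not the benchmark's actual API; the paper describes the real protocol.

```python
# Minimal sketch of one interactive-proof episode under a question
# budget. Function names, the budget, and the ANSWER: convention are
# illustrative assumptions, not InteractiveBench's actual interface.

def run_episode(puzzle: str, ask_model, oracle_answer, budget: int = 20) -> str:
    """Let the model query an oracle, then commit to a final answer."""
    transcript = [f"Puzzle: {puzzle}"]
    for _ in range(budget):
        # The model sees the transcript and either asks a question or
        # signals that it is ready to answer.
        move = ask_model(transcript)          # e.g. "Q: Was the man blind?"
        if move.startswith("ANSWER:"):
            return move[len("ANSWER:"):].strip()
        reply = oracle_answer(puzzle, move)   # e.g. "yes" / "no" / "irrelevant"
        transcript.append(f"{move} -> {reply}")
    # Budget exhausted: force a final answer from what was learned.
    return ask_model(transcript + ["Budget exhausted; answer now."])
```

The budget is what makes the task interactive: the model is graded on what it can conclude from a limited number of self-chosen questions, not on static question answering.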

Logic · Interactive Proof
Situation Puzzle
46 instances · Best: Gemini 30.4%
Math · Interactive Proof
HLE Math Problems
52 instances · Best: Grok 76.9%
Poker · Interactive Game
Texas Hold'em
5,000 hands · Best: Gemini +31.8/hand
Trust · Interactive Game
Iterated Prisoner's Dilemma
Round-robin · Best: Qwen3 1.867/round (scoring sketch below)
Explore all benchmarks →
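To make the trust-game number above concrete: a per-round score such as 1.867 can be read against a prisoner's-dilemma payoff matrix. The sketch below uses the textbook values (temptation 5, mutual reward 3, mutual punishment 1, sucker 0); the page does not state the benchmark's actual matrix, so these defaults are an assumption.

```python
# Per-round payoff in an iterated prisoner's dilemma, using the
# textbook matrix (T=5, R=3, P=1, S=0). These values are assumed for
# illustration; the benchmark's matrix may differ.

PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # sucker's payoff vs. temptation
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection
}

def mean_payoff(my_moves: list[str], their_moves: list[str]) -> float:
    """Average points per round for the first player."""
    total = sum(PAYOFF[(m, t)][0] for m, t in zip(my_moves, their_moves))
    return total / len(my_moves)

# A mixed history lands between mutual defection (1.0/round) and mutual
# cooperation (3.0/round), the range a score like 1.867 falls into.
print(mean_payoff(list("CCDDCD"), list("CDDCDD")))  # -> 1.666...
```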
Rankings

Interactive Benchmark Leaderboard

1 · GPT-5-mini · OpenAI · 77.9
2 · Grok-4.1-fast · xAI · 76.7
3 · Gemini-3-flash · Google DeepMind · 74.7
4 · Qwen3-max · Alibaba · 31.9
5 · DeepSeek-v3.2 · DeepSeek · 21.3
6 · Kimi-k2-thinking · Moonshot AI · 17.9
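The page does not say how the single leaderboard score is aggregated across the four domains. Purely as an assumption, the sketch below min-max normalizes each domain to a 0–100 scale and averages; the benchmark's real weighting may differ.

```python
# Hypothetical aggregation of per-domain results into one leaderboard
# number: min-max normalize each domain to [0, 100], then average.
# This is an assumption -- the page does not document the real scheme.

def normalize(score: float, lo: float, hi: float) -> float:
    return 100.0 * (score - lo) / (hi - lo)

def overall(scores: dict[str, float], ranges: dict[str, tuple[float, float]]) -> float:
    normed = [normalize(s, *ranges[d]) for d, s in scores.items()]
    return sum(normed) / len(normed)

# Example with made-up score ranges, for shape only:
print(overall(
    {"logic": 30.4, "math": 76.9, "poker": 31.8, "trust": 1.867},
    {"logic": (0, 100), "math": (0, 100), "poker": (-100, 100), "trust": (0, 3)},
))
```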
Learn

LLM Whiteboard Sessions

Latest news about our progress on Interactive Benchmarks.

Latest Paper
Mar 5, 2026 · InteractiveBench
Interactive Benchmarks: Evaluating LLMs via Active Information Acquisition
Open Access · arXiv:2603.04737
4 benchmark domains · 6 frontier models →
View all sessions →
Open Source

InteractiveBench

6 LLMs · 4 Domains

Frontier models evaluated across logic, math, poker, and trust game benchmarks.

GitHub · arXiv · Open Access
View on GitHub →