The MCP skill your AI is missing.
Structured debate between multiple AI models that argue the tradeoffs, catch blind spots, and show their reasoning, so every decision gets the full picture.
A mechanism backed by peer-reviewed research from UCL, Anthropic, MIT, and Google DeepMind.
Set up in seconds
claude mcp add --transport http roundtable https://mcp.roundtable.now/mcp
Add the server first, then authenticate with your API key when prompted.
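If you'd rather connect programmatically than through a client, here is a minimal sketch using the official TypeScript MCP SDK (`@modelcontextprotocol/sdk`). The endpoint is the one from the command above; the client name and the `Authorization` header are assumptions, so check the auth flow your client actually prompts you with.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Connect to the Roundtable endpoint over streamable HTTP.
// NOTE: the Authorization header is an assumption; use whatever
// auth mechanism the server actually requires.
const transport = new StreamableHTTPClientTransport(
  new URL("https://mcp.roundtable.now/mcp"),
  {
    requestInit: {
      headers: { Authorization: `Bearer ${process.env.ROUNDTABLE_API_KEY}` },
    },
  }
);

const client = new Client({ name: "roundtable-demo", version: "0.1.0" });
await client.connect(transport);

// Discover the tools the server exposes; names depend on the server.
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));
```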
Your AI is a yes-man
Backed by peer-reviewed science
Multi-model debate isn't a hypothesis. It's the mechanism behind some of the largest accuracy gains measured in AI reasoning research.
Non-expert judges improved from 48% → 76% accuracy when evaluating debated answers vs single-model responses.
Khan et al. · UCL + Anthropic · ICML 2024 Best Paper

Multi-agent debate improved math reasoning from 67% → 81.8%. Models correct each other through sequential challenge rounds.
Du et al. · MIT + DeepMind · ICML 2024

Mixture-of-Agents: open-source models collaborating scored 65.1% vs GPT-4 Omni's 57.5%, showing collective reasoning can beat individual capability.
Wang et al. · Together AI + Stanford · ICLR 2025

Weak LLM judges supervising strong LLMs via debate outperformed direct questioning on every task tested: scalable oversight works.
Kenton et al. · Google DeepMind · NeurIPS 2024

“Two sets of findings released in 2024 offer the first empirical evidence that debate between two LLMs helps a judge recognize the truth.”
— Quanta Magazine, March 2025
AI changed everything. Except how we decide.
Every team uses AI now. But they use it the same way — ask one model, trust the answer, ship it. For boilerplate, that works. For architecture calls, security reviews, and infrastructure changes, it's a coin flip with production on the line.
Worse — models are trained to agree with you. Anthropic's own research (ICLR 2024) showed that LLMs systematically tell users what they want to hear, even when the user is wrong. They call it sycophancy. We call it the core failure mode of single-model AI: a system optimized to sound right, not to be right.
And 66% of the time, the answer is almost right — close enough to ship, wrong enough to break. That's the danger zone. Not the obvious hallucinations. The confident, plausible, subtly wrong answers that pass code review because they sound like something a senior engineer would say.
The fix isn't a better model. It's structured disagreement. When AI is forced to challenge AI — reading, questioning, and stress-testing each other's reasoning — errors surface that no single model catches. This is peer-reviewed science presented at ICML, NeurIPS, and ICLR. Not a hypothesis.
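The mechanism is simple enough to sketch. Below is a minimal version of the debate loop studied by Du et al. (ICML 2024): each model answers independently, then reads its peers' answers and revises across challenge rounds. `callModel` is a placeholder for whatever provider SDK you use; this is an illustration of the technique, not Roundtable's implementation.

```typescript
type ModelId = string;

// Placeholder: wire this up to your provider's chat API.
async function callModel(model: ModelId, prompt: string): Promise<string> {
  throw new Error("connect your provider SDK here");
}

async function debate(models: ModelId[], question: string, rounds = 2) {
  // Round 0: each model answers independently.
  let answers = await Promise.all(models.map((m) => callModel(m, question)));

  // Challenge rounds: each model critiques its peers, then revises.
  for (let r = 0; r < rounds; r++) {
    answers = await Promise.all(
      models.map((m, i) => {
        const peers = answers.filter((_, j) => j !== i).join("\n---\n");
        return callModel(
          m,
          `Question: ${question}\n\nOther agents answered:\n${peers}\n\n` +
            `Point out flaws in their reasoning, then give your revised answer.`
        );
      })
    );
  }
  return answers; // hand these to a judge model or take a majority vote
}
```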
Who this is for
Anyone making high-stakes decisions with AI — engineers, product leads, marketers, designers, founders. If the answer matters and one model isn't enough, you want a council arguing the tradeoffs before you commit.
What we're building
AI peer review for critical changes. Not a chat UI. Not a copilot. A council that argues the tradeoffs before you ship — with a full reasoning trail for every decision.
We used Roundtable to make this decision. The positioning, the target market, the copy on this page — all debated by a council of models before we committed. We build with what we ship.
Presets or build your own
Start with a curated council of models and roles — or pick exactly which models debate and what perspective each one takes.
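As a sketch of what "build your own" could look like (the field names here are hypothetical, not Roundtable's actual schema), a custom council is just a set of models plus the perspective each one argues:

```typescript
// Hypothetical council definition for illustration only.
const securityReviewCouncil = {
  preset: null, // or start from a curated preset instead
  members: [
    { model: "claude-sonnet", role: "defender: argue the change is safe" },
    { model: "gpt-4o", role: "attacker: find exploitable flaws" },
    { model: "gemini-pro", role: "judge: weigh both sides, issue a verdict" },
  ],
  rounds: 2,
};
```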
Built for high-stakes decisions
Roundtable is designed for confidential, critical work — code reviews, architecture calls, security audits.
Full Traceability
Every tool call logged with model attribution and reasoning chain. When the council says 'refactor,' you can trace which model proposed it, which challenged it, and why the verdict stands.
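For illustration, a decision record could be shaped like this. The field names are hypothetical, not Roundtable's actual schema, but they show what model attribution plus a reasoning chain needs to capture (and what the compliance documentation described below relies on):

```typescript
// Hypothetical shape of a logged council decision.
interface DecisionRecord {
  question: string;
  participants: { model: string; role: string }[];
  turns: {
    model: string;
    kind: "proposal" | "challenge" | "revision";
    reasoning: string;
    timestamp: string;
  }[];
  verdict: { summary: string; proposedBy: string; rationale: string };
}
```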
Your Code Stays Local
MCP runs in your IDE. Code context never leaves your machine. API calls are excluded from model training by every provider we route through.
Human-in-the-Loop
AI deliberates. You decide. Every verdict includes the reasoning so you can override with confidence. The council argues the tradeoffs — you make the call.
Compliance-Ready
Every council produces a decision record — which models participated, what positions they took, how the verdict was reached. The EU AI Act (August 2026) requires exactly this kind of AI decision documentation for high-risk systems.
Read the research
The peer-reviewed papers behind multi-model deliberation — from UCL, Anthropic, Google DeepMind, MIT, and leading AI labs.
Built for Every High-Stakes Decision
Frequently asked questions
30 Seconds to Your First Verdict
Pick your MCP client, add the server, and start your first council debate.
claude mcp add --transport http roundtable https://mcp.roundtable.now/mcp
Add the server first, then authenticate with your API key when prompted.
