
Compare AI Models

Side-by-side comparison of pricing, capabilities, and specs.

Catalog refreshed Apr 17, 2026 · updates daily

Choosing between Claude, GPT-4, and Gemini isn't obvious. They're all excellent models, but they have different strengths, different pricing, and different deployment options. Claude excels at nuanced reasoning and long-context analysis. GPT-4 dominates at code generation and multimodal understanding. Gemini has the longest context window available. The question is: which one makes sense for your use case and your budget?

Use this comparison tool to evaluate models across the dimensions that matter: pricing (input and output token costs), context window (how much text you can process at once), release date, and capabilities (coding, vision, function calling, fine-tuning). Context window is often overlooked but critical. A model that costs half as much but has a 4K context window isn't useful if your documents are 20K tokens. Benchmarks tell you how smart a model is; we show you what it costs and what it can do.
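The context-fit and per-request cost arithmetic above can be sketched in a few lines. This is a minimal illustration, not any model's actual specs: the window size, reserved-output budget, and per-Mtok prices below are placeholder assumptions.

```python
def fits_context(doc_tokens: int, context_window: int,
                 reserved_output: int = 1024) -> bool:
    """True if the document plus a reserved output budget fits the window."""
    return doc_tokens + reserved_output <= context_window

def request_cost(input_tokens: int, output_tokens: int,
                 in_per_mtok: float, out_per_mtok: float) -> float:
    """Cost in dollars, given input/output prices in $ per million tokens."""
    return (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000

# A 20K-token document doesn't fit a hypothetical 4K window:
print(fits_context(20_000, 4_096))                        # False
# Cost of one 20K-in / 1K-out request at a $15/$75 price point:
print(request_cost(20_000, 1_000, 15.0, 75.0))            # 0.375
```

The point of the reserved-output parameter is that the context window covers input and output combined, so a document that exactly fills the window leaves no room for a response.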

The pricing vs performance tradeoff is central to AI economics. The newest flagship models offer marginal improvements over previous generations at 2-3x the cost. Open-source models like Llama are free to license but require self-hosting infrastructure. Small models like Claude Haiku or GPT-4o-mini are dramatically cheaper and often sufficient for non-critical tasks. See our pricing guide for deeper analysis and cost-optimization strategies, and our benchmarks for a read on raw capability.
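To make the flagship-vs-mid-tier tradeoff concrete, here is a sketch of how a monthly bill scales with per-Mtok pricing. The price points and the workload (request count, tokens per request) are illustrative assumptions; actual rates change frequently.

```python
# Illustrative (input, output) prices in $ per million tokens.
PRICES = {
    "flagship": (15.00, 75.00),
    "mid-tier": (5.00, 15.00),
    "budget":   (3.50, 10.50),
}

def monthly_cost(requests: int, in_tok: int, out_tok: int,
                 price: tuple[float, float]) -> float:
    """Dollars per month for a fixed per-request token profile."""
    in_p, out_p = price
    return requests * (in_tok * in_p + out_tok * out_p) / 1e6

# Assumed workload: 100K requests/month, 2K input + 500 output tokens each.
for tier, price in PRICES.items():
    print(f"{tier}: ${monthly_cost(100_000, 2_000, 500, price):,.0f}/mo")
# flagship: $6,750/mo
# mid-tier: $1,750/mo
# budget: $1,225/mo
```

At this workload the flagship costs roughly 4-5x the cheaper tiers, which is the kind of gap that makes routing non-critical tasks to a small model worthwhile.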

Select at least two models above to see the comparison table.

Popular Comparisons