# The Frontier Model Matrix

> Context windows, output limits and pricing for the frontier LLMs an agent-builder reaches for (June 2026), enriched to full EAV depth per research/briefs/models.md. Claude rows are exact, verified against the Anthropic model catalog and AWS Bedrock model cards. Non-Anthropic rows list capability where it is publicly stable and defer pricing/context to the provider, because third-party prices move and we will not print a number we cannot vouch for. Every sourced value carries its primary source URL + last_verified; any value not confirmed against a primary source is a structured placeholder { value:null, verify_status:'verify-against-primary-at-build', source_hint:<primary URL> } rather than a guess.
> Pricing unit: USD per 1M tokens (input / output). Updated 2026-06-15. JSON: /api/models

| Model | Vendor | Model ID | Context | Max out | In $/M | Out $/M |
|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | `claude-fable-5` | 1M | 128K | $10.00 | $50.00 |
| Claude Opus 4.8 | Anthropic | `claude-opus-4-8` | 1M | 128K | $5.00 | $25.00 |
| Claude Sonnet 4.6 | Anthropic | `claude-sonnet-4-6` | 1M | 64K | $3.00 | $15.00 |
| Claude Haiku 4.5 | Anthropic | `claude-haiku-4-5` | 200K | 64K | $1.00 | $5.00 |
| GPT (frontier tier) | OpenAI | `see provider` | see provider | see provider | see provider | see provider |
| Gemini (frontier tier) | Google | `see provider` | 1M+ (varies) | see provider | see provider | see provider |
| Llama (open weights) | Meta | `see provider` | varies | varies | self-host or per-host | self-host or per-host |
| GPT-5 | OpenAI | `gpt-5` | 400K | 128K | $1.25 | $10.00 |
| GPT-5.1 | OpenAI | `gpt-5.1` | 400K | 128K | $1.25 | $10.00 |
| GPT-5 Mini | OpenAI | `gpt-5-mini` | 400K | 128K | $0.25 | $2.00 |
| GPT-5 Codex | OpenAI | `gpt-5-codex` | 400K | 128K | $1.25 | $10.00 |
| GPT-5.1 Codex | OpenAI | `gpt-5.1-codex` | 400K | 128K | $1.25 | $10.00 |
| OpenAI o3 | OpenAI | `o3` | 200K | 100K | $2.00 | $8.00 |
| Gemini 3 Pro | Google | `gemini-3-pro-preview` | 1M | 64K | $2.00 | $12.00 |
| Gemini 3 Flash | Google | `gemini-3-flash-preview` | 1M | 64K | $0.50 | $3.00 |
| Gemini 2.5 Pro | Google | `gemini-2.5-pro` | 1M | 64K | $1.25 | $10.00 |
| Gemini 2.5 Flash | Google | `gemini-2.5-flash` | 1M | 64K | $0.30 | $2.50 |
| Grok 4.3 | xAI | `grok-4.3` | 1M | 30K | $1.25 | $2.50 |
| DeepSeek-V4-Flash (deepseek-chat) | DeepSeek | `deepseek-chat` | 1M | 384K | $0.14 | $0.28 |
| DeepSeek-V4-Flash (deepseek-reasoner) | DeepSeek | `deepseek-reasoner` | 1M | 384K | $0.14 | $0.28 |
| Qwen3 Max | Alibaba | `qwen3-max` | 262K | 64K | $1.20 | $6.00 |
| Qwen3 235B-A22B | Alibaba | `qwen3-235b-a22b` | 131K | 16K | $0.10 | $0.60 |
| Qwen3 Coder Plus | Alibaba | `qwen3-coder-plus` | 1M | 64K | $1.00 | $5.00 |
| Mistral Large | Mistral | `mistral-large-latest` | 262K | 262K | $0.50 | $1.50 |
| Mistral Medium | Mistral | `mistral-medium-latest` | 262K | 262K | $0.40 | $2.00 |
| Magistral Medium | Mistral | `magistral-medium-latest` | 128K | 16K | $2.00 | $5.00 |
| GLM-5 | Zhipu AI | `glm-5` | 200K | 128K | $1.00 | $3.20 |
| GLM-4.7 | Zhipu AI | `glm-4.7` | 200K | 128K | $0.60 | $2.20 |
| GLM-4.6 | Zhipu AI | `glm-4.6` | 200K | 128K | $0.43 | $1.74 |
| Kimi K2 | Moonshot AI | `kimi-k2` | 262K | 262K | see provider | see provider |

- **Claude Fable 5** — Anthropic's most powerful, most intelligent model — a tier above Opus. Adaptive thinking; the model that built this site.
- **Claude Opus 4.8** — Most capable Opus-tier model: state-of-the-art long-horizon agentic execution, knowledge work and memory. 1M context at standard pricing.
- **Claude Sonnet 4.6** — Best balance of speed and intelligence for high-volume production agents. Adaptive thinking; 1M context.
- **Claude Haiku 4.5** — Fastest and most cost-effective Claude model — ideal for subagents, classification and latency-critical steps.
- **GPT (frontier tier)** — OpenAI's flagship reasoning family. Pricing and exact context vary by released variant — check OpenAI's pricing page for current numbers.
- **Gemini (frontier tier)** — Long-context multimodal family; some variants advertise multi-million-token windows. Confirm pricing on Google's pricing page.
- **Llama (open weights)** — Open-weights family you can run yourself; effective price depends on your inference host, not a list price.
- **GPT-5** — OpenAI's flagship reasoning model: 400K context, native tool calling and schema-guaranteed structured output. A frontier agentic workhorse.
- **GPT-5.1** — Refreshed GPT-5 flagship (Nov 2025): same 400K context and tool calling, tuned for agentic workflows.
- **GPT-5 Mini** — Cost-efficient GPT-5 tier for high-volume agents and subagents: 400K context, tool calling and structured output at a fraction of flagship price.
- **GPT-5 Codex** — Coding-agent specialization of GPT-5: 400K context, tool calling and structured output, tuned for software-engineering loops.
- **GPT-5.1 Codex** — Coding-agent specialization of GPT-5.1: 400K context, tool calling and structured output for SWE agents.
- **OpenAI o3** — Dedicated reasoning model with tool calling and structured output: deep multi-step problem solving for analytical agents.
- **Gemini 3 Pro** — Google's frontier long-context multimodal model: ~1M-token window, thinking, tool calling and structured output.
- **Gemini 3 Flash** — Fast, cheap Gemini 3 tier with ~1M context, thinking, tool calling and structured output: built for high-throughput multimodal agents.
- **Gemini 2.5 Pro** — Proven long-context multimodal workhorse: ~1M-token window, thinking, tool calling and structured output.
- **Gemini 2.5 Flash** — High-volume multimodal agent tier: ~1M context, thinking, tool calling and structured output at low cost.
- **Grok 4.3** — xAI's current flagship: 1M-token context, reasoning and tool calling, tuned for agentic chat and coding.
- **DeepSeek-V4-Flash (deepseek-chat)** — Non-thinking mode of DeepSeek-V4-Flash: 1M context, very low price, tool calling. The deepseek-chat API alias.
- **DeepSeek-V4-Flash (deepseek-reasoner)** — Thinking mode of DeepSeek-V4-Flash: 1M context, chain-of-thought reasoning and tool calling at low cost.
- **Qwen3 Max** — Alibaba's flagship Qwen3 tier: 262K context, tool calling and structured output for general agentic tasks.
- **Qwen3 235B-A22B** — Open-weights Qwen3 MoE (235B total / 22B active): 131K context, reasoning and tool calling at very low cost.
- **Qwen3 Coder Plus** — Coding-agent Qwen3 tier: ~1M context, tool calling and structured output for software-engineering loops.
- **Mistral Large** — Mistral's flagship: 262K context, tool calling and structured output for general European-sovereign agent stacks.
- **Mistral Medium** — Mid-tier Mistral: 262K context, tool calling and structured output, balanced cost for production agents.
- **Magistral Medium** — Mistral's reasoning model: 128K context, chain-of-thought reasoning and tool calling.
- **GLM-5** — Zhipu's open-weights flagship: ~200K context, reasoning and tool calling, agentic-oriented.
- **GLM-4.7** — Open-weights GLM-4.7: ~200K context, reasoning, tool calling and structured output at low cost.
- **GLM-4.6** — Open-weights GLM-4.6: ~200K context, reasoning and tool calling, a low-cost agentic workhorse.
- **Kimi K2** — Moonshot's open-weights agentic model: 262K context, reasoning, tool calling and structured output.
