GET /api/models · 30 models · updated 2026-06-15
The Frontier Model Matrix
What it costs to think. Context windows, output ceilings and per-million-token pricing for the models an agent-builder reaches for. Claude rows are exact; for other vendors, the family-level rows list stable capabilities and defer pricing to the provider rather than print a number we can't vouch for.
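The context-window and output-ceiling columns use shorthand such as 1M and 128K. As a rough sketch, the helpers below turn that shorthand into token counts and check whether a request fits a row's limits. The names `parse_tokens` and `fits` are hypothetical, and two things are assumed rather than taken from this page: that K and M are decimal multipliers, and that prompt and output tokens share the context window. Providers may count differently, so treat the result as an estimate.

```python
def parse_tokens(label: str) -> int:
    """Convert shorthand such as '128K' or '1M' to a token count.

    Assumption: K and M are decimal multipliers (1_000 and 1_000_000).
    Non-numeric cells such as 'see provider' or 'varies' raise ValueError.
    """
    multipliers = {"K": 1_000, "M": 1_000_000}
    text = label.strip().upper()
    if text and text[-1] in multipliers:
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)


def fits(context: str, max_out: str, prompt_tokens: int, output_tokens: int) -> bool:
    """Check a planned request against one row's limits.

    Assumption: prompt and output share the context window, and the
    output is separately capped by the row's output ceiling.
    """
    return (
        output_tokens <= parse_tokens(max_out)
        and prompt_tokens + output_tokens <= parse_tokens(context)
    )


if __name__ == "__main__":
    # Limits for a 200K-context, 64K-output row.
    print(fits("200K", "64K", prompt_tokens=150_000, output_tokens=40_000))
    print(fits("200K", "64K", prompt_tokens=180_000, output_tokens=40_000))
```

Under these assumptions the first call fits (190,000 tokens against a 200,000-token window) and the second does not (220,000 tokens).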
| Model | Vendor | Model ID | Context | Max output | Input $/M | Output $/M | Strengths |
|---|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | claude-fable-5 | 1M | 128K | $10.00 | $50.00 | Anthropic's most powerful, most intelligent model, a tier above Opus. Adaptive thinking; the model that built this site. |
| Claude Opus 4.8 | Anthropic | claude-opus-4-8 | 1M | 128K | $5.00 | $25.00 | Most capable Opus-tier model: state-of-the-art long-horizon agentic execution, knowledge work and memory. 1M context at standard pricing. |
| Claude Sonnet 4.6 | Anthropic | claude-sonnet-4-6 | 1M | 64K | $3.00 | $15.00 | Best balance of speed and intelligence for high-volume production agents. Adaptive thinking; 1M context. |
| Claude Haiku 4.5 | Anthropic | claude-haiku-4-5 | 200K | 64K | $1.00 | $5.00 | Fastest and most cost-effective Claude model: ideal for subagents, classification and latency-critical steps. |
| GPT (frontier tier) | OpenAI | see provider | see provider | see provider | see provider | see provider | OpenAI's flagship reasoning family. Pricing and exact context vary by released variant; check OpenAI's pricing page for current numbers. |
| Gemini (frontier tier) | Google | see provider | 1M+ (varies) | see provider | see provider | see provider | Long-context multimodal family; some variants advertise multi-million-token windows. Confirm pricing on Google's pricing page. |
| Llama (open weights) | Meta | see provider | varies | varies | self-host or per-host | self-host or per-host | Open-weights family you can run yourself; effective price depends on your inference host, not a list price. |
| GPT-5 | OpenAI | gpt-5 | 400K | 128K | $1.25 | $10.00 | OpenAI's flagship reasoning model: 400K context, native tool calling and schema-guaranteed structured output. A frontier agentic workhorse. |
| GPT-5.1 | OpenAI | gpt-5.1 | 400K | 128K | $1.25 | $10.00 | Refreshed GPT-5 flagship (Nov 2025): same 400K context and tool calling, tuned for agentic workflows. |
| GPT-5 Mini | OpenAI | gpt-5-mini | 400K | 128K | $0.25 | $2.00 | Cost-efficient GPT-5 tier for high-volume agents and subagents: 400K context, tool calling and structured output at a fraction of flagship price. |
| GPT-5 Codex | OpenAI | gpt-5-codex | 400K | 128K | $1.25 | $10.00 | Coding-agent specialization of GPT-5: 400K context, tool calling and structured output, tuned for software-engineering loops. |
| GPT-5.1 Codex | OpenAI | gpt-5.1-codex | 400K | 128K | $1.25 | $10.00 | Coding-agent specialization of GPT-5.1: 400K context, tool calling and structured output for SWE agents. |
| OpenAI o3 | OpenAI | o3 | 200K | 100K | $2.00 | $8.00 | Dedicated reasoning model with tool calling and structured output: deep multi-step problem solving for analytical agents. |
| Gemini 3 Pro | Google | gemini-3-pro-preview | 1M | 64K | $2.00 | $12.00 | Google's frontier long-context multimodal model: ~1M-token window, thinking, tool calling and structured output. |
| Gemini 3 Flash | Google | gemini-3-flash-preview | 1M | 64K | $0.50 | $3.00 | Fast, cheap Gemini 3 tier with ~1M context, thinking, tool calling and structured output: built for high-throughput multimodal agents. |
| Gemini 2.5 Pro | Google | gemini-2.5-pro | 1M | 64K | $1.25 | $10.00 | Proven long-context multimodal workhorse: ~1M-token window, thinking, tool calling and structured output. |
| Gemini 2.5 Flash | Google | gemini-2.5-flash | 1M | 64K | $0.30 | $2.50 | High-volume multimodal agent tier: ~1M context, thinking, tool calling and structured output at low cost. |
| Grok 4.3 | xAI | grok-4.3 | 1M | 30K | $1.25 | $2.50 | xAI's current flagship: 1M-token context, reasoning and tool calling, tuned for agentic chat and coding. |
| DeepSeek-V4-Flash (deepseek-chat) | DeepSeek | deepseek-chat | 1M | 384K | $0.14 | $0.28 | Non-thinking mode of DeepSeek-V4-Flash: 1M context, very low price, tool calling. The deepseek-chat API alias. |
| DeepSeek-V4-Flash (deepseek-reasoner) | DeepSeek | deepseek-reasoner | 1M | 384K | $0.14 | $0.28 | Thinking mode of DeepSeek-V4-Flash: 1M context, chain-of-thought reasoning and tool calling at low cost. |
| Qwen3 Max | Alibaba | qwen3-max | 262K | 64K | $1.20 | $6.00 | Alibaba's flagship Qwen3 tier: 262K context, tool calling and structured output for general agentic tasks. |
| Qwen3 235B-A22B | Alibaba | qwen3-235b-a22b | 131K | 16K | $0.10 | $0.60 | Open-weights Qwen3 MoE (235B total / 22B active): 131K context, reasoning and tool calling at very low cost. |
| Qwen3 Coder Plus | Alibaba | qwen3-coder-plus | 1M | 64K | $1.00 | $5.00 | Coding-agent Qwen3 tier: ~1M context, tool calling and structured output for software-engineering loops. |
| Mistral Large | Mistral | mistral-large-latest | 262K | 262K | $0.50 | $1.50 | Mistral's flagship: 262K context, tool calling and structured output for general European-sovereign agent stacks. |
| Mistral Medium | Mistral | mistral-medium-latest | 262K | 262K | $0.40 | $2.00 | Mid-tier Mistral: 262K context, tool calling and structured output, balanced cost for production agents. |
| Magistral Medium | Mistral | magistral-medium-latest | 128K | 16K | $2.00 | $5.00 | Mistral's reasoning model: 128K context, chain-of-thought reasoning and tool calling. |
| GLM-5 | Zhipu AI | glm-5 | 200K | 128K | $1.00 | $3.20 | Zhipu's open-weights flagship: ~200K context, reasoning and tool calling, agentic-oriented. |
| GLM-4.7 | Zhipu AI | glm-4.7 | 200K | 128K | $0.60 | $2.20 | Open-weights GLM-4.7: ~200K context, reasoning, tool calling and structured output at low cost. |
| GLM-4.6 | Zhipu AI | glm-4.6 | 200K | 128K | $0.43 | $1.74 | Open-weights GLM-4.6: ~200K context, reasoning and tool calling, a low-cost agentic workhorse. |
| Kimi K2 | Moonshot AI | kimi-k2 | 262K | 262K | see provider | see provider | Moonshot's open-weights agentic model: 262K context, reasoning, tool calling and structured output. |
Pricing unit: USD per 1M tokens (input / output). This page is served by Claude Fable 5; the top row built the site you're reading.
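Because every price is quoted per 1M tokens, the cost of a single request is the input and output token counts scaled by the two rates. The sketch below shows that arithmetic for two rows copied from the table. The `RATES` mapping and the `request_cost` helper are hypothetical names used for illustration, not part of any API on this page, and the calculation ignores any discounts or surcharges a provider may apply.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

# Rates copied from the table: (input $/M, output $/M).
RATES = {
    "claude-sonnet-4-6": (Decimal("3.00"), Decimal("15.00")),
    "claude-haiku-4-5": (Decimal("1.00"), Decimal("5.00")),
}


def request_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the listed per-million rates."""
    in_rate, out_rate = RATES[model_id]
    # Decimal keeps the dollar amounts exact instead of accumulating float error.
    return (in_rate * input_tokens + out_rate * output_tokens) / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Same workload on both models: 200,000 input tokens, 10,000 output tokens.
    for model_id in RATES:
        print(model_id, request_cost(model_id, 200_000, 10_000))
```

For that workload the Sonnet row comes to $0.75 ($0.60 input plus $0.15 output) and the Haiku row to $0.25, which is the kind of per-step difference that adds up across a subagent fan-out.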
