LLM API Cheat Sheet
Quick reference for the major LLM provider APIs. Covers authentication, chat completions, function calling, streaming, and model selection across OpenAI, Anthropic, Google Gemini, Mistral, and Cohere.
Model Comparison
Flagship Models (as of early 2026)
| Provider | Model | Context Window | Strengths | Input $/1M tokens | Output $/1M tokens |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | 128K | Multimodal, fast, versatile | $2.50 | $10.00 |
| OpenAI | o3 | 200K | Deep reasoning, math, code | $10.00 | $40.00 |
| OpenAI | GPT-4o mini | 128K | Cheapest, high throughput | $0.15 | $0.60 |
| Anthropic | Claude Opus 4 | 200K | Complex reasoning, agentic | $15.00 | $75.00 |
| Anthropic | Claude Sonnet 4 | 200K | Best balance | $3.00 | $15.00 |
| Anthropic | Claude Haiku 3.5 | 200K | Speed, cost, classification | $0.80 | $4.00 |
| Google | Gemini 2.5 Pro | 1M | Huge context, multimodal | $1.25 | $10.00 |
| Google | Gemini 2.0 Flash | 1M | Fast, cheap, multimodal | $0.10 | $0.40 |
| Mistral | Mistral Large | 128K | Strong reasoning, EU-hosted | $2.00 | $6.00 |
| Mistral | Mistral Small | 128K | Cost-effective | $0.10 | $0.30 |
| Cohere | Command R+ | 128K | RAG-optimized, multilingual | $2.50 | $10.00 |
| Cohere | Command R | 128K | Cost-effective RAG | $0.15 | $0.60 |
Pricing Changes
LLM pricing changes frequently. These prices are approximate as of early 2026. Always verify current pricing on each provider's website before making architectural decisions.
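As a rough illustration of how the per-million-token prices in the table translate into per-request cost, here is a minimal sketch. The dictionary keys are shorthand labels chosen for this example (not necessarily the exact API model IDs), and the prices are the approximate figures from the table, so treat the output as an estimate only.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
# Keys are illustrative labels, not guaranteed API model identifiers.
PRICES_PER_MILLION = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 2,000 prompt tokens and 500 completion tokens on GPT-4o.
print(f"${estimate_cost('gpt-4o', 2_000, 500):.4f}")
```

Output tokens are several times more expensive than input tokens for every model listed, so long completions tend to dominate the bill.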
Capabilities Matrix
| Capability | OpenAI | Anthropic | Gemini | Mistral | Cohere |
|---|---|---|---|---|---|
| Chat completions | Yes | Yes | Yes | Yes | Yes |
| Function calling | Yes | Yes (tool use) | Yes | Yes | Yes |
| Structured output | Yes (JSON schema) | Yes (tool use) | Yes (JSON schema) | Yes (JSON mode) | Yes (JSON mode) |
| Vision (images) | Yes | Yes | Yes | Yes (Pixtral) | No |
| Audio input | Yes (Whisper) | No | Yes (native) | No | No |
| Streaming | Yes | Yes | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes | Yes | Yes |
| Batch API | Yes | Yes | Yes | Yes | No |
| Prompt caching | Automatic | Explicit | Implicit | No | No |
| Extended thinking | Yes (o-series) | Yes | Yes (thinking mode) | Yes (thinking mode) | No |
Authentication
# OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...") # or OPENAI_API_KEY env var
# Anthropic
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...") # or ANTHROPIC_API_KEY env var
# Google Gemini
import google.generativeai as genai
genai.configure(api_key="...") # or GOOGLE_API_KEY env var
# Mistral
from mistralai import Mistral
client = Mistral(api_key="...") # or MISTRAL_API_KEY env var
# Cohere
import cohere
client = cohere.ClientV2(api_key="...") # or CO_API_KEY env var
// OpenAI
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: "sk-..." });
// Anthropic
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ apiKey: "sk-ant-..." });
// Google Gemini
import { GoogleGenerativeAI } from "@google/generative-ai";
const genai = new GoogleGenerativeAI("...");
// Mistral
import { Mistral } from "@mistralai/mistralai";
const mistral = new Mistral({ apiKey: "..." });
Chat Completions
OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
Anthropic
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",  # System prompt is a separate parameter
    messages=[
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
)
print(response.content[0].text)
Google Gemini
model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    system_instruction="You are a helpful assistant.",
)
response = model.generate_content("Explain REST in one sentence.")
print(response.text)
Mistral
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
Cohere
response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.message.content[0].text)
Side-by-Side Differences
| Aspect | OpenAI | Anthropic | Gemini | Mistral | Cohere |
|---|---|---|---|---|---|
| System prompt | In messages array | Separate system param | system_instruction on model | In messages array | In messages array |
| Response path | .choices[0].message.content | .content[0].text | .text | .choices[0].message.content | .message.content[0].text |
| Max tokens | Optional (default varies) | Required | Optional | Optional | Optional |
| Default temperature | 1.0 | 1.0 | 1.0 | 0.7 | 0.3 |
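The differences in the table can be hidden behind two small helpers. The sketch below is one possible shape, not an official adapter: `build_chat_kwargs` and `extract_text` are hypothetical names, and Gemini is left out of the request builder because its system prompt is set on the model object rather than per request.

```python
from types import SimpleNamespace


def build_chat_kwargs(provider: str, system: str, user: str, max_tokens: int = 256) -> dict:
    """Build keyword arguments for a provider's chat call (hypothetical helper)."""
    if provider == "anthropic":
        # System prompt is a top-level parameter and max_tokens is required.
        return {
            "system": system,
            "messages": [{"role": "user", "content": user}],
            "max_tokens": max_tokens,
        }
    if provider in {"openai", "mistral", "cohere"}:
        # System prompt travels inside the messages array.
        return {
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            "max_tokens": max_tokens,
        }
    raise ValueError(f"unsupported provider: {provider}")


def extract_text(provider: str, response) -> str:
    """Read the reply text using the response paths listed in the table."""
    if provider in {"openai", "mistral"}:
        return response.choices[0].message.content
    if provider == "anthropic":
        return response.content[0].text
    if provider == "cohere":
        return response.message.content[0].text
    if provider == "gemini":
        return response.text
    raise ValueError(f"unsupported provider: {provider}")


# Stand-in object shaped like an Anthropic response, used instead of a real API call.
fake_response = SimpleNamespace(content=[SimpleNamespace(text="REST is ...")])
print(extract_text("anthropic", fake_response))
```

Keeping these two functions as the only provider-aware code makes it easier to swap providers without touching the rest of an application.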
Streaming
OpenAI
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Anthropic
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
Google Gemini
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Write a haiku about APIs.", stream=True)
for chunk in response:
    print(chunk.text, end="")
Mistral
stream = client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
for event in stream:
    if event.data.choices[0].delta.content:
        print(event.data.choices[0].delta.content, end="")
Function Calling / Tool Use
OpenAI
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
# Check for tool calls
if response.choices[0].message.tool_calls:
    tc = response.choices[0].message.tool_calls[0]
    print(f"Call: {tc.function.name}({tc.function.arguments})")
Anthropic
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {  # Note: input_schema, not parameters
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}]
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)
# Check for tool use
for block in response.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}({block.input})")
        print(f"Tool use ID: {block.id}")
Google Gemini
from google.generativeai.types import FunctionDeclaration, Tool
get_weather = FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)
model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    tools=[Tool(function_declarations=[get_weather])],
)
response = model.generate_content("Weather in Tokyo?")
# Check for function calls in response.candidates[0].content.parts.
# Every part exposes a function_call attribute, so test whether it is populated
# (a non-empty name) rather than whether the attribute exists.
for part in response.candidates[0].content.parts:
    if part.function_call.name:
        print(f"Call: {part.function_call.name}({dict(part.function_call.args)})")
Mistral
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)
# Same structure as OpenAI
if response.choices[0].message.tool_calls:
    tc = response.choices[0].message.tool_calls[0]
    print(f"Call: {tc.function.name}({tc.function.arguments})")
Tool Calling Syntax Comparison
| Aspect | OpenAI | Anthropic | Gemini | Mistral |
|---|---|---|---|---|
| Schema key | parameters | input_schema | parameters | parameters |
| Tool wrapper | {"type": "function", "function": {...}} | {...} (flat) | FunctionDeclaration | {"type": "function", "function": {...}} |
| Response location | .tool_calls[].function | .content[] (type=tool_use) | .parts[].function_call | .tool_calls[].function |
| Tool result role | "tool" | "user" (with tool_result block) | "function" | "tool" |
| Parallel calls | Yes | Yes | Yes | Yes |
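Once a tool call has been located in the response, the arguments still need to be decoded and routed to local code. The sketch below shows one way to do that under two assumptions: OpenAI-style calls deliver `arguments` as a JSON string, while Anthropic's `tool_use` block delivers `input` as an already-parsed dict. `TOOLS` and `run_tool_call` are hypothetical names, and `get_weather` is a stand-in that returns a canned value.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> str:
    """Stand-in implementation; a real tool would query a weather service."""
    return f"{location}: 22°C, cloudy"


# Registry mapping tool names (as declared to the model) to local functions.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, raw_arguments: Any) -> Any:
    """Decode the model's arguments and invoke the matching local function."""
    # JSON string (OpenAI-style) or mapping (Anthropic-style): normalize to a dict.
    if isinstance(raw_arguments, str):
        arguments = json.loads(raw_arguments)
    else:
        arguments = dict(raw_arguments)
    if name not in TOOLS:
        raise KeyError(f"model requested unknown tool: {name}")
    return TOOLS[name](**arguments)


print(run_tool_call("get_weather", '{"location": "Tokyo"}'))  # OpenAI-style input
print(run_tool_call("get_weather", {"location": "Tokyo"}))    # Anthropic-style input
```

Rejecting unknown tool names explicitly is a deliberate choice here: models occasionally request tools that were never declared, and a clear error is easier to debug than a silent no-op.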
Tool Result Format: Anthropic Is Different
Anthropic requires tool results to be sent as a user message with a tool_result content block, not as a separate tool role:
# OpenAI / Mistral
{"role": "tool", "tool_call_id": "call_123", "content": "22°C, cloudy"}
# Anthropic
{"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "toolu_123", "content": "22°C, cloudy"}
]}
Structured Output
# OpenAI: schema conformance is guaranteed only with "strict": True,
# which in turn requires "additionalProperties": False on every object
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 programming languages with their types."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["compiled", "interpreted", "jit"]},
                            },
                            "required": ["name", "type"],
                            "additionalProperties": False,
                        },
                    },
                },
                "required": ["languages"],
                "additionalProperties": False,
            },
        },
    },
)
# Anthropic: use tool use for structured output
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "List 3 programming languages with their types."}],
    tools=[{
        "name": "output_languages",
        "description": "Output the list of programming languages",
        "input_schema": {
            "type": "object",
            "properties": {
                "languages": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "type": {"type": "string", "enum": ["compiled", "interpreted", "jit"]},
                        },
                        "required": ["name", "type"],
                    },
                },
            },
            "required": ["languages"],
        },
    }],
    tool_choice={"type": "tool", "name": "output_languages"},  # Force tool use
)
# Gemini: JSON schema
response = model.generate_content(
    "List 3 programming languages with their types.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={...},  # JSON schema
    ),
)
Embeddings
| Provider | Model | Dimensions | Max Tokens | Price (per 1M tokens) |
|---|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 (reducible) | 8191 | $0.13 |
| OpenAI | text-embedding-3-small | 1536 | 8191 | $0.02 |
| Google | text-embedding-004 | 768 | 2048 | $0.006 |
| Mistral | mistral-embed | 1024 | 8192 | $0.10 |
| Cohere | embed-v4.0 | 1024 | 512 | $0.10 |
| Voyage | voyage-3-large | 1024 | 32000 | $0.18 |
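The "reducible" note for text-embedding-3-large refers to using fewer dimensions than the full 3072. One way to do this client-side, sketched below, is to keep the leading values and rescale the result to unit length before comparing vectors. This assumes the model was trained so that leading dimensions carry most of the signal, which is how OpenAI describes its text-embedding-3 models; it should not be assumed for other models in the table. `shorten_embedding` and `cosine_similarity` are illustrative names.

```python
import math


def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale to unit length."""
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in truncated]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings.
short_a = shorten_embedding([3.0, 4.0, 12.0], 2)
short_b = shorten_embedding([4.0, 3.0, 1.0], 2)
print(short_a, cosine_similarity(short_a, short_b))
```

Shorter vectors cut storage and search cost roughly in proportion to the dimension count, at some loss of retrieval quality that is worth measuring on your own data.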
# OpenAI
resp = client.embeddings.create(model="text-embedding-3-small", input=["Hello world"])
vector = resp.data[0].embedding  # list[float], length 1536
# Cohere
resp = client.embed(
    model="embed-v4.0",
    texts=["Hello world"],
    input_type="search_document",  # or "search_query"
    embedding_types=["float"],
)
vector = resp.embeddings.float_[0]
# Mistral
resp = client.embeddings.create(model="mistral-embed", inputs=["Hello world"])
vector = resp.data[0].embedding
# Google
result = genai.embed_content(model="models/text-embedding-004", content="Hello world")
vector = result["embedding"]
Error Handling Pattern
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    """Universal retry pattern for any LLM API."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            message = str(e)
            # Rate limit, overloaded, or server error: retry with backoff
            retryable = (
                "rate" in type(e).__name__.lower()
                or "429" in message
                or "overloaded" in message.lower()
                or "500" in message
                or "529" in message
            )
            # Authentication, invalid request: do not retry
            fatal = (
                "401" in message
                or "400" in message
                or "authentication" in message.lower()
            )
            # Re-raise the original error when it is not retryable or when
            # attempts are exhausted, instead of sleeping once more
            if attempt == max_retries or (fatal and not retryable):
                raise
            if retryable:
                delay = base_delay * (2 ** attempt)
                print(f"Retrying in {delay}s after {type(e).__name__}...")
                time.sleep(delay)
            # Unknown errors are retried immediately
Quick Decision Guide
| I need... | Use |
|---|---|
| Best all-around model | GPT-4o or Claude Sonnet 4 |
| Cheapest for simple tasks | GPT-4o mini or Gemini 2.0 Flash |
| Longest context window | Gemini 2.5 Pro (1M tokens) |
| Best reasoning | o3 or Claude Opus 4 |
| EU data residency | Mistral (hosted in Europe) |
| Best RAG support | Cohere Command R+ |
| Image understanding | GPT-4o, Claude Sonnet 4, or Gemini |
| Audio processing | OpenAI Whisper or Gemini (native) |
| Cheapest embeddings | Google text-embedding-004 |
| Best embeddings quality | OpenAI text-embedding-3-large or Voyage 3 |
See Also
- OpenAI API Patterns — Deep dive into OpenAI-specific patterns
- Anthropic Claude API Patterns — Deep dive into Claude-specific patterns
- LLM Integration Patterns — Provider-agnostic architecture
- Embeddings Deep Dive — Embedding strategies beyond the API
- AI Agents Architecture — Building agents with tool calling
Test Yourself
- What is the key difference in how Anthropic handles the system prompt vs OpenAI? Anthropic uses a separate system parameter; OpenAI puts it in the messages array with role: "system".
- How does Anthropic return tool call results differently from OpenAI? Anthropic requires tool results as a user message with a tool_result content block; OpenAI uses a separate tool role.
- What parameter is required for Anthropic but optional for OpenAI? max_tokens is required for Anthropic.
- Which provider offers the largest context window (as of early 2026)? Google Gemini 2.5 Pro with 1M tokens.
- How do you force Anthropic to use a specific tool for structured output? tool_choice={"type": "tool", "name": "tool_name"}
- What is the cheapest embedding model listed? Google text-embedding-004 at $0.006 per 1M tokens.
- What key name does Anthropic use for the tool schema (vs OpenAI's parameters)? input_schema
- How do you access the response text from an Anthropic API call? response.content[0].text
- What OpenAI feature guarantees the response matches a JSON schema? Structured output with response_format: {"type": "json_schema", ...}
- What retry strategy should you use for rate limit (429) errors? Exponential backoff: delay = base_delay * (2 ** attempt)
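To make the backoff formula concrete, the sketch below lists the delay before each retry. The `cap` parameter and the jitter helper are additions for illustration and are not part of the retry pattern in this cheat sheet; randomizing delays is a common way to stop many clients from retrying in lockstep.

```python
import random


def backoff_delays(base_delay: float = 1.0, max_retries: int = 3, cap: float = 30.0) -> list[float]:
    """Delay before each retry: base_delay * 2**attempt, limited to `cap` seconds."""
    return [min(cap, base_delay * (2 ** attempt)) for attempt in range(max_retries)]


def with_jitter(delay: float) -> float:
    """Full jitter: pick a random delay between 0 and the computed delay."""
    return random.uniform(0.0, delay)


print(backoff_delays())  # [1.0, 2.0, 4.0]
print([round(with_jitter(d), 2) for d in backoff_delays()])
```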
Common Gotchas
- Anthropic requires max_tokens -- forgetting it raises an error. OpenAI and others default to a reasonable value, but Anthropic forces you to be explicit.
- Tool result format differs between providers. OpenAI/Mistral use role: "tool", Anthropic uses role: "user" with a tool_result block. Mixing formats causes silent failures.
- LLM pricing changes frequently. Architectural decisions based on pricing can become outdated in weeks. Always check current pricing on each provider's website.
- temperature=0 does not guarantee determinism. Most providers say "mostly deterministic" at temperature 0 but do not guarantee identical outputs across requests or model versions.
One-Liner Summary
LLM APIs follow similar patterns across providers (chat completions, tool calling, streaming) but differ in subtle ways -- know each provider's system prompt placement, response paths, and tool calling syntax to switch between them effortlessly.