
LLM API Cheat Sheet

Quick reference for the major LLM provider APIs. Covers authentication, chat completions, function calling, streaming, and model selection across OpenAI, Anthropic, Google Gemini, Mistral, and Cohere.


Model Comparison

Flagship Models (as of early 2026)

| Provider | Model | Context Window | Strengths | Input $/1M tokens | Output $/1M tokens |
| --- | --- | --- | --- | --- | --- |
| OpenAI | GPT-4o | 128K | Multimodal, fast, versatile | $2.50 | $10.00 |
| OpenAI | o3 | 200K | Deep reasoning, math, code | $10.00 | $40.00 |
| OpenAI | GPT-4o mini | 128K | Cheapest, high throughput | $0.15 | $0.60 |
| Anthropic | Claude Opus 4 | 200K | Complex reasoning, agentic | $15.00 | $75.00 |
| Anthropic | Claude Sonnet 4 | 200K | Best balance | $3.00 | $15.00 |
| Anthropic | Claude Haiku 3.5 | 200K | Speed, cost, classification | $0.80 | $4.00 |
| Google | Gemini 2.5 Pro | 1M | Huge context, multimodal | $1.25 | $10.00 |
| Google | Gemini 2.0 Flash | 1M | Fast, cheap, multimodal | $0.10 | $0.40 |
| Mistral | Mistral Large | 128K | Strong reasoning, EU-hosted | $2.00 | $6.00 |
| Mistral | Mistral Small | 128K | Cost-effective | $0.10 | $0.30 |
| Cohere | Command R+ | 128K | RAG-optimized, multilingual | $2.50 | $10.00 |
| Cohere | Command R | 128K | Cost-effective RAG | $0.15 | $0.60 |

Pricing Changes

LLM pricing changes frequently. These prices are approximate as of early 2026. Always verify current pricing on each provider's website before making architectural decisions.
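
To make the price columns concrete, here is a minimal sketch that estimates the cost of one request from its token counts. The `PRICES` dictionary copies a few rows from the table above, so it carries the same "approximate" caveat; the dictionary keys and the `estimate_cost` helper are illustrative names, not part of any provider SDK.

```python
# Approximate USD prices per 1M tokens (input, output), copied from the table above.
# The keys are illustrative labels, not official model IDs.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-sonnet-4": (3.00, 15.00),
    "gemini-2.0-flash": (0.10, 0.40),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 10,000 input tokens and 1,000 output tokens on GPT-4o:
# 10_000 * 2.50 / 1e6 + 1_000 * 10.00 / 1e6 = 0.025 + 0.010 = 0.035
print(f"${estimate_cost('gpt-4o', 10_000, 1_000):.4f}")
```

Output tokens are several times more expensive than input tokens for every model listed, so long generations tend to dominate the bill even when prompts are large.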

Capabilities Matrix

| Capability | OpenAI | Anthropic | Gemini | Mistral | Cohere |
| --- | --- | --- | --- | --- | --- |
| Chat completions | Yes | Yes | Yes | Yes | Yes |
| Function calling | Yes | Yes (tool use) | Yes | Yes | Yes |
| Structured output | Yes (JSON schema) | Yes (tool use) | Yes (JSON schema) | Yes (JSON mode) | Yes (JSON mode) |
| Vision (images) | Yes | Yes | Yes | Yes (Pixtral) | No |
| Audio input | Yes (Whisper) | No | Yes (native) | No | No |
| Streaming | Yes | Yes | Yes | Yes | Yes |
| Embeddings | Yes | No (use Voyage) | Yes | Yes | Yes |
| Batch API | Yes | Yes | Yes | Yes | No |
| Prompt caching | Automatic | Explicit | Implicit | No | No |
| Extended thinking | Yes (o-series) | Yes | Yes (thinking mode) | Yes (thinking mode) | No |

Authentication

python
# OpenAI
from openai import OpenAI
client = OpenAI(api_key="sk-...")  # or OPENAI_API_KEY env var

# Anthropic
import anthropic
client = anthropic.Anthropic(api_key="sk-ant-...")  # or ANTHROPIC_API_KEY env var

# Google Gemini (legacy google-generativeai SDK; Google's newer SDK is google-genai)
import google.generativeai as genai
genai.configure(api_key="...")  # or GOOGLE_API_KEY env var

# Mistral
from mistralai import Mistral
client = Mistral(api_key="...")  # or MISTRAL_API_KEY env var

# Cohere
import cohere
client = cohere.ClientV2(api_key="...")  # or CO_API_KEY env var

typescript
// OpenAI
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: "sk-..." });

// Anthropic
import Anthropic from "@anthropic-ai/sdk";
const anthropic = new Anthropic({ apiKey: "sk-ant-..." });

// Google Gemini
import { GoogleGenerativeAI } from "@google/generative-ai";
const genai = new GoogleGenerativeAI("...");

// Mistral
import { Mistral } from "@mistralai/mistralai";
const mistral = new Mistral({ apiKey: "..." });

Chat Completions

OpenAI

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)

Anthropic

python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",  # System prompt is a separate parameter
    messages=[
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
)
print(response.content[0].text)

Google Gemini

python
model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    system_instruction="You are a helpful assistant.",
)
response = model.generate_content("Explain REST in one sentence.")
print(response.text)

Mistral

python
response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)

Cohere

python
response = client.chat(
    model="command-r-plus",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain REST in one sentence."},
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.message.content[0].text)

Side-by-Side Differences

| Aspect | OpenAI | Anthropic | Gemini | Mistral | Cohere |
| --- | --- | --- | --- | --- | --- |
| System prompt | In messages array | Separate `system` param | `system_instruction` on model | In messages array | In messages array |
| Response path | `.choices[0].message.content` | `.content[0].text` | `.text` | `.choices[0].message.content` | `.message.content[0].text` |
| Max tokens | Optional (default varies) | Required | Optional | Optional | Optional |
| Default temperature | 1.0 | 1.0 | 1.0 | 0.7 | 0.3 |
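
The first two rows of this comparison are the ones that usually break code when switching providers. The sketch below shows one way to absorb them: `split_system_prompt` pulls system messages out of an OpenAI-style list for Anthropic, and `extract_text` reads the reply using the response paths listed above. Both helpers and the provider labels are hypothetical names, and the response object is a `SimpleNamespace` stand-in rather than a real SDK type.

```python
from types import SimpleNamespace


def split_system_prompt(messages):
    """Split OpenAI-style messages into (system_text, other_messages).

    Anthropic takes the system prompt as a separate `system` parameter,
    so system-role entries are removed from the list and joined.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return "\n".join(system_parts), rest


def extract_text(provider, response):
    """Read the reply text using the response paths from the comparison above."""
    if provider in ("openai", "mistral"):
        return response.choices[0].message.content
    if provider == "anthropic":
        return response.content[0].text
    if provider == "gemini":
        return response.text
    if provider == "cohere":
        return response.message.content[0].text
    raise ValueError(f"Unknown provider: {provider}")


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain REST in one sentence."},
]
system, rest = split_system_prompt(messages)

# Stand-in object shaped like an Anthropic response (not a real SDK type).
fake_anthropic = SimpleNamespace(content=[SimpleNamespace(text="REST is ...")])
print(system)
print(extract_text("anthropic", fake_anthropic))
```

With real clients, `system` and `rest` would be passed as `system=` and `messages=` to `client.messages.create`, as in the Anthropic example earlier.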

Streaming

OpenAI

python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Anthropic

python
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")

Google Gemini

python
model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("Write a haiku about APIs.", stream=True)
for chunk in response:
    print(chunk.text, end="")

Mistral

python
stream = client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
for event in stream:
    if event.data.choices[0].delta.content:
        print(event.data.choices[0].delta.content, end="")
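
Streaming loops usually need the full text at the end as well as the incremental prints. The following is a minimal sketch of accumulating OpenAI-style deltas into one string; `collect_openai_stream` and `fake_chunk` are hypothetical helpers, and the chunks are `SimpleNamespace` stand-ins so the example runs without an API key. The guard for chunks with an empty `choices` list is a defensive assumption, not something the examples above require.

```python
from types import SimpleNamespace


def collect_openai_stream(chunks):
    """Join the text deltas from an OpenAI-style chunk iterator into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # defensive: skip chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # skip None and empty deltas
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk; real chunks come from the SDK."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


chunks = [
    fake_chunk("Requests "),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk("flow"),
]
print(collect_openai_stream(chunks))  # prints "Requests flow"
```

Because the Mistral example above reads the same fields from `event.data`, the same helper should work on `(event.data for event in stream)`.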

Function Calling / Tool Use

OpenAI

python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

# Check for tool calls
if response.choices[0].message.tool_calls:
    tc = response.choices[0].message.tool_calls[0]
    print(f"Call: {tc.function.name}({tc.function.arguments})")

Anthropic

python
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "input_schema": {  # Note: input_schema, not parameters
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
)

# Check for tool use
for block in response.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}({block.input})")
        print(f"Tool use ID: {block.id}")

Google Gemini

python
from google.generativeai.types import FunctionDeclaration, Tool

get_weather = FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    tools=[Tool(function_declarations=[get_weather])],
)

response = model.generate_content("Weather in Tokyo?")
# Check for function calls in response.candidates[0].content.parts
for part in response.candidates[0].content.parts:
    # The attribute exists on every part, so test that it is populated
    if part.function_call.name:
        print(f"Call: {part.function_call.name}({dict(part.function_call.args)})")

Mistral

python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
            },
            "required": ["location"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto",
)

# Same structure as OpenAI
if response.choices[0].message.tool_calls:
    tc = response.choices[0].message.tool_calls[0]
    print(f"Call: {tc.function.name}({tc.function.arguments})")

Tool Calling Syntax Comparison

| Aspect | OpenAI | Anthropic | Gemini | Mistral |
| --- | --- | --- | --- | --- |
| Schema key | `parameters` | `input_schema` | `parameters` | `parameters` |
| Tool wrapper | `{"type": "function", "function": {...}}` | `{...}` (flat) | `FunctionDeclaration` | `{"type": "function", "function": {...}}` |
| Response location | `.tool_calls[].function` | `.content[]` (type=tool_use) | `.parts[].function_call` | `.tool_calls[].function` |
| Tool result role | `"tool"` | `"user"` (with tool_result block) | `"function"` | `"tool"` |
| Parallel calls | Yes | Yes | Yes | Yes |
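
Since the OpenAI/Mistral and Anthropic tool definitions differ only in the wrapper and the schema key, one definition can be converted into the other mechanically. The sketch below shows the conversion in both directions; the two function names are illustrative, not SDK calls.

```python
def openai_tool_to_anthropic(tool):
    """Convert an OpenAI/Mistral tool definition to Anthropic's flat format."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # renamed from "parameters"
    }


def anthropic_tool_to_openai(tool):
    """Convert an Anthropic tool definition to the wrapped OpenAI/Mistral format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["input_schema"],
        },
    }


openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string", "description": "City name"}},
            "required": ["location"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(openai_tool)
print(anthropic_tool["name"], anthropic_tool["input_schema"]["required"])
```

Keeping one canonical definition and converting at the call site avoids maintaining the same JSON schema twice.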

Tool Result Format: Anthropic Is Different

Anthropic requires tool results to be sent as a user message with a tool_result content block, not as a separate tool role:

python
# OpenAI / Mistral
{"role": "tool", "tool_call_id": "call_123", "content": "22°C, cloudy"}

# Anthropic
{"role": "user", "content": [
    {"type": "tool_result", "tool_use_id": "toolu_123", "content": "22°C, cloudy"}
]}
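
A small wrapper can hide this difference when the same tool loop has to serve several providers. The sketch below builds the tool result message in either shape and also normalizes tool arguments, since OpenAI returns `function.arguments` as a JSON string while Anthropic's `block.input` is already a dict. The helper names and provider labels are hypothetical.

```python
import json


def parse_tool_arguments(raw):
    """Return tool arguments as a dict, whether given a JSON string or a dict."""
    return json.loads(raw) if isinstance(raw, str) else dict(raw)


def tool_result_message(provider, call_id, content):
    """Build the message that reports a tool result back to the model."""
    if provider in ("openai", "mistral"):
        return {"role": "tool", "tool_call_id": call_id, "content": content}
    if provider == "anthropic":
        return {
            "role": "user",
            "content": [
                {"type": "tool_result", "tool_use_id": call_id, "content": content}
            ],
        }
    raise ValueError(f"Unknown provider: {provider}")


print(parse_tool_arguments('{"location": "Tokyo"}'))
print(tool_result_message("anthropic", "toolu_123", "22°C, cloudy"))
```

In both cases the ID must be the one the model issued for that call (`tc.id` for OpenAI-style responses, `block.id` for Anthropic), otherwise the result cannot be matched to its request.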

Structured Output

python
# OpenAI: schema conformance is guaranteed only in strict mode, which requires
# "strict": True plus "additionalProperties": False on every object
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 3 programming languages with their types."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "languages",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "languages": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["compiled", "interpreted", "jit"]},
                            },
                            "required": ["name", "type"],
                            "additionalProperties": False,
                        },
                    },
                },
                "required": ["languages"],
                "additionalProperties": False,
            },
        },
    },
)

# Anthropic — use tool use for structured output
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "List 3 programming languages with their types."}],
    tools=[{
        "name": "output_languages",
        "description": "Output the list of programming languages",
        "input_schema": {
            "type": "object",
            "properties": {
                "languages": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "type": {"type": "string", "enum": ["compiled", "interpreted", "jit"]},
                        },
                        "required": ["name", "type"],
                    },
                },
            },
            "required": ["languages"],
        },
    }],
    tool_choice={"type": "tool", "name": "output_languages"},  # Force tool use
)

# Gemini — JSON schema
response = model.generate_content(
    "List 3 programming languages with their types.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
        response_schema={...},  # JSON schema
    ),
)
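
Whichever provider produced it, the structured result still has to be read back, and a cheap check on the client side catches truncated or off-schema output. The sketch below is a hand-written check for the `languages` shape used above, assuming the payload arrives either as a JSON string (OpenAI `message.content`, Gemini `response.text`) or as a dict (the `input` of an Anthropic `tool_use` block). `parse_languages` is a hypothetical helper, not a library function.

```python
import json

ALLOWED_TYPES = {"compiled", "interpreted", "jit"}


def parse_languages(payload):
    """Parse a `languages` payload and check it against the expected shape."""
    data = json.loads(payload) if isinstance(payload, str) else payload
    languages = data["languages"]
    for item in languages:
        if not isinstance(item.get("name"), str):
            raise ValueError(f"Missing or invalid name: {item!r}")
        if item.get("type") not in ALLOWED_TYPES:
            raise ValueError(f"Unexpected type: {item!r}")
    return languages


# JSON string, as returned in a text response.
text_payload = '{"languages": [{"name": "Rust", "type": "compiled"}, {"name": "Python", "type": "interpreted"}]}'
print([item["name"] for item in parse_languages(text_payload)])

# Dict, as found on a tool_use block's input.
dict_payload = {"languages": [{"name": "Java", "type": "jit"}]}
print(parse_languages(dict_payload)[0]["type"])
```

For larger schemas a validation library is a better fit than hand-written checks; the point here is only that the two payload forms can share one code path.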

Embeddings

| Provider | Model | Dimensions | Max Tokens | Price (per 1M tokens) |
| --- | --- | --- | --- | --- |
| OpenAI | text-embedding-3-large | 3072 (reducible) | 8191 | $0.13 |
| OpenAI | text-embedding-3-small | 1536 | 8191 | $0.02 |
| Google | text-embedding-004 | 768 | 2048 | $0.006 |
| Mistral | mistral-embed | 1024 | 8192 | $0.10 |
| Cohere | embed-v4.0 | 1024 | 512 | $0.10 |
| Voyage | voyage-3-large | 1024 | 32000 | $0.18 |

python
# OpenAI
resp = client.embeddings.create(model="text-embedding-3-small", input=["Hello world"])
vector = resp.data[0].embedding  # list[float], length 1536

# Cohere
resp = client.embed(
    model="embed-v4.0",
    texts=["Hello world"],
    input_type="search_document",  # or "search_query"
    embedding_types=["float"],
)
vector = resp.embeddings.float_[0]

# Mistral
resp = client.embeddings.create(model="mistral-embed", inputs=["Hello world"])
vector = resp.data[0].embedding

# Google
result = genai.embed_content(model="models/text-embedding-004", content="Hello world")
vector = result["embedding"]
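
Once the vectors are in hand, the usual next step is to compare them. As a minimal sketch, here is cosine similarity in plain Python, using short toy vectors in place of real embeddings; `cosine_similarity` is an illustrative helper, and production code would more likely use NumPy or a vector database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
doc_similar = [2.0, 0.0, 2.0]    # same direction as the query
doc_unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query

print(round(cosine_similarity(query, doc_similar), 6))
print(round(cosine_similarity(query, doc_unrelated), 6))
```

Only compare vectors produced by the same model: the dimensions in the table differ between providers, and even equal-length vectors from different models do not share a coordinate space.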

Error Handling Pattern

python
import time

def call_with_retry(fn, max_retries=3, base_delay=1.0):
    """Generic retry pattern for any LLM API call (pass a zero-argument callable)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception as e:
            error_type = type(e).__name__.lower()
            message = str(e).lower()

            # Authentication, invalid request: do not retry
            if "401" in message or "400" in message or "authentication" in message:
                raise

            # Out of attempts: re-raise the original error rather than masking it
            if attempt == max_retries:
                raise

            # Rate limit, overloaded, server error, or unknown error: back off and retry.
            # String matching is a rough heuristic; prefer each SDK's typed exceptions
            # (for example openai.RateLimitError) when you target a single provider.
            delay = base_delay * (2 ** attempt)
            if "rate" in error_type or "429" in message:
                print(f"Rate limited. Retrying in {delay}s...")
            time.sleep(delay)
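
To see what the `base_delay * (2 ** attempt)` formula produces, the sketch below prints the delay schedule on its own, with two optional additions that are assumptions rather than part of the pattern above: a cap (`max_delay`) and random jitter, which helps avoid many clients retrying at the same instant. `backoff_delays` is a hypothetical helper.

```python
import random


def backoff_delays(max_retries=3, base_delay=1.0, max_delay=30.0, jitter=0.0, rng=None):
    """Return the delay before each retry: base_delay * 2**attempt, capped at max_delay.

    `jitter` is a fraction: 0.25 adds up to 25% random extra delay per retry.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(base_delay * (2 ** attempt), max_delay)
        delay += delay * jitter * rng.random()
        delays.append(delay)
    return delays


print(backoff_delays())  # [1.0, 2.0, 4.0]
print(backoff_delays(max_retries=6, base_delay=2.0, max_delay=10.0))  # capped at 10.0
```

With the defaults, three retries wait 1, 2, and 4 seconds, so a call gives up after roughly 7 seconds of waiting plus the time spent in the failed requests themselves.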

Quick Decision Guide

| I need... | Use |
| --- | --- |
| Best all-around model | GPT-4o or Claude Sonnet 4 |
| Cheapest for simple tasks | GPT-4o mini or Gemini 2.0 Flash |
| Longest context window | Gemini 2.5 Pro (1M tokens) |
| Best reasoning | o3 or Claude Opus 4 |
| EU data residency | Mistral (hosted in Europe) |
| Best RAG support | Cohere Command R+ |
| Image understanding | GPT-4o, Claude Sonnet 4, or Gemini |
| Audio processing | OpenAI Whisper or Gemini (native) |
| Cheapest embeddings | Google text-embedding-004 |
| Best embeddings quality | OpenAI text-embedding-3-large or Voyage 3 |

Test Yourself
  1. What is the key difference in how Anthropic handles the system prompt vs OpenAI? Anthropic uses a separate system parameter; OpenAI puts it in the messages array with role: "system".

  2. How does Anthropic expect tool results to be sent, compared with OpenAI? Anthropic requires tool results as a user message with a tool_result content block; OpenAI uses a separate tool role.

  3. What parameter is required for Anthropic but optional for OpenAI? max_tokens is required for Anthropic.

  4. Which provider offers the largest context window (as of early 2026)? Google Gemini 2.5 Pro with 1M tokens.

  5. How do you force Anthropic to use a specific tool for structured output? tool_choice={"type": "tool", "name": "tool_name"}

  6. What is the cheapest embedding model listed? Google text-embedding-004 at $0.006 per 1M tokens.

  7. What key name does Anthropic use for the tool schema (vs OpenAI's parameters)? input_schema

  8. How do you access the response text from an Anthropic API call? response.content[0].text

  9. What OpenAI feature guarantees the response matches a JSON schema? Structured output with response_format: {"type": "json_schema", ...} and strict mode enabled ("strict": true in the json_schema object).

  10. What retry strategy should you use for rate limit (429) errors? Exponential backoff: delay = base_delay * (2 ** attempt)

Common Gotchas

  • Anthropic requires max_tokens -- forgetting it raises an error. OpenAI and others default to a reasonable value, but Anthropic forces you to be explicit.
  • Tool result format differs between providers. OpenAI/Mistral use role: "tool", Anthropic uses role: "user" with a tool_result block. Sending one provider's format to another typically gets the request rejected with a validation error.
  • LLM pricing changes frequently. Architectural decisions based on pricing can become outdated in weeks. Always check current pricing on each provider's website.
  • temperature=0 does not guarantee determinism. Most providers say "mostly deterministic" at temperature 0 but do not guarantee identical outputs across requests or model versions.

One-Liner Summary

LLM APIs follow similar patterns across providers (chat completions, tool calling, streaming) but differ in subtle ways -- know each provider's system prompt placement, response paths, and tool calling syntax to switch between them effortlessly.

"What I cannot create, I do not understand." — Richard Feynman