Skip to content
Unverified — AI-generated content. Help verify this page

LangChain vs LlamaIndex vs Raw API

AI application frameworks promise to accelerate development by abstracting away the complexity of LLM integrations. But abstraction comes with a cost: hidden behavior, debugging difficulty, and performance overhead. This comparison evaluates when LangChain, LlamaIndex, and raw API calls are each the right choice — and when frameworks actively hurt you.

Overview

ToolFirst ReleasePrimary FocusAbstraction Level
LangChainOct 2022General-purpose LLM application frameworkHigh (chains, agents, tools, memory)
LlamaIndexNov 2022Data indexing and retrieval for LLM appsHigh (indexes, query engines, RAG pipelines)
Raw APIN/ADirect HTTP/SDK calls to LLM providersNone (full control)

The Core Question

Frameworks save time when your use case matches their abstractions. They waste time when your use case does not. Most production LLM applications start with a framework and eventually rewrite the critical path with raw API calls. Understanding where each approach excels helps you make that transition deliberate rather than painful.

Architecture Comparison

Feature Matrix

FeatureLangChainLlamaIndexRaw API
LLM provider support70+ providers40+ providers1 per integration
RAG pipelineRetrievers + chainsFirst-class (primary focus)Build from scratch
Document loaders160+ loaders (PDF, HTML, DB, etc.)160+ loadersWrite your own
Chunking strategiesRecursiveCharacterTextSplitter, etc.SentenceSplitter, SemanticSplitter, etc.Write your own
Vector store integrations50+ (Pinecone, Chroma, Weaviate, etc.)40+Direct SDK calls
AgentsReAct, Plan-and-Execute, OpenAI FunctionsSubQuestion, RouterQuery, ReActBuild from scratch
Memory / conversationBufferMemory, SummaryMemory, VectorMemoryChatMemory, conversation historySession storage (DB/Redis)
Structured outputOutput parsers, Pydantic modelsPydantic programsZod / JSON schema
StreamingStreaming callbacksStreaming responseNative SSE handling
ObservabilityLangSmith (first-party)LlamaTrace, ArizeCustom logging
EvaluationLangSmith evaluationsLlamaIndex evaluationsCustom eval scripts
CachingLLM cache (Redis, SQLite, etc.)Built-in response cacheYour caching layer
TypeScript supportLangChain.js (full parity)LlamaIndex.TSNative
Bundle size~2-5 MB (many deps)~1-3 MB0 (just SDK)
Learning curveHigh (many abstractions)Medium (focused on RAG)Low (just HTTP calls)
Debugging difficultyHigh (deep call stacks)MediumLow (you wrote it)
Upgrade stabilityFrequent breaking changesModerate breaking changesProvider SDK stability
Production readinessImproving (LangGraph)GoodFull control

Code & Config Comparison

Simple LLM Call

LangChain:

typescript
import { ChatOpenAI } from '@langchain/openai';
import { HumanMessage, SystemMessage } from '@langchain/core/messages';

const model = new ChatOpenAI({
  modelName: 'gpt-4o',
  temperature: 0.7,
});

const response = await model.invoke([
  new SystemMessage('You are a helpful assistant.'),
  new HumanMessage('What is the capital of France?'),
]);

console.log(response.content);

LlamaIndex:

typescript
import { OpenAI } from 'llamaindex';

const llm = new OpenAI({
  model: 'gpt-4o',
  temperature: 0.7,
});

const response = await llm.chat({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(response.message.content);

Raw API (OpenAI SDK):

typescript
import OpenAI from 'openai';

const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0.7,
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' },
  ],
});

console.log(response.choices[0].message.content);

Observation

For a simple LLM call, all three approaches require roughly the same amount of code. The framework adds no value here — it only adds a dependency. Frameworks earn their keep in more complex scenarios.

RAG Pipeline (Retrieval-Augmented Generation)

LangChain:

typescript
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';
import { MemoryVectorStore } from 'langchain/vectorstores/memory';
import { createRetrievalChain } from 'langchain/chains/retrieval';
import { createStuffDocumentsChain } from 'langchain/chains/combine_documents';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { PDFLoader } from '@langchain/community/document_loaders/fs/pdf';

// 1. Load documents
const loader = new PDFLoader('./docs/handbook.pdf');
const docs = await loader.load();

// 2. Split into chunks
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const chunks = await splitter.splitDocuments(docs);

// 3. Create vector store
const embeddings = new OpenAIEmbeddings();
const vectorStore = await MemoryVectorStore.fromDocuments(chunks, embeddings);

// 4. Create retrieval chain
const llm = new ChatOpenAI({ modelName: 'gpt-4o' });
const prompt = ChatPromptTemplate.fromTemplate(`
  Answer based on the context. If you don't know, say so.
  Context: {context}
  Question: {input}
`);

const documentChain = await createStuffDocumentsChain({ llm, prompt });
const retrievalChain = await createRetrievalChain({
  combineDocsChain: documentChain,
  retriever: vectorStore.asRetriever({ k: 5 }),
});

// 5. Query
const result = await retrievalChain.invoke({
  input: 'What is the vacation policy?',
});
console.log(result.answer);

LlamaIndex:

typescript
import {
  VectorStoreIndex,
  SimpleDirectoryReader,
  OpenAI,
  Settings,
} from 'llamaindex';

// 1. Configure
Settings.llm = new OpenAI({ model: 'gpt-4o' });

// 2. Load and index documents (handles chunking + embedding)
const documents = await new SimpleDirectoryReader().loadData('./docs');
const index = await VectorStoreIndex.fromDocuments(documents);

// 3. Query
const queryEngine = index.asQueryEngine({ similarityTopK: 5 });
const result = await queryEngine.query('What is the vacation policy?');
console.log(result.toString());

Raw API:

typescript
import OpenAI from 'openai';
import { readFileSync } from 'fs';
import pdfParse from 'pdf-parse';

const openai = new OpenAI();

// 1. Load and parse PDF
const pdfBuffer = readFileSync('./docs/handbook.pdf');
const pdfData = await pdfParse(pdfBuffer);
const text = pdfData.text;

// 2. Chunk the text
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    start += size - overlap;
  }
  return chunks;
}
const chunks = chunkText(text);

// 3. Generate embeddings
const embeddingResponse = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});
const embeddings = embeddingResponse.data.map((d) => d.embedding);

// 4. Store in memory (or use pgvector/Pinecone)
const index = chunks.map((chunk, i) => ({
  text: chunk,
  embedding: embeddings[i],
}));

// 5. Query: embed the question and find similar chunks
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const queryEmbedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'What is the vacation policy?',
});

const queryVec = queryEmbedding.data[0].embedding;
const ranked = index
  .map((item) => ({ ...item, score: cosineSimilarity(queryVec, item.embedding) }))
  .sort((a, b) => b.score - a.score)
  .slice(0, 5);

// 6. Generate answer with context
const context = ranked.map((r) => r.text).join('\n\n');
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'system', content: 'Answer based on the context. If you do not know, say so.' },
    { role: 'user', content: `Context:\n${context}\n\nQuestion: What is the vacation policy?` },
  ],
});

console.log(response.choices[0].message.content);

RAG Complexity

The raw API approach for RAG is 60+ lines vs LlamaIndex's 8 lines. This is where frameworks genuinely shine — RAG involves document loading, chunking, embedding, vector storage, retrieval, and synthesis. Writing all of this from scratch is educational but impractical for production.

Agent with Tool Use

LangChain (LangGraph):

typescript
import { ChatOpenAI } from '@langchain/openai';
import { createReactAgent } from '@langchain/langgraph/prebuilt';
import { tool } from '@langchain/core/tools';
import { z } from 'zod';

const searchTool = tool(
  async ({ query }) => {
    // Your search implementation
    const results = await fetch(`https://api.search.com?q=${query}`);
    return await results.text();
  },
  {
    name: 'web_search',
    description: 'Search the web for current information',
    schema: z.object({ query: z.string() }),
  }
);

const calculatorTool = tool(
  async ({ expression }) => {
    return String(eval(expression)); // simplified
  },
  {
    name: 'calculator',
    description: 'Evaluate mathematical expressions',
    schema: z.object({ expression: z.string() }),
  }
);

const agent = createReactAgent({
  llm: new ChatOpenAI({ modelName: 'gpt-4o' }),
  tools: [searchTool, calculatorTool],
});

const result = await agent.invoke({
  messages: [{ role: 'user', content: 'What is the population of Tokyo times 2?' }],
});

Raw API (Tool Use Loop):

typescript
import OpenAI from 'openai';

const openai = new OpenAI();

const tools: OpenAI.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'web_search',
      description: 'Search the web for current information',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
  },
  {
    type: 'function',
    function: {
      name: 'calculator',
      description: 'Evaluate mathematical expressions',
      parameters: {
        type: 'object',
        properties: { expression: { type: 'string' } },
        required: ['expression'],
      },
    },
  },
];

async function executeTool(name: string, args: Record<string, string>): Promise<string> {
  switch (name) {
    case 'web_search':
      return fetch(`https://api.search.com?q=${args.query}`).then(r => r.text());
    case 'calculator':
      return String(eval(args.expression));
    default:
      return `Unknown tool: ${name}`;
  }
}

// Agent loop
const messages: OpenAI.ChatCompletionMessageParam[] = [
  { role: 'user', content: 'What is the population of Tokyo times 2?' },
];

const MAX_ITERATIONS = 10;
for (let i = 0; i < MAX_ITERATIONS; i++) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    tools,
  });

  const choice = response.choices[0];
  messages.push(choice.message);

  if (choice.finish_reason === 'stop') {
    console.log(choice.message.content);
    break;
  }

  if (choice.message.tool_calls) {
    for (const toolCall of choice.message.tool_calls) {
      const args = JSON.parse(toolCall.function.arguments);
      const result = await executeTool(toolCall.function.name, args);
      messages.push({
        role: 'tool',
        tool_call_id: toolCall.id,
        content: result,
      });
    }
  }
}

Agent Loop Is Simple

The raw API agent loop is ~50 lines. Once you understand the pattern (call LLM, check for tool calls, execute tools, feed results back), it is straightforward to implement without a framework. The framework advantage is in pre-built tools, error handling, and observability — not the core loop itself.

Performance

Overhead Comparison

MetricLangChainLlamaIndexRaw API
npm install size~50-100 MB (with deps)~30-60 MB (with deps)~1-5 MB (provider SDK)
Import time500-1500ms300-800ms50-100ms
Simple LLM call overhead+5-15ms+3-10ms0ms (baseline)
RAG query overhead+20-50ms+15-30ms0ms (baseline)
Memory usage (idle)80-150 MB50-100 MB20-40 MB
Cold start (serverless)2-5s1-3s200-500ms

Bundle Size Impact (Frontend/Edge)

DeploymentLangChainLlamaIndexRaw API
Vercel Edge FunctionOften too largeOften too largeFits easily
Cloudflare WorkerDoes not fit (10 MB limit)Does not fitFits easily
Lambda (Node.js)Works (watch cold start)WorksNo issues
BrowserNot recommendedNot recommendedMinimal footprint

Serverless Cold Starts

LangChain and LlamaIndex add 1-4 seconds to serverless cold starts due to large dependency trees. For latency-sensitive APIs on serverless platforms, raw API calls can reduce p99 latency by 50-80% on cold starts.

Developer Experience

Strengths

LangChain:

  • Widest ecosystem: 70+ LLM providers, 160+ document loaders, 50+ vector stores
  • LangGraph for complex stateful agent workflows
  • LangSmith for production observability, tracing, and evaluation
  • LCEL (LangChain Expression Language) for composable chains
  • Largest community: most tutorials, examples, and Stack Overflow answers

LlamaIndex:

  • Best-in-class RAG: purpose-built for indexing and retrieval
  • Advanced chunking: sentence-based, semantic, hierarchical
  • Query planning: SubQuestionQueryEngine, RouterQueryEngine
  • Cleaner API than LangChain for data-centric use cases
  • LlamaParse for complex document parsing (tables, charts)

Raw API:

  • Zero abstraction tax: you understand every line of code
  • Smallest bundle, fastest cold starts, lowest memory
  • No dependency upgrade headaches or breaking changes
  • Full control over retries, caching, error handling, and observability
  • Easiest to debug (your code, your stack traces)

Pain Points

ApproachKey Frustration
LangChainFrequent breaking changes; over-abstracted (chains of chains of chains); difficult to debug (deep call stacks); heavy dependency tree
LlamaIndexTightly coupled to RAG patterns; less flexible for non-RAG use cases; TypeScript SDK lags behind Python
Raw APIMust build everything yourself (RAG, agents, memory, evaluation); no pre-built document loaders; no observability out of the box

Framework Maturity

DimensionLangChainLlamaIndexRaw API
API stabilityLow (frequent breaking changes)MediumHigh (provider SDK versioned)
Documentation qualityExtensive but disorganizedGood and focusedProvider docs (excellent)
Community sizeLargestLargeN/A
Production deploymentsGrowingGrowingDominant
Enterprise supportLangSmith (paid)LlamaCloud (paid)Provider support

When to Use Which

Decision Summary

ScenarioRecommended Approach
Simple chatbot / Q&ARaw API
RAG over complex documentsLlamaIndex
RAG over simple textRaw API + pgvector
Multi-step agent with many toolsLangGraph (LangChain)
Simple tool-use loopRaw API
Prototype / hackathonLangChain (fastest to demo)
Production API (latency-sensitive)Raw API
Evaluating multiple LLM providersLangChain (swap providers easily)
Data pipeline: ingest, index, queryLlamaIndex
Edge / serverless deploymentRaw API (smallest bundle)

Migration

LangChain to Raw API

typescript
// Step 1: Identify what LangChain is doing for you
// Common LangChain patterns → Raw API equivalents:

// LangChain ChatOpenAI → OpenAI SDK
// Before:
// const model = new ChatOpenAI({ modelName: 'gpt-4o' });
// const response = await model.invoke([new HumanMessage('Hello')]);

// After:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});

// LangChain PromptTemplate → Template literal
// Before:
// const prompt = ChatPromptTemplate.fromTemplate('Translate {text} to {language}');
// const formatted = await prompt.invoke({ text: 'hello', language: 'French' });

// After:
function buildPrompt(text: string, language: string): string {
  return `Translate "${text}" to ${language}`;
}

// LangChain Memory → Your own session store
// Before:
// const memory = new BufferMemory();

// After:
const sessions = new Map<string, OpenAI.ChatCompletionMessageParam[]>();
function getHistory(sessionId: string) {
  return sessions.get(sessionId) || [];
}
function addMessage(sessionId: string, msg: OpenAI.ChatCompletionMessageParam) {
  const history = getHistory(sessionId);
  history.push(msg);
  sessions.set(sessionId, history);
}

// LangChain Retriever → Direct vector DB query
// Before:
// const retriever = vectorStore.asRetriever({ k: 5 });
// const docs = await retriever.invoke('my query');

// After:
const queryEmbedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'my query',
});
// Query your vector DB directly (pgvector, Pinecone, etc.)
const docs = await db.query(
  'SELECT content FROM documents ORDER BY embedding <=> $1 LIMIT 5',
  [queryEmbedding.data[0].embedding]
);

LlamaIndex to Raw API

typescript
// LlamaIndex VectorStoreIndex → pgvector + OpenAI SDK

// Before (LlamaIndex):
// const index = await VectorStoreIndex.fromDocuments(documents);
// const queryEngine = index.asQueryEngine();
// const result = await queryEngine.query('my question');

// After (Raw API + pgvector):
// 1. Parse documents yourself (pdf-parse, cheerio, etc.)
// 2. Chunk with a simple function
// 3. Embed with OpenAI
// 4. Store in pgvector
// 5. Query with SQL + cosine similarity
// 6. Synthesize response with LLM

// This is more code but gives you:
// - Full control over chunking strategy
// - Direct SQL access to your vector data
// - No framework cold start overhead
// - Easier debugging and observability

Raw API to LangChain (When Complexity Grows)

typescript
// When your raw API code has grown to include:
// - Multiple tool definitions
// - Complex state machines
// - Multi-turn conversation management
// - Parallel tool execution
// - Human-in-the-loop approval
//
// Consider migrating critical paths to LangGraph:

import { StateGraph, MessagesAnnotation } from '@langchain/langgraph';

const workflow = new StateGraph(MessagesAnnotation)
  .addNode('agent', agentNode)
  .addNode('tools', toolNode)
  .addEdge('__start__', 'agent')
  .addConditionalEdges('agent', shouldContinue)
  .addEdge('tools', 'agent');

const app = workflow.compile();
// LangGraph handles the state machine,
// tool routing, and checkpointing for you

Migration Strategy

Start with raw API calls for v1. As your application grows, identify the specific patterns where a framework adds genuine value (complex RAG, stateful agents, multi-provider support). Adopt the framework only for those specific patterns, not wholesale. This "framework as spice, not the main course" approach gives you the best of both worlds.

Verdict

LangChain is the most comprehensive LLM application framework. Its strength is breadth: 70+ providers, 160+ document loaders, and the LangGraph state machine for complex agents. Its weakness is abstraction overhead, frequent breaking changes, and the difficulty of debugging deep call chains. Use LangChain when you need its ecosystem breadth or LangGraph's stateful agent orchestration.

LlamaIndex is the best tool for RAG applications. It was purpose-built for the "ingest, index, query" pipeline and excels at document parsing, intelligent chunking, and query planning. If your primary use case is "let users ask questions about my documents," LlamaIndex is the most productive choice.

Raw API calls are the right default for production applications. The overhead of LLM frameworks is not justified for simple chat completions, basic tool use, or straightforward RAG. A 50-line agent loop and a direct pgvector query will outperform any framework on latency, bundle size, and debuggability.

Bottom Line

Do not use a framework by default. Start with raw API calls. Add LlamaIndex when you need sophisticated RAG (complex documents, query planning, multi-index routing). Add LangChain/LangGraph when you need stateful multi-step agents with tool orchestration. The best LLM applications use frameworks surgically — for the specific patterns where they add genuine value — not as a wholesale abstraction layer over simple API calls.

Which Would You Choose?

Scenario 1: You are building a chatbot that answers customer questions by searching through 10,000 PDF documents containing product manuals, legal contracts, and technical specifications with tables and charts.

Recommendation: LlamaIndex

LlamaIndex is purpose-built for this exact use case. LlamaParse handles complex PDF parsing (tables, charts, multi-column layouts), semantic chunking preserves document meaning, and the query engine handles retrieval and synthesis. Building this from scratch with raw APIs would require weeks of custom document parsing and chunking logic.

Scenario 2: You are adding a simple AI chat feature to your SaaS product. Users send a message, the LLM responds with text. No RAG, no tools, no agents. You deploy on Cloudflare Workers with a 10 MB bundle limit.

Recommendation: Raw API

A simple chat completion is 10-15 lines of code with the OpenAI SDK (~1 MB). LangChain would add 50-100 MB of dependencies that do not fit in a Worker and add no value for a simple request-response pattern. Do not use a framework when you do not need one.

Scenario 3: You are building a complex AI agent that takes a user's goal, breaks it into subtasks, uses 8 different tools (web search, SQL queries, API calls, file operations), handles errors with retries, requires human approval for destructive actions, and maintains state across sessions.

Recommendation: LangGraph (LangChain)

LangGraph's state machine model is designed for exactly this: multi-step agents with branching logic, tool orchestration, error handling, human-in-the-loop approval gates, and persistent checkpointing. Building this from scratch requires implementing your own state machine, which LangGraph handles out of the box.

Common Misconceptions

  • "You need LangChain to build any LLM app" — Most LLM applications are simple chat completions or basic RAG. LangChain adds value only for complex orchestration patterns. Start without it and add it when you hit a specific pain point.
  • "LlamaIndex is only for RAG" — LlamaIndex also supports agents (SubQuestion query engines, router query engines), but its primary strength is data indexing and retrieval. For agent-heavy workloads, LangGraph is a better fit.
  • "Raw API means writing everything from scratch" — The Vercel AI SDK provides streaming, tool calling, structured output, and multi-provider support without the heavy abstraction of LangChain. It is a middle ground between raw API calls and a full framework.
  • "LangChain's breaking changes mean it is unreliable" — LangChain has stabilized significantly with the LangChain v0.2+ / LangGraph split. Core APIs are more stable now, though the pace of change is still higher than traditional libraries.

Real Migration Stories

Many startups: LangChain to raw API — A common pattern in the LLM ecosystem: teams start with LangChain for rapid prototyping, then rewrite the critical path with raw API calls when they need better performance, debugging, and control. The framework was valuable for learning but became an obstacle in production.

Perplexity AI: Custom RAG pipeline — Perplexity, one of the most successful RAG applications, built their retrieval pipeline from scratch rather than using LlamaIndex or LangChain. Their custom pipeline gives them full control over ranking, citation extraction, and streaming — demonstrating that the most demanding RAG applications often outgrow framework abstractions.

Quiz

1. Why do LangChain and LlamaIndex add 1-5 seconds to serverless cold starts?

Both frameworks have large dependency trees (50-100 MB installed). Serverless platforms must load all dependencies into memory on cold start. Raw API calls with just the provider SDK (1-5 MB) cold-start 5-10x faster.

2. What is the "framework as spice" approach?

Start with raw API calls. Use a framework (LangChain, LlamaIndex) only for specific patterns where it adds genuine value — complex RAG, stateful agents, multi-provider orchestration. Do not wrap your entire application in a framework when only one component needs it.

3. How does LlamaIndex's VectorStoreIndex.fromDocuments() compare to building RAG from scratch?

LlamaIndex's one-liner handles document parsing, chunking (sentence splitting), embedding generation, vector storage, and index creation. The raw API equivalent requires 50-60 lines covering PDF parsing, custom chunking, OpenAI embedding calls, vector storage, and similarity search.

4. What is LangGraph, and how does it relate to LangChain?

LangGraph is LangChain's library for building stateful, multi-step agent workflows as directed graphs. It provides state machines, conditional edges, persistence (checkpointing), and human-in-the-loop approval gates. It replaces LangChain's older AgentExecutor with a more composable architecture.

5. When should you use the Vercel AI SDK instead of LangChain?

When you need streaming, tool calling, structured output, and multi-provider support without LangChain's heavy abstraction layer. The Vercel AI SDK is lighter (smaller bundle), simpler (fewer concepts), and production-focused (designed for Next.js and edge deployment).

One-Liner Summary

Start with raw API calls for simple LLM features, reach for LlamaIndex when your RAG pipeline gets complex, and adopt LangGraph when you need stateful multi-tool agents — frameworks are spice, not the main course.

"What I cannot create, I do not understand." — Richard Feynman