AI/ML Engineer Learning Path

A structured 12-week journey through the Knowledge Vault for engineers building AI/ML-powered products. This path covers ML fundamentals (30 pages), deep learning (25 pages), LangChain/LangGraph mega guides, fine-tuning, guardrails, AI testing, model serving, GPU infrastructure, RAG architecture, AI agents, and production MLOps.

Who This Is For

Software engineers transitioning into AI/ML engineering roles
Backend engineers adding AI capabilities to existing products
Data scientists who want to learn the engineering side of ML
Anyone building LLM-powered applications in production

Prerequisites

Solid programming skills (Python + one backend language)
Basic understanding of APIs and databases
Basic math (linear algebra, calculus, probability) -- or willingness to learn
No prior ML experience required

Total estimated time: ~60 hours across 12 weeks

Learning Progression

Week 1-2: ML Foundations (Part 1 of 30 pages)

Estimated reading time: 6 hours

Build the math and conceptual foundations before touching models.

[ ] Required -- Machine Learning Overview (15 min)
[ ] Required -- Math Foundations (35 min)
[ ] Required -- ML Workflow (25 min)
[ ] Required -- Python ML Ecosystem (25 min)
[ ] Required -- Data Preparation (25 min)
[ ] Required -- Linear Regression (30 min)
[ ] Required -- Logistic Regression (25 min)
[ ] Required -- Evaluation Metrics (25 min)
[ ] Required -- Cross-Validation (20 min)
[ ] Required -- Model Selection (25 min)
[ ] Reference -- Scikit-learn Cheat Sheet (10 min)
[ ] Reference -- Python Cheat Sheet (10 min)

Checkpoint

After this section you should be able to: explain the ML workflow, implement linear and logistic regression, evaluate models with precision/recall/F1/AUC, and perform cross-validation correctly.
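
As a rough self-check for this checkpoint, the sketch below computes precision, recall, and F1 from raw labels and builds k-fold splits in which every sample is validated exactly once. It uses plain Python rather than scikit-learn; the function names and the toy labels are illustrative assumptions, not taken from the linked pages.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; folds are contiguous and disjoint."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Toy labels (hypothetical data for illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
folds = list(k_fold_indices(10, 3))
```

In practice the data should be shuffled (or stratified) before splitting, and any preprocessing should be fitted on the training fold only so that no information leaks from the validation fold.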

Week 2-3: ML Algorithms (Part 2 of 30 pages)

Estimated reading time: 6 hours

Master the classical ML algorithms that form the foundation of modern AI.

[ ] Required -- Decision Trees (25 min)
[ ] Required -- Random Forests (25 min)
[ ] Required -- Gradient Boosting (30 min)
[ ] Required -- Ensemble Methods (25 min)
[ ] Required -- SVM (25 min)
[ ] Required -- KNN (20 min)
[ ] Required -- Naive Bayes (20 min)
[ ] Required -- Clustering (25 min)
[ ] Required -- Hyperparameter Tuning (25 min)
[ ] Required -- Algorithm Selection Guide (20 min)
[ ] Optional -- Feature Engineering Advanced (25 min)
[ ] Optional -- Dimensionality Reduction (25 min)
[ ] Optional -- Anomaly Detection (20 min)
[ ] Optional -- Recommendation Systems (25 min)
[ ] Optional -- Time Series ML (25 min)
[ ] Optional -- ML Interpretability (25 min)
[ ] Optional -- ML Checklist (15 min)
[ ] Optional -- Topic Modeling (20 min)
[ ] Optional -- Association Rules (15 min)

Checkpoint

After this section you should be able to: choose the right algorithm for a given problem, tune hyperparameters with grid/random/Bayesian search, and explain the bias-variance tradeoff.

Week 3-4: Deep Learning Foundations

Estimated reading time: 5 hours

Transition from classical ML to deep learning. Understand neural networks, PyTorch, and training techniques.

[ ] Required -- Deep Learning Overview (15 min)
[ ] Required -- Neural Network Basics (35 min)
[ ] Required -- PyTorch Fundamentals (30 min)
[ ] Required -- Training Techniques (25 min)
[ ] Required -- Architecture Selection Guide (25 min)
[ ] Required -- Transfer Learning (25 min)
[ ] Required -- DL Checklist (20 min)

Checkpoint

After this section you should be able to: implement neural networks in PyTorch, apply training techniques (BatchNorm, dropout, LR scheduling), and choose the right architecture for a given task.
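
To make the LR scheduling item concrete, here is a minimal sketch of one common schedule: linear warmup followed by cosine decay. It is written in plain Python so the arithmetic is visible; the function name and default values are assumptions chosen for illustration, and PyTorch ships its own schedulers that would normally be used instead.

```python
import math


def lr_at_step(step, base_lr=1e-3, warmup_steps=100, total_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


schedule = [lr_at_step(s) for s in range(1001)]
```

The warmup phase avoids large updates while optimizer statistics are still noisy, and the cosine phase lowers the rate smoothly toward `min_lr` by the final step.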

Week 4-5: DL Architectures (Part 2 of 25 pages)

Estimated reading time: 6 hours

Master the architectures that power modern AI: CNNs, RNNs, Transformers, and generative models.

[ ] Required -- Transformers (30 min)
[ ] Required -- Language Models (30 min)
[ ] Required -- BERT Family (25 min)
[ ] Required -- NLP Fundamentals (25 min)
[ ] Required -- Text Generation (25 min)
[ ] Optional -- CNN (25 min)
[ ] Optional -- RNN & LSTM (25 min)
[ ] Optional -- Diffusion Models (25 min)
[ ] Optional -- GANs (25 min)
[ ] Optional -- Multimodal Models (25 min)
[ ] Optional -- Model Optimization (25 min)
[ ] Optional -- Reinforcement Learning (25 min)
[ ] Optional -- Papers Reading List (20 min)

Checkpoint

After this section you should be able to: explain the transformer attention mechanism, describe the difference between BERT and GPT architectures, and fine-tune pretrained models.
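
The attention mechanism named in this checkpoint can be sketched in a few lines. The example below implements single-head scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, on nested Python lists. It leaves out masking, multiple heads, and learned projections, and the variable names are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, weights) for query, key, and value matrices."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights


# Two identical keys, so the single query attends to both values equally.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[2.0, 0.0], [4.0, 2.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to 1, so every output row is a convex combination of the rows of `V`.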

Week 5-6: LLM Integration

Estimated reading time: 5 hours

Integrate LLMs into production with proper engineering around prompts, rate limits, costs, and fallbacks.

[ ] Required -- AI/ML Engineering Overview (15 min)
[ ] Required -- LLM Integration (35 min)
[ ] Required -- OpenAI API (25 min)
[ ] Required -- Anthropic Claude API (25 min)
[ ] Required -- Prompt Engineering Advanced (30 min)
[ ] Required -- Prompt Caching (20 min)
[ ] Required -- Multimodal AI (25 min)
[ ] Optional -- Vercel AI SDK (20 min)
[ ] Optional -- HuggingFace (25 min)
[ ] Reference -- LLM APIs Cheat Sheet (10 min)

Checkpoint

After this section you should be able to: build production LLM integrations with caching and fallbacks, implement advanced prompt engineering, and manage token budgets and costs.
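
One way to picture the fallback part of this checkpoint is a wrapper that retries a provider a few times with exponential backoff and then moves on to the next one. The sketch below uses plain callables as stand-ins for real clients; `ProviderError`, `call_with_fallback`, and the two fake providers are hypothetical names, not part of any vendor SDK.

```python
import time


class ProviderError(Exception):
    """Raised by a provider callable when a request fails (hypothetical)."""


def call_with_fallback(prompt, providers, max_retries=2, base_delay=0.0, sleep=time.sleep):
    """Try each (name, callable) in order; retry with exponential backoff."""
    errors = []
    for name, call in providers:
        for attempt in range(max_retries):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append((name, attempt, str(exc)))
                sleep(base_delay * (2 ** attempt))
    raise ProviderError(f"all providers failed: {errors}")


# Fake providers standing in for real API clients (assumptions for the demo).
def flaky_primary(prompt):
    raise ProviderError("rate limited")


def stable_secondary(prompt):
    return f"echo: {prompt}"


used, reply = call_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
```

A production version would typically also distinguish retryable errors (rate limits, timeouts) from non-retryable ones (invalid requests) and record which provider served each request for cost tracking.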

Week 6-7: LangChain & LangGraph

Estimated reading time: 5 hours

LangChain and LangGraph are two widely used frameworks for building LLM-powered applications.

[ ] Required -- LangChain (40 min)
[ ] Required -- LangGraph (40 min)
[ ] Required -- LangSmith (25 min)
[ ] Required -- LlamaIndex (30 min)
[ ] Required -- CrewAI & AutoGen (25 min)

Comparisons:

[ ] Required -- LangChain vs LlamaIndex (20 min)
[ ] Required -- OpenAI vs Anthropic vs Google (20 min)

Checkpoint

After this section you should be able to: build complex LLM applications with LangChain, implement stateful multi-step agents with LangGraph, trace and debug with LangSmith, and choose between frameworks.

Week 7-8: RAG & Embeddings

Estimated reading time: 5 hours

RAG (retrieval-augmented generation) is a widely used pattern for building AI products that answer questions from private data.

[ ] Required -- RAG Architecture (40 min)
[ ] Required -- Embeddings (35 min)
[ ] Required -- Vector Databases (35 min)
[ ] Required -- Data Annotation (25 min)
[ ] Required -- Search Service Blueprint (40 min)
[ ] Optional -- Elasticsearch Internals (25 min)

Checkpoint

After this section you should be able to: design a complete RAG pipeline, implement hybrid search (vector + keyword), choose chunking strategies, and evaluate retrieval quality.
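
Hybrid search needs some way to merge a vector ranking with a keyword ranking. One common option, shown here as a minimal sketch, is reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (k + rank) over the rankings it appears in. The document IDs are made up, and k = 60 is the constant usually quoted for RRF rather than a value taken from the linked pages.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword index.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF only looks at ranks, so it avoids having to normalize cosine similarities against keyword scores that live on a different scale.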

Week 8-9: AI Agents

Estimated reading time: 4 hours

AI agents use LLMs to plan and execute multi-step tasks with tools.

[ ] Required -- AI Agents (40 min)
[ ] Required -- LLM Integration (35 min -- focus on function calling)
[ ] Required -- AI in Production (30 min)
[ ] Optional -- Job Queue Blueprint (40 min)
[ ] Optional -- Circuit Breaker (20 min)

Checkpoint

After this section you should be able to: build ReAct and Plan-Execute agents, implement guardrails and human-in-the-loop, and debug agent reasoning traces.

Week 9-10: Fine-Tuning & Guardrails

Estimated reading time: 5 hours

Customize models for your domain and keep them safe in production.

[ ] Required -- Fine-Tuning (35 min)
[ ] Required -- AI Guardrails (30 min)
[ ] Required -- AI Testing (30 min)
[ ] Required -- ML Pipelines (30 min)
[ ] Optional -- Model Optimization (25 min)
[ ] Optional -- Text Generation (25 min -- focus on RLHF/DPO)

Checkpoint

After this section you should be able to: fine-tune models with LoRA/QLoRA, implement content safety guardrails, design AI evaluation suites, and build reproducible ML pipelines.
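
The reason LoRA is cheap can be seen from the parameter arithmetic. Instead of updating a full d_out x d_in weight matrix W, LoRA trains two small matrices B (d_out x r) and A (r x d_in) and uses W + (alpha / r) * B A. The sketch below counts trainable parameters and merges a toy adapter; the layer sizes are illustrative assumptions.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


def merge_lora(W, B, A, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) using nested lists."""
    scale = alpha / rank
    return [
        [
            W[i][j] + scale * sum(B[i][r] * A[r][j] for r in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]


# A 4096 x 4096 projection with rank 8 (sizes chosen for illustration).
full, lora = lora_param_counts(4096, 4096, 8)
ratio = lora / full

# Toy merge: 2 x 2 base weights with a rank-1 adapter.
merged = merge_lora([[0.0, 0.0], [0.0, 0.0]], [[1.0], [2.0]], [[3.0, 4.0]], alpha=2, rank=1)
```

For this layer the adapter trains 65,536 parameters against 16,777,216 for full fine-tuning, a ratio of 1/256. QLoRA keeps the same adapter structure but stores the frozen base weights in a quantized format.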

Week 10-11: Model Serving & GPU Infrastructure

Estimated reading time: 5 hours

Deploy and serve models at scale with proper GPU management and infrastructure.

[ ] Required -- Model Serving (30 min)
[ ] Required -- GPU Kubernetes (30 min)
[ ] Required -- AI Infrastructure Overview (15 min)
[ ] Required -- Docker Overview (15 min)
[ ] Required -- Production Dockerfiles (25 min)
[ ] Required -- Kubernetes Overview (15 min)
[ ] Required -- HPA, VPA & KEDA (25 min)
[ ] Optional -- AWS Lambda (25 min)
[ ] Optional -- GCP Cloud Run (25 min)
[ ] Optional -- Serverless Patterns (25 min)

Checkpoint

After this section you should be able to: deploy model serving endpoints with auto-scaling, manage GPU resources on Kubernetes, containerize ML models, and choose between serverless and container-based inference.
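
For the auto-scaling item, it helps to know the rule the Kubernetes Horizontal Pod Autoscaler applies: desired replicas = ceil(current replicas * current metric / target metric), skipped when the ratio is within a tolerance of 1 (0.1 by default). The sketch below reproduces that arithmetic in Python; the replica bounds and the metric values are assumptions for illustration, and the real controller also handles details such as unready pods and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA scaling rule for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: do not scale.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


scale_up = desired_replicas(4, current_metric=90, target_metric=60)
no_change = desired_replicas(4, current_metric=62, target_metric=60)
capped = desired_replicas(4, current_metric=300, target_metric=60)
scale_down = desired_replicas(2, current_metric=30, target_metric=60)
```

For GPU-backed inference the metric is often something other than CPU utilization, such as queue depth or requests in flight, which is where KEDA or custom metrics come in.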

Week 11: AI Testing & MLOps

Estimated reading time: 4 hours

[ ] Required -- AI Testing (30 min -- deep read)
[ ] Required -- Test Architecture (25 min)
[ ] Required -- Integration Testing (25 min)
[ ] Required -- CI/CD Overview (15 min)
[ ] Required -- GitHub Actions Deep Dive (30 min)
[ ] Optional -- Property-Based Testing (25 min)

Week 12: Production Blueprints & Capstone

Estimated reading time: 4 hours

[ ] Required -- Search Service Blueprint (40 min)
[ ] Required -- Analytics Pipeline Blueprint (40 min)
[ ] Required -- Feature Flag Blueprint (35 min)
[ ] Optional -- Chat Service Blueprint (35 min)
[ ] Optional -- Notification Service Blueprint (35 min)

What You Will Be Able to Do After This Path

Implement and evaluate classical ML algorithms (30 pages of foundations)
Build and train deep learning models (25 pages of architectures)
Integrate LLMs with LangChain, LangGraph, and LlamaIndex
Design and build RAG pipelines with vector databases
Fine-tune models with LoRA and evaluate with custom benchmarks
Implement AI guardrails for content safety and hallucination prevention
Serve models at scale on GPU Kubernetes clusters
Build end-to-end AI testing and monitoring pipelines

Cross-References to Related Paths

ML/DL Engineer Path -- Deep dive into DL architectures and research
Data Scientist Path -- Math foundations, EDA, and statistical modeling
Data Engineer Path -- Data pipelines that feed ML systems
Backend Engineer Path -- APIs and infrastructure for AI products
Platform Engineer Path -- GPU infrastructure and model serving platforms

Total Progress

This path contains approximately 120 pages (30 ML + 25 DL + 25 AI engineering + 40 infrastructure/blueprints). Budget 12 weeks at 5 hours per week. The ML + DL foundations (weeks 1-5) are essential before diving into LLM integration.
