Track 6 · Shipping
Shipping & adaptation
Capstone track: from API to production architecture. By now you can call models, run RAG, version prompts, ship bounded agents, and block merges on eval regressions. This track wires those skills into services customers depend on: streaming chat endpoints, LLM gateways with routing and fallbacks, fine-tuning when prompts plateau, multimodal document pipelines, structured extraction APIs, and the architecture patterns teams use at scale. Shipping means a service that is observable, cost-controlled, gracefully degrading, and evaluable, not a notebook cell that returns 200 once. Prerequisite: Tracks 1–5 (LLM fundamentals, RAG, prompts, agents, and eval CI gates).
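To make "gracefully degrading" and "observable" concrete, here is a minimal sketch of an ordered provider fallback with structured (JSON) log lines. It is an illustration under stated assumptions, not the implementation used in the guides: the provider signature, the `complete_with_fallback` name, the log field names, and the stub providers are all hypothetical, and real model clients would replace the stubs.

```python
import json
import logging
import time
from typing import Callable, Sequence

logger = logging.getLogger("ship")

# Assumption: a provider is any callable that takes a prompt and returns text,
# raising an exception on failure. Real SDK clients would be wrapped to fit this.
Provider = Callable[[str], str]

DEGRADED_REPLY = "The assistant is temporarily unavailable. Please try again shortly."


def complete_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> dict:
    """Try each (name, provider) pair in order; return a static reply if all fail."""
    for name, call in providers:
        start = time.perf_counter()
        try:
            text = call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            logger.warning(json.dumps(
                {"event": "provider_failed", "provider": name, "error": repr(exc)}
            ))
            continue
        latency_ms = round((time.perf_counter() - start) * 1000, 1)
        logger.info(json.dumps(
            {"event": "completion", "provider": name, "latency_ms": latency_ms}
        ))
        return {"text": text, "provider": name, "degraded": False}

    # Every provider failed: degrade to a fixed message instead of an error page.
    logger.error(json.dumps({"event": "all_providers_failed"}))
    return {"text": DEGRADED_REPLY, "provider": None, "degraded": True}


# Hypothetical stub providers standing in for real model clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


result = complete_with_fallback(
    "hello", [("primary", flaky_primary), ("secondary", stable_secondary)]
)
all_down = complete_with_fallback("hello", [("primary", flaky_primary)])
```

In this sketch the caller always gets a response dictionary, and the `degraded` flag lets the endpoint or UI signal reduced service rather than failing outright. The JSON log lines give each attempt a provider name and latency, which is the kind of signal the later guides on gateways and observability build on.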
Guides in this track
Six deep-dive chapters. All guides are live; read them in order.
Reading order: Hello Ship → LLM gateway & infrastructure → Fine-tuning & adaptation → Multimodal ingest & processing → Structured data extraction & agents → Production AI architecture patterns
-
01
Hello Ship — API to production
What “shipped” means, production readiness checklist, streaming SSE, fallback strategies, and a minimal FastAPI/Spring Boot service with health checks and structured logging.
-
02
LLM gateway & infrastructure
Centralized routing, semantic caching, rate limits, multi-tenant keys, model registry, and shared observability across product teams.
-
03
Fine-tuning & adaptation
When to fine-tune vs RAG vs prompts, LoRA/PEFT, dataset curation, eval before deploy, and rollback when adaptation regresses.
-
04
Multimodal ingest & processing
PDF/image/audio pipelines, OCR vs vision models, chunking scanned docs, async workers, and cost-aware preprocessing.
-
05
Structured data extraction & agents
JSON schema extraction, function calling for ETL, validation loops, human-in-the-loop review queues, and agentic document workflows.
-
06
Production AI architecture patterns
Reference architectures for copilots, search, batch extraction, multi-region deployment, compliance boundaries, and platform vs product team split.