The architecture behind modern AI products: RAG pipelines, agentic workflows, LLM inference at scale, vector databases, function calling, and AI gateways. Every concept comes with an animation and a deep dive.
Retrieval-augmented generation: how chatbots ground answers in retrieved documents and cite their sources, which reduces hallucination.
HNSW, IVF, and product quantization — how databases search billions of embeddings in milliseconds.
KV cache, continuous batching, speculative decoding: the techniques that make ChatGPT-style serving fast.
Multi-agent orchestration: planner, executor, critic — and how they coordinate without falling over.
How LLMs decide when to call APIs, the schemas they emit, and the round-trip back to natural language.
The traffic cop in front of your LLM: routing, caching, fallbacks, rate limits, observability.
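
To make the gateway idea concrete, here is a minimal sketch of two of its duties, caching and fallbacks. It assumes each provider is just a Python callable that returns a completion or raises; the `Gateway` class and the `flaky_primary` and `stable_backup` functions are hypothetical names used for illustration, not any real product's API.

```python
from typing import Callable, Dict, List

# Hypothetical provider type: takes a prompt, returns a completion or raises.
Provider = Callable[[str], str]


class Gateway:
    """Toy gateway: exact-match cache first, then providers in priority order."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers
        self.cache: Dict[str, str] = {}

    def complete(self, prompt: str) -> str:
        # Caching: serve repeated prompts without calling any provider.
        if prompt in self.cache:
            return self.cache[prompt]

        errors: List[Exception] = []
        # Fallbacks: try each provider until one succeeds.
        for provider in self.providers:
            try:
                result = provider(prompt)
            except Exception as exc:
                errors.append(exc)
                continue
            self.cache[prompt] = result
            return result

        raise RuntimeError(f"all {len(self.providers)} providers failed: {errors}")


# Stand-in providers (assumptions, not real model clients).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def stable_backup(prompt: str) -> str:
    return f"echo: {prompt}"


gateway = Gateway([flaky_primary, stable_backup])
```

Calling `gateway.complete("hi")` falls through the failing primary to the backup and stores the result, so a second identical call is answered from the cache. A production gateway would also add rate limiting, routing by model or cost, and request logging, which this sketch leaves out.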