The full stack of building serious AI systems: how LLMs are served at scale, how retrieval keeps them grounded, and how agents and tool use turn them into actual products.
How a single prompt becomes a streaming response — and what makes it fast or slow.
How retrieval cuts down hallucination: the vector store, the retriever, and the prompt assembly.
When the model decides to call a tool or run code instead of just generating text.