Yash Narwal built 'How LLMs Work' based on Andrej Karpathy's lectures, and the training visualization is the standout feature. At step 1, the model outputs pure noise. By step 500, you see local coherence. By step 32,000, it's writing fluent English. That progression alone teaches you more than most blog posts.
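That noise-to-coherence arc is easy to demo yourself. Here's a toy sketch (mine, not from the guide): an "untrained" model that picks characters uniformly at random versus a character-bigram model standing in for a few training steps. The corpus and all the specifics are made up for illustration.

```python
import random

random.seed(0)

corpus = "the model learns which letters tend to follow which letters " * 20
alphabet = sorted(set(corpus))

# "Step 1": no training signal, so every character is equally likely -> noise.
untrained = "".join(random.choice(alphabet) for _ in range(40))

# "Later steps": sample from bigram frequencies counted over the corpus,
# a crude stand-in for what gradient descent learns about token order.
bigrams = {}
for a, b in zip(corpus, corpus[1:]):
    bigrams.setdefault(a, []).append(b)

ch = "t"
out = [ch]
for _ in range(39):
    ch = random.choice(bigrams.get(ch, alphabet))
    out.append(ch)
trained = "".join(out)

print("untrained:", untrained)  # uniform character soup
print("bigram:   ", trained)    # locally coherent letter pairs
```

The bigram output is still gibberish at the word level, which is exactly the "local coherence" stage the visualization shows around step 500.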
The guide walks you through the full pipeline, from raw data to conversational AI. Data collection starts with Common Crawl's 2.7 billion pages, filtered down to FineWeb, a 44TB dataset with roughly 15 trillion tokens. Tokenization uses Byte Pair Encoding, and here's the fun part: there's a live tokenizer where you can type your name and see exactly how GPT-4, Claude, or Llama 3 would split it into tokens. Try it. It's weirdly satisfying.
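If you want to see the mechanism behind that tokenizer, BPE is simple enough to sketch in a few lines. This is a toy version (not GPT-4's actual merge table or vocabulary): start from individual characters and repeatedly merge the most frequent adjacent pair.

```python
from collections import Counter

def bpe_tokenize(text: str, num_merges: int = 10) -> list[str]:
    """Toy BPE: greedily merge the most frequent adjacent pair, repeatedly."""
    tokens = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, nothing worth merging
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

print(bpe_tokenize("banana bandana"))
```

Real tokenizers train the merge table once over a huge corpus and then apply it deterministically, but the core loop is this one: frequent substrings earn their own token, which is why your name may split into odd pieces if it's rare in the training data.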
Beyond pre-training, you get base model behavior (the 'internet simulator' that autocompletes rather than answers questions), post-training through SFT and RLHF, and practical topics like hallucination and RAG. Hacker News users flagged a real gap: the guide doesn't fully address how static embeddings handle words whose meaning depends on context. It's a fair critique, and if you're already familiar with embeddings, you'll notice the omission. But for developers trying to understand LLMs for the first time, this is still one of the clearest starting points I've seen.