Notes from Deep Dive into LLMs

Selected learnings from Andrej Karpathy's deep dive into LLMs

Feb 14, 2025

Andrej Karpathy delivers an outstanding deep dive into the mechanics of state-of-the-art LLMs. If every billionaire had the chance to sit down with Karpathy for this exact breakdown, they’d take it. So would you. At ~3.5 hours, it’s an investment, but one that pays off if you’re serious about understanding how LLMs work.

While some sections cover foundational material, I found a bunch of fascinating tidbits. Sharing them here:

Video Breakdown

The talk can be roughly divided into three segments:


Pretraining

  • The majority of this section covers the fundamentals of how base-model LLMs are trained. This largely feels like old information but is still a solid overview.
  • Karpathy describes LLMs as “lossy compression of the internet”—a phrase that perfectly captures their nature.
  • Watching this made me miss the unfiltered chaos of base models—reminded me of the GPT-2/3 /completions endpoint. I’d love to see non-chat endpoints return.
  • Places like Hyperbolic now offer base-model endpoints, which is neat. (There’s a quick sketch of raw base-model completion below.)

(Karpathy introduces these core concepts around 54:00 and continues until 1:02:00.)
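
If you want to poke at that base-model behavior yourself, here’s a minimal sketch using Hugging Face’s transformers library (assuming it’s installed), with GPT-2 chosen only because it’s small and openly available; the prompt and sampling settings are purely illustrative:

```python
# Minimal sketch: raw next-token completion from a base model, no chat wrapper.
# GPT-2 is used only because it's small and openly available; any base
# checkpoint (including hosted base-model endpoints) behaves the same way.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "The best way to learn about large language models is"
out = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.9)

# The model simply continues the text, unfiltered chaos included.
print(out[0]["generated_text"])
```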


Post-training / Supervised Fine-tuning

  • <|im_start|> stands for “imaginary monologue”, not “instant message.” I always assumed it was the latter; it turns out Karpathy isn’t sure why it’s named that either.
  • I didn’t realize special tokens like <|im_start|> are only added after base-model pretraining. Makes sense, but it was a gap in my understanding. (There’s a sketch of the resulting chat format after this list.)
  • Tool calls become custom tokens—which, in hindsight, feels obvious.
  • I’ve always felt unsettled about chat completions—many AI use cases don’t fit a chat paradigm. But after seeing how SFT is done on Q/A datasets, I get it. If you’re aiming for helpfulness, fine-tuning on Q/A pairs is the best approach.
  • It also explains why models always respond confidently—their training data is presented that way.
  • This also clarifies why LLMs sometimes lack thread-level empathy. SFT data is mostly Q/A pairs, not full conversations. I suspect multi-turn datasets will improve this.
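
Since the special tokens come up a few times above, here’s a minimal sketch of the ChatML-style rendering that turns a “conversation” into the single text stream the model actually sees. The exact template varies by model, and in a real tokenizer <|im_start|> / <|im_end|> each map to a single token id added to the vocabulary during post-training rather than being spelled out character by character; this only shows the shape of it:

```python
# Sketch of ChatML-style flattening: a chat becomes one long text stream,
# delimited by special tokens. In a real tokenizer, <|im_start|> and <|im_end|>
# are single special tokens added to the vocabulary during post-training.
def render_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # the model completes from here
    return "".join(parts)

print(render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
]))
```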

Hallucinations & Training Models to Say "I Don't Know"

  • Karpathy asks an important question: “How do we know what the model knows vs. doesn’t know?”
  • Skip to 1:25:00 for Karpathy’s breakdown.
  • A brilliantly simple approach: train the model with Q/A pairs where the answer is “I don’t know.”
  • But how do you generate these? The trick is counterintuitive (there’s a code sketch after this list):
    • Feed the model context.
    • Have it generate Q/A pairs.
    • Later, ask it those same questions.
    • If it gets an answer wrong, add that to the “I don’t know” dataset.
  • This pattern of SFT works because of what Karpathy calls “this internal neuron that we presume exists and empirically this turns out to probably be the case.” I don’t know if the world is ready for stochastic software.
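
Here’s a rough sketch of that generation loop. The three helper functions are toy stand-ins I made up for the real model calls (question generation, answering without context, and grading); only the loop structure mirrors the approach described above:

```python
# Sketch of the "I don't know" data-generation loop. The helpers below are toy
# stand-ins for real model calls; the loop structure is the point.

def generate_qa_pairs(context: str) -> list[tuple[str, str]]:
    # stand-in: in practice, prompt the model to write Q/A pairs from the context
    return [("Who proved the theorem mentioned in the context?", "Jane Doe")]

def ask_model(question: str) -> str:
    # stand-in: in practice, ask the model the question *without* the context
    return "It was probably someone famous."

def answers_match(attempt: str, reference: str) -> bool:
    # stand-in: in practice, use string matching or an LLM judge
    return reference.lower() in attempt.lower()

documents = ["Some paragraph the model is shown while writing questions."]
idk_dataset = []

for context in documents:
    for question, reference_answer in generate_qa_pairs(context):
        attempt = ask_model(question)
        if not answers_match(attempt, reference_answer):
            # the model evidently doesn't know this answer, so teach it to say so
            idk_dataset.append({"question": question, "answer": "I don't know."})

print(idk_dataset)
```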

Karpathy’s Analogies

  • The knowledge in model parameters = “Vague recollection” (like something you read a month ago).
  • The knowledge in the context window = “Working memory” (like something you read a minute ago).

“Models Need Tokens to Think”

  • I always describe this differently. I tell people LLMs can press any button on the keyboard EXCEPT backspace. They cannot edit mistakes—only justify them.
    • This means you don’t want them to jump to an answer. You want them to explain before committing.
    • Asking them to compute 123456789 + 123456789 is a bad idea because they emit tokens left-to-right, while addition carries right-to-left. By the time they realize they need to carry a one, it's too late. This is why LLMs do roughly 10% better at arithmetic when you reverse the digits first (see the sketch below).
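
To make the digit-order mismatch concrete, here’s a tiny illustration; it’s ordinary Python arithmetic with nothing model-specific, just showing that the first digit an autoregressive model has to emit depends on a carry chain running from the opposite end of the number, which is exactly what reversing the digits fixes:

```python
# Why left-to-right emission fights right-to-left addition: the first digit the
# model must print depends on carries it hasn't "worked through" yet.
a, b = 123456789, 123456789
total = a + b  # 246913578

print("emission order:", list(str(total)))        # most-significant digit first
print("carry order:   ", list(str(total))[::-1])  # least-significant digit first

# The reversal trick: present operands (and the target) least-significant-digit
# first, so each emitted digit depends only on digits already produced.
a_rev, b_rev, total_rev = str(a)[::-1], str(b)[::-1], str(total)[::-1]
print(f"reversed: {a_rev} + {b_rev} = {total_rev}")
```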

AI “Fails” That Social Media Loves


Post-training / Reinforcement Learning

  • Another analogy: How we teach kids.
    • Reading the textbook = Pre-training.
    • Studying examples = Supervised fine-tuning.
    • Doing homework & checking answers = Reinforcement learning.
  • Chain-of-thought reasoning emerges naturally during RL—it’s not hardcoded.
  • The more tokens the model spends reasoning, the better it performs. I used to force LLMs to explain their reasoning; it’s fascinating to watch RL make them do it naturally.

Why RL is Still Hard

  • Things with unclear reward functions are hard to optimize.
  • Example: Humor is extremely difficult to train for.
  • RLHF (reinforcement learning from human feedback) tries to solve this by training a model to predict human scores, but it’s imperfect.
  • Reward hacking: RL models exploit flaws in their reward function whenever they can (a toy illustration follows this list).
    • Example: A model might discover that “the the the ee e e eeee 1” tricks the scoring model into a high reward.
    • OpenAI has an excellent article on this: Faulty Reward Functions.
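
Here’s a toy illustration of reward hacking, entirely invented for this note (real reward models are learned neural networks, not hand-written rules): any scorer with a blind spot invites the policy to drive straight into it.

```python
# Toy reward hacking: a flawed scorer that happens to over-credit very short
# "words" rewards gibberish more than an honest attempt at a joke.
def toy_reward(text: str) -> float:
    words = text.split()
    if not words:
        return 0.0
    short = sum(1 for w in words if len(w) <= 2)  # spurious feature the scorer latched onto
    return short / len(words)

honest = "Why did the scarecrow win an award? He was outstanding in his field."
degenerate = "the the the ee e e eeee 1"

print(toy_reward(honest))      # ~0.23
print(toy_reward(degenerate))  # 0.5 -- RL will happily drift toward strings like this
```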

Karpathy’s talk is packed with insights. If you have an afternoon, it’s absolutely worth watching.