2outube

Best clips & highlights from “Let's build the GPT Tokenizer”

by Andrej Karpathy

Here are the 6 most clip-worthy moments, auto-detected from the transcript.

Tokenization: LLM's Hidden Nightmare

The speaker's strong dislike for a crucial LLM component creates an immediate, relatable hook.

Caption: Ever wonder what's the most annoying part of building LLMs? It's not what you think. #LLM #AI #Tokenization

Tokens: The Atom of LLMs

Defines a core concept of LLMs with a simple, impactful metaphor.

Caption: If LLMs are built from atoms, what are those atoms? Discover the fundamental unit of large language models. #AIExplained #Tokens #LLM

LLM Weirdness? Blame Tokenization!

Offers a surprising explanation for common LLM quirks, intriguing anyone who has used them.

Caption: Ever notice LLMs doing weird things? The answer might surprise you. It's all about tokenization! #AIProblems #LLMSecrets #TechExplained

Magikarp Broke GPT?

Uses highly specific, bizarre examples (SolidGoldMagikarp, YAML vs. JSON) to illustrate tokenization's hidden influence.

Caption: Why did "SolidGoldMagikarp" make GPT go crazy? And why prefer YAML over JSON? The answer is wild. #GPT #AI #HiddenTruth

The "Egg" Tokenization Mystery

Demonstrates the arbitrary and case-sensitive nature of tokenization with a simple, relatable word.

Caption: How many tokens is "egg"? It depends! See the bizarre and inconsistent world of LLM tokenization. #LLM #AI #TokenizationMystery
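
To see why the same word can come out as different numbers of tokens, here is a minimal sketch. It assumes a tiny hand-written vocabulary and a greedy longest-match splitter; both are hypothetical stand-ins for illustration, not the GPT tokenizer, which learns its vocabulary from data with byte pair encoding.

```python
# Hypothetical toy vocabulary, chosen only to illustrate the effect.
# Real tokenizers learn their vocabulary from training data.
VOCAB = {"egg", " egg", "E", "gg", "EG", "G"}

def greedy_tokenize(text, vocab=VOCAB):
    """Split text by repeatedly taking the longest vocabulary entry.

    Falls back to a single character when nothing in the vocabulary matches.
    """
    tokens, i = [], 0
    while i < len(text):
        match = text[i]  # single-character fallback
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in vocab:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens

for s in ["egg", " egg", "Egg", "EGG"]:
    print(repr(s), greedy_tokenize(s))
```

With this made-up vocabulary, "egg" and " egg" each stay whole, while "Egg" and "EGG" each split into two pieces, because the capitalized spellings have no entry of their own.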

Python Indentation: LLM's Nightmare

Provides a specific, relatable example for programmers of how tokenization made GPT-2 bad at Python.

Caption: Why was GPT-2 so bad at Python? It's not the code, it's the spaces! #Python #GPT #AI #CodingProblems
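
The cost of the spaces can be sketched with a toy splitter. The function below is a hypothetical illustration, not the GPT-2 tokenizer: it only contrasts emitting one token per space with merging each run of spaces into a single token, to show how indentation can inflate the token count of a line of Python.

```python
import re

def toy_tokenize(text, merge_space_runs):
    """Hypothetical toy splitter, not a real GPT tokenizer.

    Each word and each punctuation mark becomes one token. Spaces are
    either emitted one at a time or merged into one token per run.
    """
    space_pattern = r" +" if merge_space_runs else r" "
    return re.findall(rf"{space_pattern}|\w+|[^\w ]", text)

line = "        return x + 1"  # indented by 8 spaces

per_space = toy_tokenize(line, merge_space_runs=False)
merged = toy_tokenize(line, merge_space_runs=True)

print(len(per_space), "tokens with one token per space")
print(len(merged), "tokens with space runs merged")
```

In this toy setup the indented line takes 15 tokens when every space is its own token and 8 when runs are merged. Deeper nesting widens the gap, and every token spent on whitespace uses up part of the model's limited context.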

Generated from the full transcript of this video · 2outube