2outube

Best clips & highlights from “Deep Dive into LLMs like ChatGPT”

by Andrej Karpathy

Here are the 6 most clip-worthy moments, auto-detected from the transcript.


The Internet is 44 Terabytes?!

It presents a surprising fact: FineWeb, a filtered collection of web text used for pretraining, takes up only about 44 terabytes on disk.

Caption: You think the internet is HUGE? 🤯 A filtered web-text dataset for training LLMs is only 44TB. Mind blown! #AI #LLM #DataScience

What AI Doesn't Learn

It reveals the filtering steps, such as blocklists of spam, malware, and adult websites, that keep harmful or undesirable content out of an LLM's training data.

Caption: Think AI learns *everything*? Think again! 🚫 Here's what's filtered out to keep LLMs safe. #AIethics #DataFiltering #LLMSafety
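One simple step in a filtering process like this is a URL blocklist: drop any page whose domain is on a list of sites you do not want data from. The Python sketch below is illustrative only. The domain names in `BLOCKED_DOMAINS` and the helpers `is_blocked` and `filter_pages` are hypothetical, and real pipelines use much larger curated lists alongside many other filters.

```python
from typing import List, Set, Tuple
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only; real pipelines use large curated domain lists.
BLOCKED_DOMAINS: Set[str] = {"spam-example.com", "malware-example.net"}


def is_blocked(url: str, blocked: Set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the host and every parent domain: a.b.example.com, b.example.com, example.com, com
    return any(".".join(parts[i:]) in blocked for i in range(len(parts)))


def filter_pages(pages: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (url, text) pairs whose URL is not on the blocklist."""
    return [(url, text) for url, text in pages if not is_blocked(url)]


pages = [
    ("https://news.example.org/a", "hello"),
    ("http://cdn.spam-example.com/x", "buy now"),
]
print(filter_pages(pages))  # [('https://news.example.org/a', 'hello')]
```

Matching parent domains, not just the exact host, means a subdomain such as `cdn.spam-example.com` is caught, while a look-alike such as `notspam-example.com` is not.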

How Language Filters Shape AI

It shows how a language filter, for example one that keeps only pages a classifier judges to be mostly English, directly shapes an LLM's multilingual ability and can bias it toward the languages that were kept.

Caption: Why is your AI better at English than Spanish? 🗣️ It's all in the language filtering! #AIbias #MultilingualAI #LLMdevelopment

The Internet: A Text Tapestry

It uses the image of a tapestry to describe the raw text, concatenated from many web pages, whose patterns neural networks learn to model.

Caption: Imagine the entire internet as a giant tapestry of text. 🤯 This is what neural networks learn to mimic! #AI #NeuralNetworks #BigData

Bytes Are Just Emojis

It makes the abstract idea of bytes relatable by likening each of the 256 possible byte values to a unique emoji, so any text becomes a sequence drawn from a 256-symbol alphabet.

Caption: What if every byte were an emoji? 🤯 This is how computers see data! #DataScience #Emojis #TechExplained
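To make the byte-as-symbol idea concrete, here is a minimal Python sketch, assuming the text is encoded as UTF-8. It turns a string into its byte values, each an integer from 0 to 255 that can be treated as one "emoji" in a 256-symbol alphabet. The function name `text_to_byte_symbols` is hypothetical.

```python
from typing import List


def text_to_byte_symbols(text: str) -> List[int]:
    """Encode text as UTF-8 and return each byte as an integer symbol ID in 0..255."""
    return list(text.encode("utf-8"))


text = "hi 👋"
ids = text_to_byte_symbols(text)
print(ids)  # [104, 105, 32, 240, 159, 145, 139]
print(len(text), "characters ->", len(ids), "byte symbols")  # 4 characters -> 7 byte symbols
```

ASCII characters map to a single byte each, while the emoji takes four bytes, which is why raw byte sequences get long and motivate compressing them further.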

How AI Compresses Language

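It explains Byte Pair Encoding (BPE), the algorithm that repeatedly merges the most frequent adjacent pair of symbols into a new symbol, trading a larger vocabulary for shorter sequences; GPT-4's tokenizer ends up with roughly 100,000 symbols.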

Caption: How do LLMs handle massive amounts of text efficiently? 🤔 It's all about Byte Pair Encoding! #LLM #Tokenization #AIhacks
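The core merge loop of Byte Pair Encoding can be sketched in a few lines of Python. This is a toy version under stated assumptions: training starts from UTF-8 bytes (symbols 0 to 255), each merged symbol gets the next unused ID from 256 upward, and ties between equally frequent pairs go to the pair seen first. The function names are hypothetical, and production tokenizers add steps this sketch omits, such as pre-splitting text before merging.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def most_frequent_pair(ids: List[int]) -> Optional[Pair]:
    """Return the most common adjacent pair of symbol IDs, or None if there are fewer than two."""
    pairs = Counter(zip(ids, ids[1:]))
    # max() returns the first maximal key, so ties go to the pair that appeared first
    return max(pairs, key=pairs.get) if pairs else None


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right, with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text: str, num_merges: int) -> Tuple[List[int], Dict[Pair, int]]:
    """Start from UTF-8 bytes and greedily merge the most frequent pair up to num_merges times."""
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for k in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + k  # new symbols are numbered after the 256 byte values
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


ids, merges = train_bpe("aaabdaaabac", 3)
print(ids)     # [258, 100, 258, 97, 99]
print(merges)  # {(97, 97): 256, (256, 97): 257, (257, 98): 258}
```

Three merges shrink the 11 byte symbols of `"aaabdaaabac"` to 5, at the cost of three extra vocabulary entries; repeating this many thousands of times on a large corpus is what yields a vocabulary on the order of 100,000 symbols.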

Generated from the full transcript of this video.