Book companion hub

If you want to understand LLMs beyond API calls, the best way is to build one from the ground up. Build a Large Language Model (From Scratch) walks through text embedding, attention, GPT-style model architecture components, pretraining, and finetuning in Python and PyTorch.

Also: Amazon book · Manning course

Build a Large Language Model From Scratch book cover

GitHub repository Star 96.3k

Amazon 4.5 (504)

High-level mental model for building a large language model from scratch
A compact map of the book's core path: text data, text embedding, attention, model implementation, pretraining, and finetuning.

Study Guide

1. Read

Start with the chapter

Read the chapter first so the implementation has context. I recommend doing a first read-through pass without coding at first.

Open book page
2. Watch

Use video as an optional second pass

Use the video course after reading if you want the same implementation ideas explained in a different format.

Open video playlist
3. Code

Build alongside the book chapter

Retype and run the code after reading each chapter for the best (but most time-intensive) learning experience. Otherwise, execute the notebooks cell by cell and edit small parts when you want to explore an idea. (I have some more tips on reading books here, if you are interested)

Open code repository
4. Exercises

Use the exercises as the check

Try the exercises at the end of each chapter before looking at the solutions. The exercises help self-check whether you understood the chapter implementation.

Feedback

See book page for more testimonials.

"If you want to become a top-tier ML / AI Engineer, you need to understand what's going on under the hood."

"I got a serious closeup look at what goes on inside an LLM."

"This is the best technical book I have ever studied by a large margin."

"Ultimate hands on guide to build foundational models. This is the book you want to buy if you want to go deep."

Chapter Map

Chapter 1 High-level orientation to LLMs and the model-building path. Open Chapter 1 code
Chapter 2 Text data, text embedding, byte pair encoding, and input-target construction. Open Chapter 2 code
Chapter 3 Self-attention, causal attention, multi-head attention, and transformer blocks. Open Chapter 3 code
Chapter 4 GPT model implementation plus modern architecture concept guides. Open local concept guides
Chapter 5 Pretraining, loss functions, text generation, sampling, and model loading. Open Chapter 5 code
Chapter 6 Classification finetuning and using a pretrained LLM for a supervised task. Open Chapter 6 code
Chapter 7 Instruction finetuning, prompt formatting, and instruction-following behavior. Open Chapter 7 code
Appendix A Introduction to PyTorch, including notebook code and distributed training notes. Open Appendix A code
Appendix B References and further reading for the main chapters. Open Appendix B resources
Appendix C Exercise solutions for checking the chapter implementations. Open Appendix C solutions
Appendix D Training-loop additions such as learning-rate schedules and other practical refinements. Open Appendix D code
Appendix E Parameter-efficient finetuning with LoRA. Open Appendix E code

Architecture Concept Guides

The book covers the core implementation path through text embedding, attention, GPT-style model architecture components, pretraining, and finetuning. The concept guides below are advanced follow-up material for connecting those basics to current model architectures, memory use, and serving tradeoffs. They are best read after completing the book.

Attention

A Visual Guide to Attention Variants in Modern LLMs

Use this article for a visual pass through MHA, MQA, GQA, MLA, sparse attention, sliding-window attention, and hybrid designs.

Read the article
Feed-forward layers

MoE and SwiGLU

Use these guides to connect sparse expert routing and gated feed-forward layers to model capacity and inference cost.

Open MoE explainer
Architecture comparison

The Big LLM Architecture Comparison

Use this guide to compare current decoder-style LLM architectures, including normalization, position handling, attention choices, and MoE layers.

Read the guide

Related From-Scratch Articles

These articles are optional follow-up reads for specific implementation details. They are best read after completing the book.

Self-Attention from Scratch

A compact standalone implementation of the core mechanism behind transformer LLMs.

Read the article

BPE Tokenizer from Scratch

A practical implementation of byte pair encoding, one of the first pieces to understand before LLM training.

Read the article

KV Cache from Scratch

A focused explanation of the inference cache that makes autoregressive generation efficient.

Read the article

Qwen from Scratch

A bridge from the book's baseline implementation to a modern open-weight model family.

Read the article

LoRA and DoRA from Scratch

A code-oriented path into parameter-efficient adaptation after you understand the base model.

Read the article

LLM Evaluation from Scratch

Benchmarks, verifiers, leaderboards, and LLM judges as practical tools for checking model behavior.

Read the article

Where to Go Next

After finishing the book, these are the next places I would go. Continue with reasoning methods, or use the gallery to compare current model architectures more broadly.

Reasoning

Build a Reasoning Model (From Scratch)

Continue here after the LLM basics if you want inference-time scaling, reinforcement learning, and distillation.

Open reasoning hub
Reference

LLM Architecture Gallery

Compare architecture figures, attention mechanisms, decoder types, and implementation links across model families.

Open gallery