Build a Large Language Model (From Scratch)

Book companion hub

If you want to understand LLMs beyond API calls, the best way is to build one from the ground up. Build a Large Language Model (From Scratch) walks through text embedding, attention, GPT-style model architecture components, pretraining, and finetuning in Python and PyTorch.

Code repository · Manning book · YouTube course

Also: Amazon book · Manning course

High-level mental model for building a large language model from scratch: a compact map of the book's core path from text data through text embedding, attention, model implementation, pretraining, and finetuning.

Study Guide

1. Read

Start with the chapter

Read the chapter first so the implementation has context. I recommend doing the first read-through without coding.

Open book page

2. Watch

Use video as an optional second pass

Use the video course after reading if you want the same implementation ideas explained in a different format.

Open video playlist

3. Code

Build alongside the book chapter

Retype and run the code after reading each chapter for the best (but most time-intensive) learning experience. Alternatively, run the notebooks cell by cell and edit small parts when you want to explore an idea. (If you are interested, I have more tips on reading books here.)

Open code repository

4. Exercises

Use the exercises as the check

Try the exercises at the end of each chapter before looking at the solutions. They help you check whether you understood the chapter's implementation.

Feedback

See the book page for more testimonials.

"If you want to become a top-tier ML / AI Engineer, you need to understand what's going on under the hood."

Via Miguel Otero Pedrido, Senior Machine Learning Engineer at Zapier

"I got a serious closeup look at what goes on inside an LLM."

Via Ganapathy Subramaniam, Gen AI developer

"This is the best technical book I have ever studied by a large margin."

Via Soumitri Kadambi, Director Artificial Intelligence at ZeOmega

"Ultimate hands on guide to build foundational models. This is the book you want to buy if you want to go deep."

Via Antonio Gulli, Senior Director at Google

Chapter Map

Chapter 1	High-level orientation to LLMs and the model-building path.	Open Chapter 1 code
Chapter 2	Text data, text embedding, byte pair encoding, and input-target construction.	Open Chapter 2 code
Chapter 3	Self-attention, causal attention, multi-head attention, and transformer blocks.	Open Chapter 3 code
Chapter 4	GPT model implementation plus modern architecture concept guides.	Open local concept guides
Chapter 5	Pretraining, loss functions, text generation, sampling, and model loading.	Open Chapter 5 code
Chapter 6	Classification finetuning and using a pretrained LLM for a supervised task.	Open Chapter 6 code
Chapter 7	Instruction finetuning, prompt formatting, and instruction-following behavior.	Open Chapter 7 code
Appendix A	Introduction to PyTorch, including notebook code and distributed training notes.	Open Appendix A code
Appendix B	References and further reading for the main chapters.	Open Appendix B resources
Appendix C	Exercise solutions for checking the chapter implementations.	Open Appendix C solutions
Appendix D	Training-loop additions such as learning-rate schedules and other practical refinements.	Open Appendix D code
Appendix E	Parameter-efficient finetuning with LoRA.	Open Appendix E code
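To make Chapter 2's input-target construction concrete, here is a minimal sketch in plain Python. It is not the book's code. It assumes the text has already been converted to integer token IDs, and the function name `make_input_target_pairs` is hypothetical. Each target window is the input window shifted by one token, which is exactly what next-token prediction trains on.

```python
def make_input_target_pairs(token_ids, context_length, stride):
    """Split token IDs into (input, target) windows for next-token prediction.

    Hypothetical helper for illustration. target[i] is the token the model
    should predict after seeing input[: i + 1].
    """
    inputs, targets = [], []
    # Stop early enough that the shifted target window still fits.
    for start in range(0, len(token_ids) - context_length, stride):
        inputs.append(token_ids[start : start + context_length])
        targets.append(token_ids[start + 1 : start + context_length + 1])
    return inputs, targets


token_ids = [10, 11, 12, 13, 14, 15, 16, 17, 18]
inputs, targets = make_input_target_pairs(token_ids, context_length=4, stride=4)
for x, y in zip(inputs, targets):
    print(x, "->", y)
# [10, 11, 12, 13] -> [11, 12, 13, 14]
# [14, 15, 16, 17] -> [15, 16, 17, 18]
```

When `stride` equals `context_length`, the windows do not overlap. A smaller stride produces overlapping windows and therefore more training examples from the same text.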

Architecture Concept Guides

The book covers the core implementation path through text embedding, attention, GPT-style model architecture components, pretraining, and finetuning. The concept guides below are advanced follow-up material for connecting those basics to current model architectures, memory use, and serving tradeoffs. They are best read after completing the book.

Attention

A Visual Guide to Attention Variants in Modern LLMs

Use this article for a visual pass through MHA, MQA, GQA, MLA, sparse attention, sliding-window attention, and hybrid designs.
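As a rough illustration of how MHA, MQA, and GQA relate, the sketch below implements causal scaled dot-product attention in plain Python. It uses nested lists instead of PyTorch tensors, so it is slow and meant only for reading. The function names are hypothetical, and the projections into queries, keys, and values are assumed to have already happened.

In this sketch, the three variants differ only in how many key/value heads exist:

- MHA: one key/value head per query head.
- MQA: a single key/value head shared by all query heads.
- GQA: one key/value head per group of query heads.

MLA, sparse, and sliding-window attention are not covered here.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grouped_causal_attention(queries, keys, values):
    """Causal scaled dot-product attention with shared key/value heads.

    queries: [num_q_heads][seq_len][head_dim]
    keys, values: [num_kv_heads][seq_len][head_dim]
    num_kv_heads == num_q_heads -> MHA; == 1 -> MQA; in between -> GQA.
    """
    num_q_heads, num_kv_heads = len(queries), len(keys)
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    outputs = []
    for h in range(num_q_heads):
        kv = h // group_size  # query heads in one group share a K/V head
        head_dim = len(queries[h][0])
        head_out = []
        for t, q in enumerate(queries[h]):
            # Causal mask: position t only attends to positions 0..t.
            scores = [dot(q, keys[kv][s]) / math.sqrt(head_dim) for s in range(t + 1)]
            weights = softmax(scores)
            # Context vector: attention-weighted sum of the value vectors.
            context = [
                sum(w * values[kv][s][d] for s, w in enumerate(weights))
                for d in range(head_dim)
            ]
            head_out.append(context)
        outputs.append(head_out)
    return outputs


# Toy example: 4 query heads sharing 2 key/value heads (GQA), 3 tokens, head_dim 2.
seq_len = 3
queries = [[[0.1 * (h + t), 0.2] for t in range(seq_len)] for h in range(4)]
keys = [[[0.3, 0.1 * t] for t in range(seq_len)] for _ in range(2)]
values = [[[float(t), float(kv)] for t in range(seq_len)] for kv in range(2)]
out = grouped_causal_attention(queries, keys, values)
print(len(out), len(out[0]), len(out[0][0]))  # 4 3 2
```

Fewer key/value heads means a smaller KV cache during generation. That memory saving is the main motivation for using GQA or MQA instead of full MHA.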

Read the article

Feed-forward layers

MoE and SwiGLU

Use these guides to connect sparse expert routing and gated feed-forward layers to model capacity and inference cost.
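The sketch below shows both ideas side by side in plain Python:

- A SwiGLU feed-forward block, computed as W_down(SiLU(W_gate x) * W_up x).
- A sparse MoE layer that routes a token to its top-k experts.

It is a toy under stated assumptions, not any specific model's implementation. The weight names and toy sizes are illustrative. Applying softmax over only the selected experts' router logits is one routing choice among several, and routers differ across models.

```python
import math


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def silu(z):
    # SiLU (also called Swish): z * sigmoid(z).
    return z / (1.0 + math.exp(-z))


def swiglu_ffn(x, W_gate, W_up, W_down):
    """Gated feed-forward block: W_down @ (SiLU(W_gate @ x) * (W_up @ x))."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(W_down, hidden)


def moe_layer(x, W_router, experts, top_k=2):
    """Toy sparse MoE: run only the top_k experts and mix their outputs.

    experts is a list of (W_gate, W_up, W_down) tuples, one SwiGLU block each.
    """
    logits = matvec(W_router, x)
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Assumption: softmax over the selected experts' logits only.
    m = max(logits[i] for i in chosen)
    exps = {i: math.exp(logits[i] - m) for i in chosen}
    total = sum(exps.values())
    out = [0.0] * len(x)
    for i in chosen:
        y = swiglu_ffn(x, *experts[i])
        weight = exps[i] / total
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


def make_expert(scale):
    # Toy sizes: d_model = 2, hidden = 3.
    W_gate = [[scale, 0.0], [0.0, scale], [scale, scale]]  # 3 x 2
    W_up = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]            # 3 x 2
    W_down = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]            # 2 x 3
    return W_gate, W_up, W_down


experts = [make_expert(0.5 * (i + 1)) for i in range(4)]
W_router = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
x = [1.0, 2.0]
y, chosen = moe_layer(x, W_router, experts, top_k=2)
print(chosen)  # [2, 1]: router logits are [1.0, 2.0, 3.0, -3.0]
```

Only `top_k` experts run for each token. As a result, per-token compute scales with the active parameters rather than the total parameter count. This is the capacity-versus-inference-cost tradeoff described above.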

Open MoE explainer

Architecture comparison

The Big LLM Architecture Comparison

Use this guide to compare current decoder-style LLM architectures, including normalization, position handling, attention choices, and MoE layers.
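Normalization is one of the axes such comparisons cover. As a hedged example, the sketch below contrasts LayerNorm, used in the original GPT-2 architecture, with RMSNorm, used in many newer decoder-style LLMs such as Llama. The function names and epsilon values are illustrative assumptions.

```python
import math


def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: subtract the mean, divide by the standard deviation,
    # then apply a learned scale (gamma) and shift (beta).
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [g * (xi - mean) / math.sqrt(var + eps) + b for xi, g, b in zip(x, gamma, beta)]


def rms_norm(x, gamma, eps=1e-6):
    # RMSNorm: no mean subtraction and no shift; divide by the root mean square.
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x) + eps)
    return [g * xi / rms for xi, g in zip(x, gamma)]


x = [1.0, 2.0, 3.0, 4.0]
ln = layer_norm(x, gamma=[1.0] * 4, beta=[0.0] * 4)
rn = rms_norm(x, gamma=[1.0] * 4)
print([round(v, 3) for v in ln])
print([round(v, 3) for v in rn])
```

RMSNorm drops the mean subtraction and the bias term, which makes it slightly cheaper to compute. It is generally reported to work about as well as LayerNorm for LLM training.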

Read the guide

Where to Go Next

After finishing the book, these are the next places I would go. Continue with reasoning methods, or use the gallery to compare current model architectures more broadly.

Reasoning

Build a Reasoning Model (From Scratch)

Continue here after the LLM basics if you want to learn about inference-time scaling, reinforcement learning, and distillation.

Open reasoning hub

Reference

LLM Architecture Gallery

Compare architecture figures, attention mechanisms, decoder types, and implementation links across model families.

Open gallery

Related From-Scratch Articles

Self-Attention from Scratch

BPE Tokenizer from Scratch

KV Cache from Scratch

Qwen from Scratch

LoRA and DoRA from Scratch

LLM Evaluation from Scratch
