The Core Engine: How AI Learns and Predicts the Next Token

From internet text → patterns → probabilities → generation

Apr 12, 2026

LLMs for Engineers — Part 4

Welcome back to our LLM Journey. If you are new to here , don’t miss to check previous posts before diving into this post.

So far, we’ve built the pipeline:

Internet → filtered data
What’s Actually Behind ChatGPT?
Anurudh
·
Mar 29
Read full story
Data → tokens (numbers)
Where ChatGPT Actually Learns From
Anurudh
·
Mar 29
Read full story

Now comes the most important step:

👉 How does the model actually learn from this data?

🧠 Step 1: What the model actually sees

After preprocessing + tokenization:

👉 The entire internet becomes:

[1543, 9281, 77, 201, 9912, ...]

Important realization

This is NOT:

sentences
paragraphs
knowledge

👉 It is just a long sequence of tokens

In real systems:

👉 ~15 trillion tokens

Think like this

It’s like:

👉 A massive “stream of language”

And the goal is:

👉 Learn how this stream flows

🧩 Step 2: Create training examples (windows)

We don’t feed everything at once.

We take small chunks:

[router, sends, data, packet]

This is called:

👉 Context

The task

Given this:

👉 Predict next token

Example

router sends data packet → ?

Correct answer:

👉 “forward”

👇 Now visualize what happens

⚙️ Step 3: Model makes predictions

The neural network outputs:

👉 Probability for EVERY token (~100K)

Example:

forward → 4%
drop → 2%
receive → 1%

Important ⚠️

At start:

👉 These are RANDOM

Because:

👉 Model starts with random parameters

🔧 Step 4: Compare with correct answer

We already know:

👉 Correct = “forward”

Goal

👉 Increase its probability
👉 Decrease others

🔁 Step 5: Update the model

This is the key idea from transcript:

We “nudge” the model slightly toward correct answers

Think like this

🎛️ Model = billions of knobs
Training = adjusting them slightly

🔄 Step 6: Repeat at massive scale

This process happens:

across trillions of tokens
in parallel
millions of times

What is happening?

👉 Model is learning:

Statistical patterns of token sequences

Example patterns learned

“capital of France” → Paris
“router sends data” → packet
“once upon a” → time

⚙️ Step 7: What is inside the neural network?

Let’s simplify :

It’s just a function

👉 Input → Output

Input:

👉 Token sequence

Inside:

matrix multiplications
additions
transformations

Output:

👉 100,000 probabilities

Key idea

It’s a giant mathematical expression

Parameters

👉 Billions of numbers

Before training:

👉 Random behavior

After training:

👉 Pattern-aware behavior

🧠 Important insight (VERY IMPORTANT)

The model does NOT learn:

facts
rules
logic

It learns:

👉 what usually comes next

🔮 Step 8: Inference (Generation)

Now training is DONE.

Model is frozen.

Now what happens?

You give input:

router sends

Step-by-step

Step 1:

Predict next token → “data”

Step 2:

Append → “router sends data”

Step 3:

Predict → “packet”

Step 4:

Repeat…

👇 Visualize this loop

🎲 Step 9: Sampling (critical concept)

The model does NOT always pick:

👉 highest probability

It samples

Think:

👉 rolling a weighted dice

Result

Same input:

router sends data

Can produce:

packet forward
packet to destination
packet via protocol

⚠️ This explains EVERYTHING

From transcript:

👉 Model is stochastic

That’s why:

outputs vary
hallucinations happen
creativity exists

🤯 Step 10: Why it feels intelligent

Because:

👉 It has compressed internet patterns into weights

Key idea

Model is like a lossy compression of the internet

Meaning

It doesn’t store exact data
It stores patterns

⚠️ Reality check

The model is NOT:

thinking
reasoning
verifying

It is:

👉 Predicting what is likely next

🧠 Engineer’s mental model

Think of it like:

Data → token stream
Model → probability engine
Output → sampled token
Loop → repeat

🔁 Training vs Inference

Training:

👉 Learn patterns

Inference:

👉 Generate sequences

💡 One powerful takeaway

ChatGPT = trained probability machine over token sequences

🚀 What’s coming next

Now we understand:

how models learn
how they generate

But something is missing…

👉 Why does ChatGPT behave like an assistant?

🔜 Next in this series

👉 Base model → ChatGPT (alignment & post-training)

If you truly understood this post…

👉 You now understand 80% of how LLMs work.

Let’s go deeper 🚀

SMiles:)

Anurudh

GrowAIth

What’s Actually Behind ChatGPT?

Where ChatGPT Actually Learns From

Discussion about this post

Ready for more?