E2 | OECD AILit: Engaging with AI, Competency 2
How Does AI Work?
Learning from Data
AI doesn’t think like a human — it doesn’t reason, remember, or care. What it does is spot patterns in data. Imagine teaching a small kid to recognize a cat: you don’t hand them a biology textbook, you just show them pictures. Twenty cats in, they can point at a new one and get it right. AI works the same way, only at a scale no human can match. GrabFood learned your favorite kaprao by watching millions of orders. Gmail learned what spam looks like from billions of marked emails. Different apps, same trick.
Before AI Could Learn
For most of computing history, teaching a machine meant writing rules. If you wanted a program to recognize spam, you sat down and listed the signs: words like “FREE!!!”, suspicious links, strange sender addresses. The program followed your list. The moment a spammer changed tactics, your list was useless.
This worked for narrow problems — calculator logic, tax rules, chess openings. It fell apart on anything messy. What makes a photo a cat photo? You can try to write rules (“has whiskers”, “has ears”), but a partially hidden cat still looks like a cat to you, while your rules fail. For decades, AI researchers hit this wall again and again.
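The rule-list approach to spam described above can be sketched in a few lines. This is only an illustration: the keyword list, the sender domain, and the function name `looks_like_spam` are all made up here, not taken from any real filter.

```python
# Hand-written rules, in the style used before machine learning.
# The keywords and the domain below are invented for illustration.
SPAM_WORDS = ("free!!!", "winner", "act now")
SUSPICIOUS_DOMAINS = ("@cheap-prizes.example",)


def looks_like_spam(subject: str, sender: str) -> bool:
    """Return True if any hand-written rule matches."""
    subject_lower = subject.lower()
    if any(word in subject_lower for word in SPAM_WORDS):
        return True
    if sender.lower().endswith(SUSPICIOUS_DOMAINS):
        return True
    return False


# The rule catches the exact pattern it was written for...
print(looks_like_spam("FREE!!! phone today", "promo@shop.example"))  # True
# ...but misses a small variation the author never listed.
print(looks_like_spam("F.R.E.E phone today", "promo@shop.example"))  # False
```

The second call shows the weakness: the program knows only what its author typed in, so every new trick needs a new rule.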
The breakthrough was a shift in philosophy: stop writing rules and let the machine find them. Show it thousands of cat photos and thousands of not-cat photos, and tell it which is which. The model finds the rules on its own, including rules a human could never write down, like “cat-shaped silhouette plus a specific fur texture”. This approach, machine learning, had been an idea since the 1950s, but it only started beating hand-written rules on messy problems like photos and speech once we had enough data, enough compute, and better math, roughly from 2012 onward. That shift also changed who can build AI: writing rules was a specialist job, while training a model mostly takes good data and patience. That’s how apps like Shopee, LINE, Netflix, and Google Maps can all run serious AI without reinventing the wheel.
The Training Process
Training Data is everything you show the model. If you want a spam filter, your training data is a big pile of emails: real spam and real legitimate messages, mixed together. If you want a voice assistant, it’s thousands of hours of people speaking. The model only knows what it has seen: train it only on English and it won’t understand Thai.
Labels are the answers you attach to each example. “This email is spam.” “This image contains a cat.” “This sentence is Thai.” In this kind of training, the model can’t work out the right answers by itself: someone, usually a person, has to mark each example so the model knows what to aim for. When Google Maps learns to tell photos of restaurants from photos of parking lots, a person somewhere has labeled the first batch. Without labels, the model has nothing to check its guesses against.
The Model is the recipe that turns examples into patterns. Early on, it guesses randomly. After seeing enough labeled examples, it starts finding rules — not “if subject contains FREE then spam” (that’s old-school), but statistical patterns across thousands of signals at once. The model updates its internal numbers every time it gets something wrong, slowly getting better.
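To make “updates its internal numbers every time it gets something wrong” concrete, here is a minimal sketch using the perceptron rule, one of the simplest learning methods. It is a stand-in chosen for readability, not how any real spam filter is built. The two signals (exclamation marks in the subject, whether the email contains a link) and the six training examples are invented, and the numbers start at zero instead of random values so the run is repeatable.

```python
# Each example: (exclamation_marks_in_subject, contains_link) -> label
# Label 1 means spam, 0 means not spam. The data is invented for this sketch.
TRAINING_DATA = [
    ((3, 1), 1),
    ((0, 0), 0),
    ((2, 1), 1),
    ((1, 0), 0),
    ((4, 1), 1),
    ((0, 1), 0),
]


def predict(weights, bias, features):
    """Guess spam (1) when the weighted sum of the signals is positive."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(data, max_epochs=100):
    """Perceptron rule: adjust the numbers only after a wrong guess."""
    weights = [0, 0]  # the model's "internal numbers"
    bias = 0
    mistakes_per_epoch = []
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:  # a full pass with no wrong guesses: stop
            break
    return weights, bias, mistakes_per_epoch


weights, bias, history = train(TRAINING_DATA)
print("mistakes per pass:", history)  # [3, 1, 0]
print("learned numbers:", weights, bias)
print("new email (5 '!', has link):", predict(weights, bias, (5, 1)))  # 1
```

Nobody typed in a rule like “more than one exclamation mark means spam”. The numbers drifted toward it because the labels kept correcting the wrong guesses.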
Accuracy is how often the model’s guess matches the real label. 95% accuracy sounds great until you realize that 95% on 10 million daily emails still means 500,000 wrong calls. Accuracy also doesn’t tell you which way the errors lean: if only 1 email in 100 is spam, a filter that never flags anything is 99% accurate and completely useless. Real teams look at more specific numbers, like how often the model misses a real threat versus how often it flags something safe.
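The arithmetic behind those accuracy figures can be checked directly. This is a small sketch: the function names and the 10,000-email batch are assumptions chosen to keep the numbers round, with spam set at 1% of the batch.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all guesses that match the real label."""
    return (tp + tn) / (tp + fp + tn + fn)


def miss_rate(tp, fn):
    """Share of real spam the filter let through."""
    return fn / (tp + fn)


def false_alarm_rate(fp, tn):
    """Share of legitimate emails wrongly flagged as spam."""
    return fp / (fp + tn)


# 95% accuracy on 10 million emails a day
daily_emails = 10_000_000
wrong_calls = daily_emails * (100 - 95) // 100
print(wrong_calls)  # 500000

# A filter that never flags anything, on 10,000 emails where 1% are spam
tp, fn = 0, 100    # spam caught, spam missed
fp, tn = 0, 9_900  # legit flagged, legit passed
print(accuracy(tp, fp, tn, fn))  # 0.99
print(miss_rate(tp, fn))         # 1.0
print(false_alarm_rate(fp, tn))  # 0.0
```

The do-nothing filter scores 99% on accuracy while missing every single spam email, which is why the miss rate and the false-alarm rate are reported separately.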
The punchline: data quality is everything. Better labels and more variety beat a cleverer algorithm almost every time. Two teams can use the exact same model architecture; the one with cleaner, more diverse training data wins.
Garbage In, Garbage Out
If the examples you feed a model are wrong, lopsided, or too narrow, the model will learn the wrong lesson. Train a face recognition model on faces from one demographic, and it’ll misfire on everyone else. Label emails inconsistently, and your spam filter never settles on a clear idea of what spam is. The model is only as good as what it has been shown. This is why teams spend more time cleaning and checking data than they spend tuning algorithms, and it’s why “bigger AI” alone never fixes a bad dataset.
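One simple way to spot a lopsided dataset is to score the model separately for each group instead of looking at one overall number. The sketch below assumes a hypothetical set of 100 test results; the group names and counts are invented to illustrate the effect and are not real measurements.

```python
from collections import defaultdict

# (group, true_label, model_guess): invented results for illustration.
# group_a dominated the training data; group_b was barely represented.
RESULTS = (
    [("group_a", 1, 1)] * 90 + [("group_a", 1, 0)] * 5
    + [("group_b", 1, 1)] * 2 + [("group_b", 1, 0)] * 3
)


def accuracy_by_group(results):
    """Return {group: share of guesses that match the true label}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, guess in results:
        total[group] += 1
        correct[group] += int(truth == guess)
    return {group: correct[group] / total[group] for group in total}


overall = sum(truth == guess for _, truth, guess in RESULTS) / len(RESULTS)
print(f"overall: {overall:.0%}")  # overall: 92%
for group, acc in accuracy_by_group(RESULTS).items():
    print(f"{group}: {acc:.0%}")  # group_a: 95%, then group_b: 40%
```

The overall score looks healthy only because group_a supplies almost all of the test cases. Split by group, the model is wrong more often than right for group_b.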
Why This Matters
Understanding how AI learns changes how you interact with it. When ChatGPT gives you a confident but wrong answer, now you know one likely reason: the patterns in its training data pointed it that way, and nothing corrected them. When a recommendation feed keeps pushing the same type of content, it’s because your past clicks trained it to expect more of the same.
This also explains the stories you see in the news — AI models that behave badly. The problem usually isn’t the algorithm; it’s the data. A hiring tool that favors one group learned that bias from past hiring data. A medical AI that misdiagnoses one group was trained mostly on another. You can’t debug AI by looking at the code alone.
In the next mission, “Can We Trust AI?”, you’ll go one step further: now that you know how AI learns, what happens when it’s wrong — or when bad actors use AI to fool you on purpose?
Check Your Understanding
1. What does AI need to learn?
2. What happens if training data is biased?
3. What is 'accuracy' in AI?
Answer all questions. You need 70% to pass.