D1 | OECD AILit: Designing AI, Competency 1

Data Is the Heart of AI

No Data, No AI

Most modern AI systems learn from data. The quality, quantity, and diversity of that data directly determine how well the AI performs. Mislabeled data leads to wrong predictions. Biased data leads to biased AI. Understanding data is the first step to designing good AI.
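
To make the link between labels and predictions concrete, here is a minimal sketch in Python. It assumes a toy 1-nearest-neighbor classifier, chosen only because it is easy to follow, not because any particular AI system works this way. The two features (ear pointiness and snout length) and all the numbers are made up for illustration. Flipping a single label changes the prediction for a new, cat-like animal.

```python
import math

def predict(training_data, features):
    """Toy 1-nearest-neighbor classifier: return the label of the closest training example."""
    closest = min(training_data, key=lambda example: math.dist(example[0], features))
    return closest[1]

# Hypothetical features: (ear pointiness, snout length), both on a 0-1 scale.
clean_data = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.3, 0.9), "dog"),
    ((0.2, 0.8), "dog"),
]

# The same data, except the second cat has been mislabeled as a dog.
mislabeled_data = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "dog"),  # wrong label
    ((0.3, 0.9), "dog"),
    ((0.2, 0.8), "dog"),
]

new_animal = (0.75, 0.35)  # a cat-like animal the model has not seen before
print(predict(clean_data, new_animal))       # cat
print(predict(mislabeled_data, new_animal))  # dog
```

With so few examples, one wrong label is enough to flip the answer.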

What Makes Good Training Data

Correct labels: Every example must be accurately labeled. Even a single wrong label can lead to wrong predictions.

Sufficient quantity: AI needs enough examples to find patterns. Too few examples lead to unreliable predictions.

Diverse representation: The data must represent all the cases the AI will encounter. If you train only on photos of white cats, the AI may fail to recognize black cats.

Clean and consistent: Remove duplicates, fix formatting issues, and ensure consistent quality across the dataset.
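
Parts of this checklist can be automated. The sketch below, in Python with only the standard library, assumes a hypothetical dataset of (image ID, label) pairs. It normalizes label formatting, drops exact duplicates, flags images that carry contradictory labels, and lists labels with too few examples. The threshold of 3 examples per label is an arbitrary illustrative value, not a recommended minimum.

```python
from collections import Counter

# Hypothetical raw records: (image ID, label) pairs with messy formatting.
raw_records = [
    ("img_001", "Cat"),
    ("img_002", "dog "),
    ("img_002", "Dog"),   # same record as the one above once formatting is fixed
    ("img_003", "CAT"),
    ("img_004", "dog"),
    ("img_004", "cat"),   # same image, contradictory label
    ("img_005", "cat"),
]

def clean(records):
    """Normalize label formatting and drop exact duplicate records, keeping the first copy."""
    seen = set()
    cleaned = []
    for image_id, label in records:
        record = (image_id, label.strip().lower())  # "Cat", "CAT", "dog " -> "cat", "cat", "dog"
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

def find_conflicts(records):
    """Return image IDs that carry more than one distinct label."""
    labels_by_id = {}
    for image_id, label in records:
        labels_by_id.setdefault(image_id, set()).add(label)
    return sorted(image_id for image_id, labels in labels_by_id.items() if len(labels) > 1)

def under_represented(records, min_per_label):
    """Return labels that have fewer than `min_per_label` examples."""
    counts = Counter(label for _, label in records)
    return sorted(label for label, n in counts.items() if n < min_per_label)

dataset = clean(raw_records)
print(len(dataset))                   # 6
print(find_conflicts(dataset))        # ['img_004']
print(under_represented(dataset, 3))  # ['dog']
```

Checks like these catch formatting and consistency problems, but they cannot tell whether a consistently applied label is actually correct; that still needs a person to look at the examples.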

Bias In = Bias Out

If your training data doesn’t represent the real world, your AI won’t work well for everyone. Historical data often contains societal biases, and AI trained on it can reproduce and even amplify those biases.
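
One common way to spot this kind of problem is to measure accuracy separately for each group instead of only overall. The minimal sketch below uses a toy 1-nearest-neighbor classifier with made-up features (fur brightness and ear pointiness); the data and groups are hypothetical. When the training cats are all white and the training dogs are all dark, the model effectively learns "dark means dog" and misclassifies black cats. Adding black cats and a light dog to the training data fixes these test cases.

```python
import math

def predict(training_data, features):
    """Toy 1-nearest-neighbor classifier: return the label of the closest training example."""
    closest = min(training_data, key=lambda example: math.dist(example[0], features))
    return closest[1]

def accuracy(training_data, test_data):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(predict(training_data, features) == label for features, label in test_data)
    return correct / len(test_data)

# Hypothetical features: (fur brightness, ear pointiness), both on a 0-1 scale.
white_cats_only = [
    ((0.90, 0.80), "cat"), ((0.85, 0.90), "cat"),  # white cats
    ((0.20, 0.30), "dog"), ((0.15, 0.20), "dog"),  # dark dogs
]
diverse = white_cats_only + [
    ((0.10, 0.85), "cat"), ((0.15, 0.80), "cat"),  # black cats
    ((0.90, 0.25), "dog"),                         # light dog
]

# Evaluate each group separately instead of reporting one overall number.
white_cat_tests = [((0.88, 0.85), "cat"), ((0.92, 0.75), "cat")]
black_cat_tests = [((0.12, 0.80), "cat"), ((0.05, 0.90), "cat")]

for name, data in [("white cats only", white_cats_only), ("diverse", diverse)]:
    print(name, accuracy(data, white_cat_tests), accuracy(data, black_cat_tests))
# white cats only 1.0 0.0
# diverse 1.0 1.0
```

On these four test images, the first model's overall accuracy would be 50%, a number that hides the fact that it fails on every black cat.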

Check Your Understanding

1. What happens when training data has wrong labels?
2. What is representation bias?
3. How can you improve AI accuracy?
4. Why is data diversity important?
