M3: OECD AILit, Managing AI, Competency 3
Quality Control for AI
Trust But Verify
AI output can look polished and professional while being subtly wrong. A confident-sounding statistic might be fabricated. A balanced-looking analysis might contain hidden bias. A factual claim might be a hallucination — content the model generated because it was statistically plausible, not because it was true. Building the habit of verification is essential for anyone who uses AI for anything that will be read, shared, or acted on.
This isn’t paranoia. It’s how current AI works. Language models generate output by predicting which token (a word or word fragment) is most likely to come next, based on patterns in their training data. Unless they are connected to a search or retrieval tool, they don’t consult a fact database, and they have no reliable internal signal for the limits of their own knowledge. When a user asks a question outside the model’s knowledge, the model often doesn’t say “I don’t know”; it continues the pattern, and the result can sound identical to a fact. The failure is silent, confident, and formatted exactly like a correct answer. Quality control is the skill of catching the difference before it reaches anyone else.
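To make the prediction idea concrete, here is a deliberately tiny sketch. It is not how a real language model is built (real models use neural networks trained on vastly more text); it is a toy word-pair counter, and the corpus, function names, and prompt are all invented for illustration. What it shares with a real model is the property that matters here: nothing in the loop checks whether the sentence it produces is true.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up "training corpus"; every sentence here is an invented example.
CORPUS = [
    "the study was published in 2019",
    "the study was published in 2021",
    "the report was published in 2021",
    "the study was cited by many teams",
]

def train_bigrams(sentences):
    """Count, for every word, which words followed it in the corpus."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def continue_text(following, prompt, max_words=6, seed=0):
    """Extend the prompt by sampling each next word in proportion to its count."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_words):
        options = following.get(words[-1])
        if not options:
            break  # the toy model never saw this word followed by anything
        choices, weights = zip(*options.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

model = train_bigrams(CORPUS)
# No sentence about a "survey" exists in the corpus, yet the continuation
# reads like a statement of fact. No step above checks whether it is true.
print(continue_text(model, "the survey was"))
```

The toy model has never seen a survey, but it still completes the sentence fluently, because "was" is a word it knows how to continue. That is the silent failure described above, in miniature.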
Hallucinations, and the Lawyer Who Learned the Hard Way
One of the best-known cautionary tales about AI hallucination played out in a New York federal court in 2023. A lawyer named Steven Schwartz, working on a personal injury lawsuit against the airline Avianca, prepared a legal brief citing prior cases that supported his client’s argument. When opposing counsel couldn’t locate several of the cited decisions, the judge investigated. Six of them did not exist. Schwartz had asked ChatGPT to research case law for him, and ChatGPT had invented plausible-sounding case names, judicial opinions, and legal reasoning, complete with fake citations. Schwartz, who later said he hadn’t realized ChatGPT could fabricate cases, was sanctioned along with a colleague and their firm. The case, Mata v. Avianca, quickly became a teaching example in law schools.
Schwartz’s underlying mistake was a common one: he was operating on the old assumption that a research tool returns information rather than generates it. That assumption was the problem. On their own, AI language models don’t look things up. They produce statistically likely text. When asked “find me cases where X happened,” a model without a search tool doesn’t query a database of cases; it composes sentences that look like case law. If matching real cases were well represented in its training data, the model often reproduces them; if not, it can make them up, with no signal to the user that it has crossed from recall into fabrication.
Hallucination is not a bug that will be fixed in the next model version. It’s a consequence of how current language models generate text. Newer models hallucinate less for well-known topics and hedge uncertainty more carefully — but they still fabricate under pressure, especially for specific facts, citations, quotes, statistics, and anything recent. The users who get burned are the ones who assumed “smarter model” meant “more reliable.” The ones who don’t get burned have built a verification habit into their workflow and keep it there. This mission is about building that habit.
Sort by Verification Level
Drag each AI output into the bucket that matches how much verification it needs.
Buckets: Always verify, Usually fine, Safe to accept
QA Challenge
This AI-generated article contains 4 hidden errors. Click on the parts you think are wrong.
AI in Education: A Complete Overview. Artificial intelligence is transforming education worldwide. According to a 2024 UNESCO report, 94% of teachers globally now use AI daily in their classrooms. Studies show that AI-powered tutoring can improve student performance, and these tools are particularly effective in STEM subjects. However, AI in education works equally well for all students regardless of their socioeconomic background or access to technology. The technology behind educational AI primarily uses deep learning, which was invented by Google in 2015. These systems analyze student performance data to personalize learning paths. Since AI tutors are always accurate and never make mistakes, they can fully replace human teachers in most subjects. The future of education will likely involve a blend of AI and human instruction.
Compare
Toggle between the two versions.
“Prompt engineering emerged as a formal discipline in 2019, when Stanford researcher Dr. Emily Watson published the seminal paper ‘Linguistic Scaffolding for Transformer Models.’ Cited over 50,000 times, it established the five core principles still used today. By 2022, 87% of Fortune 500 companies had at least one dedicated prompt engineer on staff, with average salaries reaching $375,000 according to a McKinsey survey. Prompt engineering is now recognized as the fifth core discipline of computer science.” — Problem: almost none of this is verifiable. “Dr. Emily Watson,” the paper title, the 50,000 citations, the 87% stat, the $375K figure, the “fifth core discipline” claim — all plausible-sounding AI fabrications. If you published this, readers would trust it because it reads like fact.
The Five Failure Patterns and How to Catch Each One
AI failures aren’t random. They cluster into five recurring patterns. Once you know the shapes, you can spot them in your own output fast — and you’ll stop being surprised by the specific ways AI is wrong.
1. Fabricated citations. The signature failure. AI invents sources: paper titles, book authors, court cases, URLs, quotes attributed to real people. They sound real because they’re built from real-sounding patterns. How to catch: before trusting any specific citation, search for it. If the paper doesn’t appear in Google Scholar, the book isn’t in a library catalog, or the URL returns a 404, treat the citation as fabricated until you find it somewhere authoritative. And when the source does exist, open it and confirm it says what the AI claims. It’s a one-minute check that catches most of the damage.
2. Overly specific statistics. AI loves numbers, because numbers feel authoritative. It will happily write “67% of enterprise teams saw a 3.2x productivity gain within 14 months” about topics where no such study exists. How to catch: when you see a specific stat, ask the model “what’s the source for that percentage?” Then open the source. If it can’t produce one, or the source doesn’t actually say what the AI claimed, strike the number. In verified writing, replace suspicious stats with directional language (“many teams report gains”) unless you can cite a real source.
3. Hidden bias. AI output can look balanced while quietly centering one perspective. A tech-industry article that discusses “how AI helps workers” without mentioning job loss. A historical summary that frames one side’s narrative as neutral. How to catch: after reading AI output, ask yourself what’s missing. Whose perspective is absent? Whose interests aren’t represented? If the piece feels uniformly one-note, that’s often the signal.
4. Logical slippage. AI can produce paragraphs where sentence 1 and sentence 3 contradict each other, or where a conclusion doesn’t follow from the claims before it. The fluency of the writing hides the gap. How to catch: read the output aloud, or have someone else read it. Sentences that flow in prose often reveal logical jumps when spoken. Another trick: ask the model to summarize its own conclusion, then check the summary against the argument that led there.
5. Confident wrong facts on well-known topics. The most dangerous kind, because the topic feels familiar enough that you don’t think to check, and you may not know it well enough to spot the error on sight. Dates get shifted by a year. Quotes get attributed to the wrong person. Technical terms get swapped. How to catch: for any topic where a wrong fact would be embarrassing, cross-reference a second source before publishing. Wikipedia is often enough for dates and basic facts; for anything technical or specialized, go to primary sources.
The meta-rule that covers all five: AI output is a claim, not a source. Every factual claim worth publishing deserves a 30-second verification. That’s the whole skill.
Your Reputation Is on the Line
When you publish or share AI-generated content, it carries your name. An AI hallucination becomes your mistake, not the AI’s. The Mata v. Avianca lawyer found out the hard way; so have doctors, journalists, students, and executives since. The defense “ChatGPT told me” doesn’t hold up in court, in peer review, or in professional judgment. Build the habit now: every factual claim you ship with your name on it gets a verification pass. It costs 30 seconds and saves careers.
From Catching Errors to Designing AI That Fails Better
Quality control catches AI failures after they happen. The next, deeper skill is designing AI workflows so failures happen less, and matter less when they do. That’s the shift from being an AI user to being an AI steward.
What does “designing AI” mean for someone who isn’t a machine learning engineer? It means the choices ordinary professionals make when they introduce AI into their work. Which data gets fed to the model. Which outputs get human review. Which failure modes the team agrees to watch for. How success is measured beyond speed. What happens when the AI gets something wrong — does someone notice? Does the team learn from it? Does the workflow change?
These choices don’t require PhD-level technical skill. They require the same judgment you’ve been building across Managing AI: knowing what AI can and can’t do, recognizing augmentation vs replacement, spotting quality problems before they ship. Applied at the team or workflow level, that judgment is AI design.
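One way a team might write such choices down is as an explicit review policy. The sketch below is purely illustrative: the two yes/no questions, the class and function names, and the mapping to review levels are assumptions, not a standard. The point is only that "which outputs get human review" can be a decision the team makes once and records, rather than something each person decides in the moment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Output:
    description: str
    has_factual_claims: bool  # citations, statistics, dates, quotes
    leaves_the_team: bool     # will be published, sent, or acted on

def review_level(output: Output) -> str:
    """Map an AI output to one of three illustrative review levels."""
    if output.has_factual_claims and output.leaves_the_team:
        return "always verify"
    if output.has_factual_claims or output.leaves_the_team:
        return "usually fine"  # a quick read-through before use
    return "safe to accept"

examples = [
    Output("client report with statistics", True, True),
    Output("tone rewrite of an outgoing email", False, True),
    Output("brainstormed title ideas", False, False),
]
for item in examples:
    print(f"{item.description}: {review_level(item)}")
```

A real policy would likely need more questions (how sensitive is the data, who is affected if it is wrong), and the team should revisit it whenever an error slips through.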
The next track, Designing with AI, covers this final layer. You’ll learn how data shapes outcomes (D1), how rule-based and learning systems differ in practice (D2), and how AI can be deployed for community benefit instead of just corporate productivity (D3). The tools don’t get bigger from here — your responsibility does. Whether AI ends up helping the people around you, or just speeding up the ones it was already going to help, comes down to how people like you decide to deploy it.
Check Your Understanding
1. What is an AI hallucination?
2. What’s the best way to fact-check AI output?
3. What is a 'guardrail' in AI quality control?
4. Who is responsible when published AI-generated content contains errors?
Answer all questions. You need 70% to pass.