August 2025: Ran a client photo through my background removal tool. The edge looked chewed. Like someone attacked it with a chainsaw set to "drunk."

Tried three more photos. Same weird artifacts.

Went back to a February project. Re-uploaded the exact same photo to the exact same tool. Worse results in August than in February.

Same photo. Same tool. Six months apart. Measurably worse output.

This is model collapse – AI systems training on synthetic data generated by other AI systems. It's not theoretical. It's degrading the tools you use right now.

What Model Collapse Actually Is

Model collapse occurs when machine learning models train on datasets containing significant amounts of AI-generated content. Each training iteration compounds small errors and biases, leading to progressive quality degradation.

Think of it like photocopying a document, then photocopying that photocopy, then copying the copy. By the tenth generation, the text is barely readable and you've introduced artifacts that were never in the original.

That's what's happening with your design tools. Except instead of photocopiers, it's machine learning models training on their own output. (Your tools are around generation five. Maybe six.)

Research published in Nature showed that when AI models train on synthetic data, output quality degrades within five training cycles. By generation 30, handwritten digits converged into a single blurry shape. Quality drops, diversity disappears.

The problem is mathematical: AI systems optimize toward patterns in their training data. When that data increasingly consists of other AI outputs – which themselves optimized toward degraded patterns – the feedback loop accelerates degradation.
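
The feedback loop can be reproduced in miniature. The sketch below is a toy, not the Nature study's setup. It is similar in spirit to the simple Gaussian case that model-collapse papers analyze: assume the "model" is just a Gaussian fitted to 10 samples, and each generation trains only on samples drawn from the previous generation's fit. The sample size, generation count and seed are arbitrary choices for illustration.

```python
import random
import statistics


def simulate_collapse(generations: int = 200, sample_size: int = 10,
                      seed: int = 0) -> list[float]:
    """Toy feedback loop: each generation fits a Gaussian to a small sample
    drawn from the previous generation's fitted Gaussian.

    Returns the fitted standard deviation per generation; index 0 is the
    original ("human") distribution.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the real data distribution
    history = [sigma]
    for _ in range(generations):
        # "Train" only on output sampled from the previous model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then refit by maximum likelihood (mean and population std dev).
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        history.append(sigma)
    return history


if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 10, 50, 100, 200):
        print(f"generation {gen:3d}: std = {stds[gen]:.2e}")
```

Because every refit sees only a finite sample, rare values in the tails keep getting under-represented, and the fitted spread tends to shrink generation after generation. That is "diversity disappears" in its simplest form.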

And the internet, where most AI training data comes from, is now estimated to be 50-60% AI-generated content. (Turns out letting AI flood the internet with content has consequences. Who knew.)

Your background remover? Probably retrained on images that included AI-generated backgrounds or AI-processed edges. So the new version learned from AI output, not from human-judged quality standards.

Testing for Degradation in Production Tools

I proved this to myself the paranoid way: saved 10 samples from old projects with their AI outputs and dates. Ran those same inputs through current tool versions three months later. Compared.

Worse edge quality. More artifacts. Less consistency. More manual correction time.

If your same inputs give worse outputs six months later, your tools are degrading. Test it yourself. (Mine did. Yours probably will too.)
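
The "save samples, rerun later, compare" check can be automated. This is a minimal sketch that assumes you already give each output some numeric quality score. The `Sample` record, the 0-1 score scale and the 0.02 tolerance are hypothetical choices, not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One saved test case: an input you keep re-running, plus its baseline score."""
    sample_id: str
    recorded_on: str       # ISO date the baseline output was saved
    baseline_score: float  # your own 0-1 quality score for that output


def find_regressions(baseline: list[Sample], current_scores: dict[str, float],
                     tolerance: float = 0.02) -> list[tuple[str, float, float]]:
    """Return (sample_id, old_score, new_score) for every sample whose current
    output scored worse than its saved baseline by more than `tolerance`."""
    regressions = []
    for sample in baseline:
        new_score = current_scores.get(sample.sample_id)
        if new_score is None:
            continue  # not re-run this time
        if new_score < sample.baseline_score - tolerance:
            regressions.append((sample.sample_id, sample.baseline_score, new_score))
    return regressions


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the check.
    saved = [Sample("portrait-hair", "2025-02-10", 0.94),
             Sample("product-on-white", "2025-02-10", 0.97)]
    rerun = {"portrait-hair": 0.81, "product-on-white": 0.965}
    print(find_regressions(saved, rerun))  # [('portrait-hair', 0.94, 0.81)]
```

Keeping the inputs and the scoring fixed is what makes this useful. If neither changes, a growing regression list points at the tool.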

Where Degradation Shows Up in Design Workflows

Background Removal Tools

Background removal APIs that used to handle complex scenarios reliably now produce inconsistent results. Hair edges that required minimal cleanup six months ago now need significant manual work.

The algorithm forgot scenarios it used to handle fine. That's not a bug. That's training data degrading because it includes AI-processed edges that were already slightly wrong.
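
One way to put a number on "the edge looked chewed" is to compare a new mask against a known-good mask for the same photo. You can score the whole subject and, separately, only the pixels near its outline. This sketch rests on several assumptions: masks are plain 0/1 grids of the same size, you have a trusted reference mask, and the band radius is arbitrary. Boundary-focused IoU (intersection over union) is a common segmentation idea, but these helper functions are hypothetical, not any tool's API.

```python
from typing import Optional

Mask = list[list[int]]  # 1 = subject (keep), 0 = background (remove)
Pixel = tuple[int, int]


def iou(a: Mask, b: Mask, region: Optional[set[Pixel]] = None) -> float:
    """Intersection-over-union of two same-sized binary masks,
    optionally counting only the (row, col) pixels in `region`."""
    inter = union = 0
    for r, (row_a, row_b) in enumerate(zip(a, b)):
        for c, (pa, pb) in enumerate(zip(row_a, row_b)):
            if region is not None and (r, c) not in region:
                continue
            if pa and pb:
                inter += 1
            if pa or pb:
                union += 1
    return inter / union if union else 1.0


def boundary_band(mask: Mask, radius: int = 1) -> set[Pixel]:
    """Pixels within `radius` (Chebyshev distance) of a subject/background transition."""
    h, w = len(mask), len(mask[0])
    edges: set[Pixel] = set()
    for r in range(h):
        for c in range(w):
            for rr, cc in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if rr < h and cc < w and mask[r][c] != mask[rr][cc]:
                    edges.update({(r, c), (rr, cc)})
    band: set[Pixel] = set()
    for r, c in edges:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                if 0 <= r + dr < h and 0 <= c + dc < w:
                    band.add((r + dr, c + dc))
    return band
```

On a real photo the band is a thin ring around hair and shoulders. The overall IoU can stay high while the band score drops, which fits the pattern above: most of the subject is fine and the hair is not.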

AI Image Generators

Image generation tools trained on datasets increasingly contaminated with AI art exhibit specific degradation patterns:

  • Hands have incorrect finger counts or anatomical issues
  • Facial proportions drift toward generic "AI look"
  • Background details become increasingly abstract
  • Color relationships feel slightly off

Generated an illustration for a client deck. Looked fine at first glance. But something felt wrong. The proportions weren't quite right. The color relationships were slightly weird.

Client rejected it. Had to commission a human illustrator. Three-day turnaround became eight days. Missed their board meeting.

That "wrong but can't explain why" feeling? That's model collapse creating subtle degradation you can't articulate but clients definitely notice.

Researchers at Rice University found that when image generation models train on their own output, glitches and artifacts accumulate. Eventually: distorted images, mangled fingers, wrinkled patterns.

Text and Copy Tools

AI writing assistants show homogenization: outputs sound similar regardless of tone specifications. The likely cause is models training on increasingly uniform synthetic text.

I used an AI writing assistant for first-draft captions. It used to give varied suggestions with different tones. Now everything sounds samey. That's AI training on AI writing – it loses the edges that make writing interesting.

If all your AI drafts sound like they came from the same person, that's not you imagining things.

Why Tool Makers Can't Easily Fix This

The fundamental problem: AI companies need massive training datasets. The internet was that dataset. But now the internet is majority synthetic content.

Training AI on today's internet is like learning to cook from recipes written by people who learned from AI recipes. Eventually everyone's making the same thing wrong.

They can try to filter out AI-generated content, but good luck with that:

Detection is unreliable: AI detection tools have false positive rates of 15-20%. Try filtering billions of images with a tool that wrongly flags up to one in five genuine human images as AI. (This is why your spam folder catches your actual emails.)

Scale makes curation impossible: Manual review of billions of images isn't feasible. Unless you have an army of interns with perfect judgment, infinite patience, and zero bathroom breaks.

Hybrid content is everywhere: Human layout + AI illustration + human edits = what, exactly? The "real human content" ship has sailed.

Economic incentives favor volume: More training data = "better" models. The fact that it causes collapse later is a problem for Future Quarter. (Future Quarter's problem list is getting long.)
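
This puts "wrong one in five times" in absolute terms. It covers only the false-positive side of the arithmetic and uses the 15-20% range quoted above. The corpus of one billion genuinely human images is an illustrative assumption. The calculation also ignores AI images the detector misses, which those rates don't describe.

```python
def wrongly_discarded(human_items: int, false_positive_rate: float) -> int:
    """Genuinely human-made items a detector would flag as AI and throw away."""
    return round(human_items * false_positive_rate)


if __name__ == "__main__":
    corpus = 1_000_000_000  # illustrative: one billion human-made images
    for fpr in (0.15, 0.20):
        print(f"{fpr:.0%} false positives: {wrongly_discarded(corpus, fpr):,} "
              "human images discarded")
    # 15% false positives: 150,000,000 human images discarded
    # 20% false positives: 200,000,000 human images discarded
```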

By 2026, researchers estimate over 90% of online content will be AI-generated or AI-influenced. AI companies are training their models on an internet made of their own exhaust fumes.

My background removal tool degraded in six months. Next year, that'll be three months. Then one. This accelerates.

Building AI-Resilient Design Systems (Or: How To Stop Trusting AI By Default)

Since AI tool degradation is accelerating, here's how to build workflows that remain reliable:

Implement Multi-Stage Quality Checks

I don't trust AI outputs anymore. Ever. My workflow now: AI generation (2 minutes), automated quality check (1 minute), human review of flagged issues (5-10 minutes), manual refinement (10-15 minutes). Total: 18-28 minutes versus the 5 minutes AI-only used to take.
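
Here is the arithmetic behind the 18-28 minute figure, written as a reusable budget. The stage names and time ranges come straight from the workflow above. The helper itself is only an illustration.

```python
# Per-stage time ranges in minutes, as listed in the workflow above.
STAGES = {
    "AI generation": (2, 2),
    "automated quality check": (1, 1),
    "human review of flagged issues": (5, 10),
    "manual refinement": (10, 15),
}
AI_ONLY_MINUTES = 5  # the old AI-only turnaround


def total_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum best-case and worst-case minutes across all stages."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


if __name__ == "__main__":
    best, worst = total_range(STAGES)
    print(f"{best}-{worst} minutes per asset "
          f"({best / AI_ONLY_MINUTES:.1f}x to {worst / AI_ONLY_MINUTES:.1f}x the AI-only time)")
    # 18-28 minutes per asset (3.6x to 5.6x the AI-only time)
```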

I used to ship background removal same day. Now I tell clients three days. The extra QA time isn't optional – it's the difference between work that gets accepted versus redoing it entirely. (Client redos cost more than the QA time. Learned that one the expensive way.)

Version Pin Your Tools

Another expensive lesson. Had a background removal tool that worked perfectly. Auto-updated overnight. Next morning, every edge looked wrong. Spent three hours rolling back versions to find the one that worked.

Now when I find a tool version that works, I freeze it: document version number, save the installer if possible, note date and performance. Test before upgrading. Never auto-upgrade.
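
The "freeze it, test before upgrading" rule can be made mechanical. This minimal sketch assumes a JSON pin file per tool and a quality score from your own saved test set. The file layout, the field names, the `approve_upgrade` rule and the "bg-remover" tool name and version are all hypothetical; no tool ships this.

```python
import json
import tempfile
from pathlib import Path


def make_pin(tool: str, version: str, tested_on: str, score: float, notes: str = "") -> dict:
    """Everything worth writing down about a version that is known to work."""
    return {"tool": tool, "version": version, "tested_on": tested_on,
            "score": score, "notes": notes}


def save_pin(pin: dict, path: Path) -> None:
    path.write_text(json.dumps(pin, indent=2), encoding="utf-8")


def load_pin(path: Path) -> dict:
    return json.loads(path.read_text(encoding="utf-8"))


def approve_upgrade(pin: dict, candidate_score: float, min_gain: float = 0.0) -> bool:
    """Upgrade only if the new version scores at least as well as the pinned one
    on the same saved test set (optionally by a required margin)."""
    return candidate_score >= pin["score"] + min_gain


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pin_file = Path(tmp) / "bg-remover.pin.json"
        # Hypothetical tool name, version and score.
        save_pin(make_pin("bg-remover", "3.2.1", "2025-02-10", 0.94), pin_file)
        print(approve_upgrade(load_pin(pin_file), candidate_score=0.81))  # False: stay on 3.2.1
```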

A November 2024 update can perform worse than the August 2024 version it replaced. This happens more often than tool makers admit.

(Yes, "don't upgrade to the latest version" goes against everything we've been taught. But so does debugging why your tool got worse after an "improvement.")

Maintain Human-Curated Reference Libraries

I keep a folder of design references I know are human: real product screenshots from verified sources, design work with documented human authorship, historical references from pre-2022 (before AI image generators were good enough to contaminate things), direct client work with known provenance.

Search "modern dashboard design" now and you're seeing AI-generated examples trained on AI that was trained on AI. Feedback loops all the way down.

My curated library is my escape hatch. When AI gives me generic output, I compare against known-good references. Usually shows me exactly what's missing.
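
If the library lives on disk, a small manifest keeps provenance attached to each file. In this minimal sketch, the four provenance labels mirror the categories listed above. The `Reference` record and the helper names are hypothetical additions, and so is the SHA-256 fingerprint, which is there to notice if a file is later swapped or re-exported.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path

# The four kinds of known-human sources listed above.
PROVENANCE = {"verified-source", "documented-human", "pre-2022", "client-work"}


@dataclass(frozen=True)
class Reference:
    path: str
    provenance: str  # one of PROVENANCE
    source: str      # where it came from: site, client, book, archive...
    sha256: str      # fingerprint taken when the file was curated


def fingerprint(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def add_reference(path: Path, provenance: str, source: str) -> Reference:
    """Refuse anything that isn't from one of the known-human categories."""
    if provenance not in PROVENANCE:
        raise ValueError(f"unknown provenance {provenance!r}")
    return Reference(str(path), provenance, source, fingerprint(path))


def unchanged(ref: Reference) -> bool:
    """True if the file still matches its fingerprint (not swapped or re-exported)."""
    return fingerprint(Path(ref.path)) == ref.sha256
```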

Document Design Decisions

I started documenting every design choice. Not for anyone else – for myself. When I pick blue (#2E5C8A) for a CTA, I note why: testing showed 15% better conversion than green. Date, context, everything.

Builds a knowledge base that isn't contaminated by AI feedback loops. When AI suggests green buttons, I point to my notes showing blue performed better.

(Feels like overkill until AI confidently recommends something you already proved doesn't work. Then it feels smart.)
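
A plain-text decision log is enough for this. This minimal sketch assumes one JSON object per line (JSON Lines). The `Decision` fields mirror what the paragraph above records (choice, reason, date, context). The file name and helpers are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Decision:
    topic: str       # e.g. "CTA color"
    choice: str      # e.g. "#2E5C8A (blue)"
    reason: str      # the evidence, e.g. test results vs. the alternative
    decided_on: str  # ISO date
    context: str = ""


def append_decision(log_path: str, decision: Decision) -> None:
    """Append one decision as a single JSON line (the file is never rewritten)."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(decision)) + "\n")


def decisions_about(log_path: str, topic: str) -> list[Decision]:
    """Every logged decision on a topic, oldest first."""
    found = []
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            decision = Decision(**json.loads(line))
            if decision.topic.lower() == topic.lower():
                found.append(decision)
    return found
```

Before accepting an AI suggestion such as green buttons, a call like `decisions_about("decisions.jsonl", "CTA color")` pulls up what you already tested.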

Practical Detection Strategies

I started running the same image three times through my background removal tool. If I got three different results, something was broken. Hair and fur edges degrade first – I check them weekly now. Compare against six-month-old outputs. When I'm manually fixing more than 20% of the output, the tool's not working anymore.
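
Both background-removal checks can be scripted. The sketch below makes these assumptions:

- `remove_background` is a hypothetical stand-in for whatever call or export you use.
- Byte-identical comparison only makes sense if the tool is meant to be deterministic and doesn't embed timestamps or other metadata.
- Counting changed mask pixels is one possible proxy for "manually fixing more than 20%", which could just as well mean time spent.

```python
import hashlib
from typing import Callable

Mask = list[list[int]]  # 1 = keep, 0 = remove


def is_consistent(remove_background: Callable[[bytes], bytes], image: bytes,
                  runs: int = 3) -> bool:
    """Run the same image several times; a stable tool should return identical bytes."""
    digests = {hashlib.sha256(remove_background(image)).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def correction_fraction(ai_mask: Mask, corrected_mask: Mask) -> float:
    """Share of pixels changed by hand after the tool ran."""
    total = changed = 0
    for row_ai, row_fixed in zip(ai_mask, corrected_mask):
        for a, b in zip(row_ai, row_fixed):
            total += 1
            changed += a != b
    return changed / total if total else 0.0


def tool_is_failing(ai_mask: Mask, corrected_mask: Mask, threshold: float = 0.20) -> bool:
    """The 'fixing more than 20% of the output' rule, with pixels as the proxy."""
    return correction_fraction(ai_mask, corrected_mask) > threshold
```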

For image generation: I generate five variations from identical prompts to see how similar they are. (Too similar means collapse.) Zoom to 200% and check hands – wrong finger counts or joints that bend weird are reliable tells. I track my manual correction time. When it increases, quality's degrading.
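
"Too similar" can be given a number. This sketch rests on three assumptions. Each of the five variations has already been shrunk to a small grayscale grid (resizing is left to whatever image library you use). Similarity is measured with a simple average hash, a perceptual-hash technique swapped in here rather than anything the image tools expose. The 0.9 threshold is a made-up starting point to calibrate against your own past outputs.

```python
from itertools import combinations
from statistics import fmean

Grid = list[list[float]]  # small grayscale thumbnail, e.g. 8x8 values in 0-255


def average_hash(grid: Grid) -> list[bool]:
    """Average hash: one bit per pixel, set if brighter than the thumbnail's mean."""
    pixels = [p for row in grid for p in row]
    mean = fmean(pixels)
    return [p > mean for p in pixels]


def similarity(h1: list[bool], h2: list[bool]) -> float:
    """1.0 = identical hashes, 0.0 = every bit differs."""
    matches = sum(a == b for a, b in zip(h1, h2))
    return matches / len(h1)


def mean_pairwise_similarity(thumbnails: list[Grid]) -> float:
    """Average similarity over every pair of variations (needs at least two)."""
    hashes = [average_hash(t) for t in thumbnails]
    return fmean(similarity(a, b) for a, b in combinations(hashes, 2))


def looks_collapsed(thumbnails: list[Grid], threshold: float = 0.9) -> bool:
    """Flag a prompt whose variations are nearly interchangeable."""
    return mean_pairwise_similarity(thumbnails) >= threshold
```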

For text: if every output uses "delve," "leverage," and "holistic," something's collapsed. I run plagiarism checks to catch regurgitation. Compare five generations for tone. Track how much I'm rewriting. When I'm rewriting everything, I just write it myself instead.
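
The text checks can also run against the five generations. This sketch assumes plain-English output and does two things. It counts how many outputs use each tell word from the list above. It measures phrasing variety with distinct-2 (unique word pairs over total word pairs), a standard diversity metric swapped in here. It is a rough proxy, not a tone detector, and any cutoff is yours to calibrate.

```python
import re
from collections import Counter

# Tell words from the checklist above; exact matches only, so "leveraging"
# or "delves" are not counted unless you add them.
TELL_WORDS = ("delve", "leverage", "holistic")


def words(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def tell_word_counts(outputs: list[str]) -> Counter:
    """How many outputs contain each tell word (each output counted once per word)."""
    counts: Counter = Counter()
    for text in outputs:
        present = set(words(text))
        counts.update(w for w in TELL_WORDS if w in present)
    return counts


def distinct_2(outputs: list[str]) -> float:
    """Unique word bigrams divided by total bigrams across all outputs.
    Lower means more repetitive phrasing across the generations."""
    bigrams = []
    for text in outputs:
        tokens = words(text)
        bigrams.extend(zip(tokens, tokens[1:]))
    return len(set(bigrams)) / len(bigrams) if bigrams else 0.0
```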

What This Means for Developer Workflows

AI tools are now Schrödinger's assistants: simultaneously helpful and broken until you actually check the output.

If you're building systems that rely on AI APIs:

Don't assume API stability. Test monthly. Performance degrades between versions. Sometimes within versions. One API update and you're shipping garbage.

Build quality monitoring. Log quality metrics. Alert when scores drop. You want early warning before users notice. They will notice. Then they'll complain. Then they'll leave.

Plan for manual fallbacks. AI shouldn't be a single point of failure. Have human review ready. One bad API version can tank your pipeline.

Version your integrations. When an API version works, stick with it. Never auto-upgrade. Newer doesn't mean better.

Educate stakeholders. "AI makes it instant" isn't true anymore. The "5-minute turnaround" they heard about? That's now 20 minutes for shippable output. Budget accordingly.
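
The monitoring, fallback and version-pinning points above fit in one small wrapper. This sketch assumes two functions you supply. `call_model` is a hypothetical client call already pinned to a specific API version (no real vendor SDK is shown). `score` is your own 0-1 quality metric. The thresholds and window size are placeholders to tune.

```python
import logging
from collections import deque
from dataclasses import dataclass, field
from statistics import fmean
from typing import Callable, Optional

log = logging.getLogger("ai_quality")


@dataclass
class QualityGate:
    """Wraps one pinned AI API version with quality logging, drift alerts
    and a human-review fallback."""
    call_model: Callable[[str], str]  # hypothetical client call, pinned to one version
    score: Callable[[str], float]     # your own 0-1 quality metric for an output
    model_version: str                # the version you tested and froze
    min_score: float = 0.7            # below this, send the item to a person
    window: int = 50                  # recent scores averaged for drift detection
    alert_drop: float = 0.05          # alert when the average falls this far below baseline
    baseline: Optional[float] = None  # set from the first full window
    alerts: int = 0
    recent: deque = field(default_factory=deque)

    def process(self, item: str) -> tuple[str, str]:
        """Return ("ship" or "human_review", output) for one item."""
        output = self.call_model(item)
        s = self.score(output)
        log.info("version=%s score=%.3f", self.model_version, s)
        self._track(s)
        if s < self.min_score:
            return "human_review", output  # manual fallback instead of shipping garbage
        return "ship", output

    def _track(self, s: float) -> None:
        self.recent.append(s)
        if len(self.recent) > self.window:
            self.recent.popleft()
        if len(self.recent) < self.window:
            return  # not enough data yet
        avg = fmean(self.recent)
        if self.baseline is None:
            self.baseline = avg
        elif avg < self.baseline - self.alert_drop:
            self.alerts += 1
            log.warning("quality drop on %s: %.3f vs baseline %.3f",
                        self.model_version, avg, self.baseline)
```

The alert compares a rolling average to the first full window, so it can catch drift after an upstream change even while each individual output still clears `min_score`. As written, it keeps warning on every item while the drop lasts. A production version would deduplicate alerts.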

The Reality Check

Model collapse isn't theoretical. It's affecting production tools now.

The internet is majority synthetic content. Every model retrained on internet data inherits more contamination. This accelerates, not improves.

Test your tools. Build quality checks. Don't trust AI by default.

AI degradation makes human judgment more valuable. Your ability to spot degraded outputs, understand what feels wrong, and fix it based on actual needs – these skills don't collapse.

Next time your background removal tool gives you weird edges, trust that instinct. It's probably model collapse.

Same goes for the illustration that looks "off" or the copy that sounds generic. AI trained on AI produces photocopies of photocopies. By generation ten, you can't read the text.

Your tools are around generation five. Maybe six. They're not getting better.

Tanya Donska

Tanya Donska fixes the parts of SaaS products people complain about. She runs DNSK.WORK, a London UX/UI design agency for scaling teams who've outgrown duct-tape design. Works with companies like Deutsche Telekom and IQVIA where UX mistakes cost actual money.

© 2000 – 2026 SitePoint Pty. Ltd.