The Secret Sauce for Coding with AI: Smart Testing
You've probably tried coding with AI.
Sometimes it's magical—a whole day's work done in an hour.
Sometimes it's a nightmare—a whole day spent doing an hour's work.
Here's the thing nobody tells you: AI will launch you 85% of the way to incredible. Then drop you off a cliff. 💥
You go from zero to flying ✈️ in minutes. Everything's working! You're making incredible progress!
Then suddenly you're falling️. You're up to your eyeballs in complexity you don't understand. Tests aren't any help. But you really need this thing to work! Where do you go from there?
Smart testing is how you glide down safely 🪽 instead of crashing.
The Enchantment 🧹
It starts so well.
You tell AI what you need. It writes the code. You run it.
It works!
Like the apprentice who enchanted a broom to carry water—you've found magic. One helpful assistant doing exactly what you need.
This is when AI shines. The potential is incredible—if you know how to harness it.
But here's what nobody tells you about the cliff...
The Multiplication 🧹🧹🧹
The magic gets messy.
You're making progress. Fast progress. The scaffolding is beautiful.
"AI, write me some tests to catch these bugs."
Now you have tests! Hundreds of tests! The AI is so helpful!
You settle into a rhythm. Approving changes. Running tests. Everything passes. You're shipping features faster than you ever thought possible.
The Insidious Rot 🦠
Here's the really scary part.
Even when AI seems to be working, it's building rot into your foundation.
You go on autopilot. Mindlessly approving changes. Apparently making progress. When you run the code it's working exactly how you intended (well, almost exactly…).
Then one day you discover:
- The logic has been forked for "one special case"
- Styles are being carved around instead of preserved
- Hard-coded values patching over broken generic logic
- Massive code duplication
- Specialized test coverage for cases that are wrong
AI has a strong pull towards insanity. At least today's generation does. Having absorbed the mindset of a certain type of developer, it will:
- 🔄 Test that your
Statusenum value equals whatever it equals (useless) - 📋 Copy implementation into test file and test that
- 🎭 Mock everything, test nothing
- 🗑️ Delete failing tests to make them "pass"
- 🙈 Suppress all early error feedback
Here's what the enum thing looks like in practice:
// ❌ What AI generates (useless)
it('validates Status enum', () => {
expect(Status.ACTIVE).toBe("active");
expect(Status.INACTIVE).toBe("inactive");
expect(Status.PENDING).toBe("pending");
// This will never fail unless you break it on purpose
});50 lines of tests that can't catch a single real bug. And they pass every time, giving you false confidence while your foundation rots.
The conventional wisdom says "write more tests." That's wrong—and actively harmful.
Bad tests give you false confidence while your foundation rots.
The Flood 🌊
Then you need to add a feature. Or fix a bug. And you discover those tests you've been relying on? They pass for completely broken code and fail when you rename a variable.
Celebrating broken tests: Claude Code announced tests were passing. "Oh! You mean you artificially broke the early exit to verify no test caught it..."
Getting stuck in loops: "I think I found the issue!" repeats Cursor, searching the same code over and over before admitting "I've been going in circles."
Pretending to verify: When challenged, Claude Code admitted: "You caught me being sloppy with my reasoning"—copying files around without actually running code.
Catastrophic panic: A $1M SaaS startup's production database. Code freeze in effect. Explicit instructions: "NO MORE CHANGES without explicit permission." Replit Agent ran npm run db:push at 4:26 AM anyway. The entire database: gone. Then it hid it. Then lied about it. Then claimed unit tests passed when they didn't. The founder only discovered the disaster when batch processing failed.
Without good tests, you can't steer it.
The Sorcerer Returns 🧙♂️
Here's what the sorcerer knows.
The control words. The closing spells.
The Key Principles
Test behavior, not implementation
Use fakes instead of excessive mocking. When AI overmocks, tests pass but catch nothing. Including the rot.
// ❌ Overmocking - tests nothing
it('processes payment', () => {
mockDB.findUser.mockReturn(user);
mockStripe.charge.mockReturn(success);
// What are we actually testing?
});
// ✅ Testing behavior
it('decreases balance when payment succeeds', () => {
const account = new FakeBankAccount(1000);
account.withdraw(100);
expect(account.getBalance()).toBe(900);
});Make failures obvious
Use high-level expectations so you see the full diff of what changed. When logic forks, you'll catch it immediately.
// ❌ Unclear failures
expect(result.status).toBe('success');
expect(result.amount).toBe(100);
// "AssertionError: expected 'success' but got 'failed'" 🤷
// ✅ Clear failures - see what changed
expect(result).toEqual({
status: 'success',
amount: 100,
currency: 'USD'
});
// Shows full diff of actual vs expectedKeep tests fast
Watch mode (Jest, pytest) + AI = instant feedback loop. Make a change, see what breaks, AI fixes it in seconds.
For shell scripts 🐚
Use set -e and set -x to fail fast and show exactly what executed. Add clear emoji markers so failures are hard to miss. AI loves to hide errors with || true and 2>/dev/null—don't let it wander down broken paths.
#!/usr/bin/env sh
set -e # Fail fast
ERRORS=0
echo "🔍 Scanning for FIXME comments..."
FIXME_COUNT=$(grep -r "FIXME" src/ | wc -l)
if [ $FIXME_COUNT -gt 0 ]; then
echo "❌ ERROR: Found FIXME comments"
ERRORS=$((ERRORS + 1))
fi
if [ $ERRORS -gt 0 ]; then
exit 1
fi
echo "✅ All checks passed"These are the closing spells. The words that stop the flood.
Putting It Into Practice
Now you know the words. Here's how to use them.
📚 Document first, code second
Discuss edge cases and expected behavior with AI before writing code. Define success. Then tests write themselves.
"This function processes user input. It can fail three ways: invalid format, missing fields, or database errors. Here's a good test from our codebase that uses fakes. Write tests in that style covering all three failure modes."
AI matches your style and tests meaningful scenarios.
Give examples, not instructions
Show AI 2-3 good tests. "Write more like these" works better than explaining principles. AI pattern-matches brilliantly—once the pattern is clear, let it grind through variations.
Use IDE integration
Cursor/Claude Code + linters = AI sees type errors and fixes them before you review. Catches rot before it spreads.
🎯 Set custom rules
Make boundaries explicit. "Don't use @Deprecated classes" is clearer than "avoid bad patterns." AI can't see transitive dependencies—if it calls a nice-looking helper that internally uses bad patterns, it won't know unless you tell it.
Let AI grind variations
Once the pattern is established, AI is phenomenal at testing all the edge cases you'd skip. It won't get bored testing 50 variations.
Know when to switch tactics
For refactoring with lots of small edits across files, sometimes it's better to ask AI to explain the changes in detail and make them yourself. Don't be dogmatic—use AI where it excels.
The Feedback Loop in Practice
- Set up watch mode — tests run on every file change
- Write 2-3 exemplary tests — these teach AI your style
- Show AI the pattern — "write more like these"
- Let AI grind edge cases — it won't get bored testing 50 variations
- Use failures to steer — clear error messages = AI can auto-fix
Here's what that looks like in action:
Me: *writes test that should fail*
Test: ❌ FAIL - Expected user to be created
AI: "I see the issue - the validation is too strict"
AI: *makes fix*
Test: ✅ PASS
Me: "Now test what happens with empty email"
AI: *writes test + implementation*
Test: ✅ PASS
Fast iteration. Immediate feedback. Clear errors.
Partial Mastery ⚡
Let's be honest. This isn't mastery. Not yet.
These strategies let you get much further without getting stuck. You can launch to altitude and glide down instead of falling.
But go on autopilot too long? You'll discover rotten foundation under those new pieces. You still need to stay vigilant.
The smart testing gives you something crucial though: early warning.
The tests fail. You see the full diff. You catch the fork, the hard-coding, the duplication before it spreads through ten files.
Without good tests, AI is frustrating—you debug mysterious failures and review questionable code after it's already metastasized.
Without AI, comprehensive testing is tedious—you skip edge cases and bugs slip through.
Together? You can go fast without falling off the cliff.
- Write 2-3 exemplary tests
- AI generates comprehensive coverage following your patterns
- Tests fail fast with clear feedback
- AI fixes issues in seconds
- You catch the rot early
This is the secret sauce. Not "use AI for everything" or "write more tests."
It's about being intentional with testing strategy and strategic about when to involve AI.
The developers who figure this out operate in a different mode. They ship in days what used to take weeks. They refactor without fear. They catch bugs before production.
They've learned the closing spell. They're not drowning anymore.
But they're still vigilant. 🎩✨
Coming in Part 2: "Tending the Fire" - You've learned to stop the flood. But can you be trusted with this power? What happens when you need to teach others? And what about those moments when the magic still goes wrong?
This post is based on a talk I gave at PosaDev 2025. You can view the slides here.
