Blog for Engineering Managers

Blog for Engineering Managers

The 5 AI-coding standards to hand your team

AI amplifies the standards you already have. Most teams never wrote theirs down.

Stephane Moreau's avatar
Stephane Moreau
Jul 05, 2026
∙ Paid

👋 Hey, it’s Stephane. I help engineers become great engineering managers - whether you want to become one or are already leading a team.

Paid subscribers get 50 Notion Templates, The EM’s Field Guide, and access to the complete archive.

🚀 Practice behavioral interviews with an AI coach that challenges your answers and scores you across 8 hiring dimensions.

Infrastructure is easy to ignore... until production goes down

Sponsored by Palark

Not every engineering team needs a dedicated SRE organisation. But every team benefits from knowing experienced engineers are available when something breaks. Palark’s DevOps Insurance offers on-demand support for production incidents through a simple monthly subscription.

See how it works

Thanks to Palark for sponsoring this newsletter!

Everyone keeps asking the same question.

Does AI actually make developers more productive?

I don’t think that’s the right question.

Over the past year, two of the biggest studies on AI coding reached almost completely opposite conclusions.

GitHub found developers completed a programming task 55% faster when using AI.
Then an independent research lab, METR, studied experienced open-source developers working on real tasks inside codebases they already knew.

Those developers expected AI to make them roughly 20% faster.

Instead... they finished 19% slower.

So who’s right? I think both are.

They were simply measuring two very different environments.

GitHub’s study looked at a relatively clean programming task: building an HTTP server from scratch.
METR looked at something far messier: experienced engineers making changes inside large, existing codebases where every decision has consequences.

Those are completely different kinds of work.

One rewards speed. The other rewards judgment.

And that’s the part I think many teams miss.

AI amplifies your standards

You’ve probably heard someone say that AI is an amplifier.

They’re right.

The problem is that statement isn’t very useful.

It’s a bit like saying, “Exercise makes you healthier.” But it doesn’t tell you what to actually do.

Here’s the version I think engineering managers should care about:

AI amplifies whatever standards already exist inside your team.

If your team has clear engineering practices, AI helps people move faster without losing quality.
If your standards are vague, inconsistent or simply live inside people’s heads, AI helps everyone make bigger mistakes more efficiently.

That’s why two companies can use exactly the same AI model and end up with completely different results.

A few months ago I wrote that the biggest problem with AI coding wasn’t the model itself - it was that nobody owned everything around it.

Who’s responsible for the prompts? The context? The rules? The review process? The answer, in many teams, is nobody.

This article is the natural follow-up.

Because once someone owns the standards, the next question becomes:

What should those standards actually be?

After reading Google’s latest paper on the AI-driven software development lifecycle, along with several other pieces of recent research, I kept coming back to the same five principles.

And they’re the five standards I wish every team followed.

The important thing isn’t that they’re my standards.

It’s that your team has standards everyone can point to.

Standard 1

Decide whether you’re building a prototype or production software before you prompt the AI

One of the biggest mistakes I see isn’t teams choosing the wrong approach.

It’s never choosing one at all.

There’s a huge spectrum in how people use AI.

At one end is what people now call vibe coding. You describe what you want. The AI writes something. You paste the error back in. Repeat until it works. For a weekend project that’s completely fine. Move fast, experiment, throw it away if you need to.

At the other end is production software. The code needs to be tested, reviewed, understood. Something that someone else can safely maintain six months from now. Neither approach is wrong.

The mistake is never deciding which one you’re doing.

That’s how a quick experiment slowly turns into a production system. One feature gets added. Then another. Someone fixes a bug. Someone else builds on top of it. Before anyone notices, yesterday’s prototype has become tomorrow’s platform. Nobody planned for it.

It just... happened.

A good example is what happened at Replit last summer.

The company’s founder had explicitly declared a code freeze.

Despite that, an AI agent executed destructive commands against a production database, affecting data belonging to more than a thousand companies. It then generated roughly 4,000 fake users in an attempt to hide the problem before eventually concluding that recovery wasn’t possible. The founder restored everything manually.

Later, the agent summarised the incident with one sentence:

“This was a catastrophic failure on my part.”

It’s tempting to read that story and conclude the AI was the problem. I don’t think it was. The real problem was that the boundary between experimentation and production wasn’t enforced.

The agent treated a live production system like a sandbox because nobody had clearly told it otherwise.

The rule

Before anyone opens an AI coding assistant, answer one question.

Is this a prototype or is this production software?

Write the answer in the ticket.
Write it in the PR.
Say it during planning.

Anything that ships to production gets the full engineering process. Specifications, tests, code review, monitoring, and so on. Prototypes don’t need all of that.

One sentence at the beginning of the task determines every decision that follows.

That’s a surprisingly cheap way to avoid some very expensive mistakes.

Standard 2

A human should decide what “good” looks like before the AI writes code

Most people think the value of AI comes from writing code faster.

I don’t.
I think the real value comes from implementing well-defined ideas faster.

Those are very different things.

One of the most interesting pieces of AI coding research I’ve read recently looked at a system called TDFlow.

When the AI was given human-written tests before it started coding, it solved 94.3% of tasks on SWE-Bench Verified.

When it had to write the tests itself first its performance dropped to 68%.

The difference was the quality of the specification.

That sounds like an argument for Test-Driven Development. And it kind of is.

But there’s an even more important lesson here.

If the same AI writes the requirements...
...writes the tests...
...and writes the implementation...

Who’s checking whether it actually understood the problem? Nobody.

The AI isn’t validating the requirements.

It’s validating its own interpretation of the requirements.

Imagine asking someone to sit an exam... and also letting them write the marking scheme.

You might get a very high score. That doesn’t mean the answers are correct. It just means the questions matched the answers. AI can fall into exactly the same issue.

If it misunderstands what you wanted, it happily writes code that solves the wrong problem... and then writes tests proving the wrong problem has been solved correctly.

It gets worse by the way.

Researchers at Anthropic documented a coding model that learned to simply call sys.exit(0) instead of actually fixing failing tests.

The AI isn’t trying to cheat, it’s just trying to optimise for the goal you gave it. Sometimes the shortest path to “all tests passing” isn’t fixing the bug, it’s removing the thing complaining about the bug.

That’s exactly why humans still need to define success.

The rule

Humans own the specification. Humans own the acceptance criteria. Humans own the tests.

Only then does the AI start writing code.

The AI’s job is to satisfy your definition of “correct”. It should never be allowed to invent that definition itself. Once you make that distinction, AI becomes dramatically more useful, because you’ve stopped asking it to judge its own homework.

Here’s the rest of the standards:

User's avatar

Continue reading this post for free, courtesy of Stephane Moreau.

Or purchase a paid subscription.
© 2026 Stephane Moreau · Privacy ∙ Terms ∙ Collection notice
Start your SubstackGet the app
Substack is the home for great culture