Crediting maintenance work in performance reviews
Most engineering rubrics measure what was delivered, not what happened in the background for the systems to keep working.
Here is what to add to your rubric so the engineers keeping the lights on get the credit they deserve.
You have probably had this conversation. Calibration season, two engineers at the same level. One built a new service that landed in front of customers and got named in the all-hands. The other spent the cycle on a database migration that prevented an outage nobody ever saw, plus the mentoring and documentation that kept a junior team functional while the tech lead was on parental leave.
The first engineer gets promoted. The second gets “great quarter, we need to see more impact next half”. You know deep in your heart that the second engineer is the one you cannot afford to lose. But the first one gets promoted anyway.
This is not a hypothetical. It is the dynamic Will Larson named when he wrote about promotion pathologies: “high impact but didn’t seem hard enough” is the phrase that slows careers. I am going to argue something narrower than the usual “managers should value glue work more”.
Your rubric measures what is easy to measure, and the easy things skew toward novelty.
I urge you to look in two places: the line items in your rubric, and the language you use in calibration. This piece is about both.
Why the current rubric punishes the engineers you most want to keep
Tanya Reilly’s original “Being Glue” talk is still one of the best framings of the IC side: there is a category of work that keeps teams functional, that is genuinely hard, and that does not show up in promo packets. The leader-facing version of that question is different. How do you fix the system so glue work and maintenance work stop being career-limiting?
Start by being honest about why the rubric tilts the way it does. When a calibration committee asks “what was the impact?”, three things have to be true for an answer to land. The work has to be:
visible
attributable to a person
measurable against something
New feature work meets all three. A new payments service is visible, the staff engineer who led it is the obvious owner, and revenue moved. Easy.
Maintenance work fails at least one of those almost every time. A prevented incident is invisible by definition. A migration that went smoothly was a team effort with no clean owner. Documentation that raised onboarding speed shows up as a vague “engineers ramp faster now” with no number attached. Chelsea Troy puts it bluntly in her piece on technical debt: we rarely reward, recognize, or teach code stewardship the way we do feature development skills.
Typical rubrics define impact in a way that maintenance cannot easily prove.
There is a real counter-argument here. Sean Goedecke has argued that companies do not reward glue work on purpose, because they want their best people shipping projects rather than improving general efficiency. He is partly right. Some of the system is rational. But his conclusion that an individual should mostly do glue work tactically in service of features they own assumes the rubric is correctly tuned. If the rubric undervalues maintenance, “doing what the rubric rewards” is not the same as “doing what the business needs”. Companies did not choose this trade-off deliberately. They inherited it from outdated rubrics.
Three additions that make maintenance work legible
You do not need to throw out your rubric. You need to add to it. Three additions, kept deliberately concrete so a calibration committee can actually use them.
Counterfactual impact
Most rubrics measure what shipped. Add a line that asks what did not happen because of this engineer’s work. A prevented incident, an avoided regression, a migration that did not go sideways. The phrase to put in your rubric:
This engineer’s contribution measurably reduced a class of operational risk. Describe the risk and what would likely have happened without them.
Larson makes a similar point about Uber’s project-counting metric, which deliberately treats migrations and tech-debt removal as projects on equal footing with product work. That is the kind of definitional work most rubrics never do.
Maintenance load reduction
Troy’s reframing of tech debt as “maintenance load” is useful here, because maintenance load is something a team can describe even when it cannot be cleanly measured. Did this engineer reduce the time it takes a new joiner to ship their first PR? Did they remove a class of recurring bug? Did they retire an internal tool nobody else was willing to own? Your tech leads know the answers. The rubric line:
This engineer reduced ongoing maintenance burden in a way other engineers can point to.
If the manager cannot describe the burden, the reduction, and how it was measured, that gap is itself a signal worth chasing.
Stewardship of the system
Some of the most valuable work an engineer does is keeping a codebase legible to the people who will inherit it. Architectural decisions that survive the test of time. A code review culture that catches things before production. The SPACE framework, one of the few peer-reviewed productivity frameworks that explicitly carves out invisible work as a category, calls this kind of contribution out directly. Your rubric should too. Try:
This engineer made the system easier for others to work in. Describe the change, who benefits, and how you would notice if they left.
If nobody would notice their absence, the work was not stewardship.
These three additions are written as prompts to the manager, not as metrics. That is deliberate. Charity Majors has a fair warning about turning maintenance work into a numerical target: you will get gaming and a slow erosion of the trust that made the work worth doing in the first place. The point is not to score maintenance on a five-point scale. The point is to make the manager write about it with the same specificity the rubric already demands for features.
What this sounds like in calibration
The rubric is half the work. The other half is the conversation in the room. Most calibration discussions default back to “what shipped” because it is the easiest thing to compare across packets. If you want maintenance work to land, you have to give the room a different question to ask.
A few specific moves that have worked when I have run, or sat in on, calibration meetings:
Ask the prevented-incident question explicitly. Before the committee gets into ranking, go round the room: “What is one thing on your team that did not break this cycle, and which engineer is most responsible for that?” The answers will be slow at first. The point is to surface the names of the people doing the work the rubric does not see, before the comparative ranking begins.
Name the counterfactual when you advocate. When a manager pitches an engineer whose work was preventive, translate that into the same currency as feature work. “If Priya hadn’t rebuilt the deployment pipeline this quarter, we would have had three or four production incidents like the one in Q1, each taking out a senior engineer for a week”. That sentence is doing the work the rubric should do.
Block the “but it wasn’t that hard” objection. This is the specific pathology Larson named. It shows up in calibration as a skeptical “yes, it had impact, but the work itself wasn’t that complex”. When you hear this, the question to ask back is simple: “Then why did nobody do it for the previous two years?” Most of the time, the answer is that the work required navigating ambiguity, building consensus, or absorbing political risk that the rubric also does not credit.
Be honest about distributional effects. Research on tasks with low promotability found that women accept these requests at substantially higher rates than men, and receive more of them in the first place. That is worth keeping in mind during calibrations, particularly when you notice a pattern in the org’s promotion history.
What to do before your next review cycle
You probably cannot rewrite your company’s rubric this quarter. But you can do three smaller things.
Look at your last calibration cycle and ask which engineer on your team did the most maintenance work that was not credited. Then write the case for that engineer the way you would write a feature pitch: counterfactual, maintenance load reduction, stewardship. If you cannot do it, you have either misjudged the engineer or you do not know enough about the work yet. Both are useful information.
Add the three rubric prompts above to your own performance review template, even if your company has not adopted them. You can run a parallel rubric for your team without asking permission. When the official rubric and your version disagree, the disagreement itself is the conversation worth having with your manager and skip-level.
If you manage other managers, sponsor one piece of maintenance work next cycle the way you would sponsor a flagship project. Name the owner, articulate the counterfactual, write the comms, get it on the all-hands deck. The signal that maintenance work is promotable comes from what gets celebrated.
Your calibration committee is not going to fix this on its own. It is waiting for someone in the room to say, clearly, that the way you are measuring impact is missing the engineers you most need to keep. That is your job. The rubric is a draft. You can suggest edits to it.
If you enjoyed this article, consider subscribing to get:
✉️ Free: 1 original post every Tuesday, my favourite posts of the week every Sunday + 10 Notion Templates for Engineering Managers
🔒 Paid: Full archive + 50+ EM templates & playbooks + The EM Field Guide
See you in the next one,
~ Stephane