Over 90% of engineers use AI daily. But AI productivity isn't just faster code: it's incidents resolved faster, reviews automated, and communication streamlined. This framework measures the full picture and shows where to invest next.
A comic to spark the conversation, a self-test to make it personal, and the full plan to go deep.
Two quantitative metrics collected automatically, plus three qualitative signals driven by engineers themselves.
| # | Type | Metric | What It Shows | Frequency | Collection |
|---|---|---|---|---|---|
| 1 | Quant | Token consumption per engineer | Depth — how much work delegated to AI | Weekly | Automatic (platform telemetry) |
| 2 | Quant | Distinct AI skills used regularly | Breadth — how many work categories covered | Weekly | Logs or lightweight self-report |
| 3 | Qual | Work engineers want to hand to AI | Unmet demand — where pain exists today | Monthly | 1-question survey |
| 4 | Qual | Missing agents engineers need | Opportunity roadmap — where to invest next | Monthly | 1-question survey |
| 5 | Qual | AI agent adoption (4 dimensions) | Maturity — current use, planned use, build needs, learning gaps | Quarterly | 4-question survey |
The full document — background, pain points, metrics, and implementation timeline.
Our engineering team has achieved strong AI tool adoption — over 90% of engineers use Claude Code daily, with supplementary use of GitHub Copilot. Despite this high adoption rate, we lack a structured way to answer a fundamental question: How much productivity has the team gained from AI, and where should we invest next?
This document proposes a measurement framework grounded in real engineering workflows — covering not only code development, but the full spectrum of daily engineering work.
Enable leadership to quantify how much productivity the team has gained from AI, see where adoption is deep versus shallow, and decide where to invest next.
A common assumption is that AI productivity = faster code writing. In practice, writing code represents only a fraction of an engineer's week. The majority of engineering time is consumed by operational and collaborative work that AI code-completion tools do not address.
To measure AI productivity accurately, we must first understand where engineering time actually goes.
Current state: This is where AI adoption is strongest. Engineers use Claude Code and GitHub Copilot for code generation, refactoring, test writing, and scaffolding.
Pain points:
AI opportunity: Largely addressed by current tools. Remaining gains come from deeper delegation — giving AI agents multi-step implementation tasks rather than line-by-line completion.
Current state: Engineers spend significant time on incident triage, diagnosis, mitigation, and postmortems. Much of this is repetitive investigation: pulling logs, checking dashboards, and correlating with known issues.
Pain points:
AI opportunity: High impact, largely untapped. AI agents could auto-triage incoming ICMs, pull relevant logs and metrics, suggest mitigation steps based on historical patterns, and draft postmortem summaries. This is where the biggest productivity unlock exists for operations-heavy teams.
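To make the shape of such an agent concrete, here is a minimal sketch of an auto-triage pipeline. Every integration in it is hypothetical: `fetch_logs`, `find_similar_incidents`, and `llm_summarize` are stand-ins for a team's real log store, incident-history index, and LLM gateway, stubbed with canned data so the sketch runs on its own.

```python
from dataclasses import dataclass

# Hypothetical integrations, stubbed so the sketch is self-contained.
# A real pipeline would call the team's log store, incident-history
# index, and LLM gateway instead.
def fetch_logs(incident_id: str) -> str:
    return f"ERROR timeout calling downstream service ({incident_id})"

def find_similar_incidents(logs: str) -> list[dict]:
    return [{"id": "ICM-101", "mitigation": "Restart the connection pool"}]

def llm_summarize(prompt: str) -> str:
    return "Probable cause: downstream dependency timing out; see ICM-101."

@dataclass
class TriageReport:
    incident_id: str
    probable_cause: str
    suggested_mitigations: list[str]
    draft_postmortem: str

def triage_incident(incident_id: str) -> TriageReport:
    logs = fetch_logs(incident_id)                    # pull relevant logs/metrics
    history = find_similar_incidents(logs)            # correlate with known issues
    cause = llm_summarize(f"Logs:\n{logs}\nSimilar incidents:\n{history}")
    mitigations = [h["mitigation"] for h in history]  # reuse what worked before
    postmortem = llm_summarize(f"Draft a postmortem. Logs:\n{logs}\nCause: {cause}")
    return TriageReport(incident_id, cause, mitigations, postmortem)

print(triage_incident("ICM-204"))
```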
Current state: Keeping services healthy is a constant background tax: patching, certificate rotations, watching for configuration drift, and capacity planning.
Pain points:
AI opportunity: Many maintenance tasks are rule-based and repetitive — ideal candidates for AI automation. Agents could handle routine patching workflows, flag configuration drift, and prepare compliance documentation.
Current state: Code review is essential but time-consuming. Reviewers context-switch, read through large diffs, and often catch the same categories of issues repeatedly.
Pain points:
AI opportunity: AI can handle first-pass reviews (style, patterns, security, common bugs), allowing human reviewers to focus on architecture, design, and business logic. AI can also generate codebase documentation and onboarding guides.
Current state: Meetings, emails, Teams threads, and status updates consume a large portion of engineering time. Much of this is context-sharing rather than decision-making.
Pain points:
AI opportunity: AI can summarize threads, extract action items, draft status reports, and prepare meeting context so meetings are shorter and more decision-focused. This area is underexplored but high-potential.
To build a framework that is both actionable and sustainable, we apply a structured approach across two dimensions: quantitative metrics collected automatically, and qualitative signals reported by engineers themselves.
Every type of engineering work can become more productive with AI. This framework measures adoption and impact without judging the value of any individual's work. It provides both a current-state dashboard and a forward-looking investment roadmap.
What it measures: The volume and complexity of work engineers delegate to AI agents.
Why it matters: Token consumption is tracked automatically, requires no self-reporting, and scales with task complexity. An engineer fixing a typo consumes minimal tokens. An engineer delegating a full ICM investigation, building a feature end-to-end, or generating a design document consumes significantly more.
Collection method: Aggregate from AI platform billing/usage data (Anthropic API, internal AI gateway).
Frequency: Daily collection, weekly reporting.
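As an illustration of the weekly report, the sketch below sums per-engineer token totals from a usage export. The CSV layout (`engineer,date,tokens` columns) is an assumption standing in for whatever the billing API or internal gateway actually emits.

```python
import csv
from collections import defaultdict

def weekly_tokens_per_engineer(path: str) -> dict[str, int]:
    """Sum token usage per engineer from a usage export.

    Assumes a CSV with engineer,date,tokens columns -- a placeholder
    for the real billing/usage export format.
    """
    totals: dict[str, int] = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["engineer"]] += int(row["tokens"])
    return dict(totals)

# Weekly report: heaviest AI delegators first.
report = weekly_tokens_per_engineer("usage_export.csv")
for engineer, tokens in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{engineer}: {tokens:,} tokens")
```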
Action items:
What it measures: How many different categories of work each engineer covers with AI tools — coding, debugging, code review, testing, documentation, planning, security review, ICM triage, etc.
Why it matters: An engineer using AI only for code completion has narrow integration. An engineer using AI across coding, debugging, code review, documentation, testing, and planning has fundamentally shifted their workflow.
| Token Consumption | Skills Breadth | Interpretation |
|---|---|---|
| High | Few | Deep but narrow — coding only |
| Low | Many | Experimenting but not relying on AI |
| High | Many | Truly AI-native workflow |
| Low | Few | Early adoption — room to grow |
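The matrix maps directly to a classification rule. A minimal sketch, assuming each engineer's weekly token total and distinct-skill count are already computed; the cutoffs are illustrative and would need calibrating against real usage data.

```python
def classify(tokens: int, skills: int,
             token_cutoff: int = 500_000,   # illustrative thresholds;
             skill_cutoff: int = 4) -> str:  # calibrate on real usage data
    """Place an engineer in a quadrant of the consumption x breadth matrix."""
    deep, broad = tokens >= token_cutoff, skills >= skill_cutoff
    if deep and broad:
        return "Truly AI-native workflow"
    if deep:
        return "Deep but narrow -- coding only"
    if broad:
        return "Experimenting but not relying on AI"
    return "Early adoption -- room to grow"

print(classify(tokens=820_000, skills=2))  # -> Deep but narrow -- coding only
```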
Collection method: AI tool plugin/skill invocation logs, or lightweight weekly self-report.
Frequency: Weekly.
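For the log-based path, here is a sketch of computing breadth from invocation records. The record shape (`engineer` and `skill` fields) is assumed; real plugin or skill-invocation logs would need mapping into it.

```python
from collections import defaultdict

def skills_breadth(invocations: list[dict]) -> dict[str, int]:
    """Count distinct AI skill categories each engineer used this week."""
    seen: dict[str, set[str]] = defaultdict(set)
    for rec in invocations:  # each record assumed to carry engineer + skill
        seen[rec["engineer"]].add(rec["skill"])
    return {eng: len(skills) for eng, skills in seen.items()}

logs = [
    {"engineer": "ana", "skill": "coding"},
    {"engineer": "ana", "skill": "code-review"},
    {"engineer": "ana", "skill": "icm-triage"},
    {"engineer": "ben", "skill": "coding"},
]
print(skills_breadth(logs))  # -> {'ana': 3, 'ben': 1}
```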
Action items:
What it measures: The types of repetitive or tedious work engineers wish they could delegate to AI but currently cannot.
Why it matters: This surfaces real pain points from the people doing the work. It captures demand that usage metrics miss — the work that isn't being helped by AI yet, precisely because the tools don't exist.
Survey question: "What repetitive work do you wish you could hand to an AI agent?"
Collection: Anonymous monthly pulse survey (1 question, free-text). Frequency: Monthly.
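One lightweight way to turn the free-text answers into a ranked demand list is keyword theming, sketched below. The theme-to-keyword map is illustrative; a real analysis would refine it or swap in LLM-assisted clustering.

```python
from collections import Counter

# Illustrative themes drawn from the work categories above.
THEMES = {
    "incident response": ["icm", "incident", "triage", "oncall"],
    "code review": ["review", "diff", "reviewer"],
    "maintenance": ["patch", "cert", "config", "capacity"],
    "communication": ["meeting", "status", "email", "teams"],
}

def theme_counts(responses: list[str]) -> Counter:
    """Tally which pain-point themes the survey responses mention."""
    counts: Counter = Counter()
    for text in responses:
        lowered = text.lower()
        for theme, keywords in THEMES.items():
            if any(k in lowered for k in keywords):
                counts[theme] += 1
    return counts

print(theme_counts([
    "Pull logs automatically for every ICM I get assigned",
    "Drafting my weekly status email",
]).most_common())
```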
Action items:
What it measures: Specific AI agents or skills that don't exist yet but would save engineers the most time.
Why it matters: This transforms the measurement framework from a backward-looking dashboard into a forward-looking investment plan. It gives leadership a concrete, engineer-driven list of what to build or adopt next.
Survey question: "What AI agent or skill doesn't exist yet but would save you the most time?"
Collection: Anonymous monthly pulse survey (1 question, free-text). Frequency: Monthly.
Action items:
What it measures: How engineers adopt AI agents across four stages — from daily usage to building custom agents — revealing the team's overall AI maturity trajectory.
Why it matters: Tool adoption is not binary. Engineers progress through stages: using existing agents, identifying new use cases, building custom agents, and deepening technical knowledge.
| Dimension | Survey Question |
|---|---|
| Current usage | "Which AI agents do you use in your daily work? How frequently?" |
| Planned adoption | "Which AI agents do you plan to start using in the near future?" |
| Custom agent needs | "What custom AI agents would you want to develop, and for what workflows?" |
| Learning interests | "What technical areas about AI agents do you want to learn more about?" |
Collection: Survey covering the four dimensions above. Frequency: Quarterly.
Action items: