Over 90% of engineers use AI daily. But AI productivity isn't just faster code: it's incidents resolved faster, reviews automated, and communication streamlined. This framework measures the full picture and shows where to invest next.
A comic to spark the conversation, a self-test to make it personal, and the full plan to go deep.
Two quantitative metrics collected automatically, plus three qualitative signals driven by engineers themselves.
| # | Type | Metric | What It Shows | Frequency | Collection |
|---|---|---|---|---|---|
| 1 | Quant | Token consumption per engineer | Depth — how much work delegated to AI | Weekly | Automatic (platform telemetry) |
| 2 | Quant | Distinct AI skills used regularly | Breadth — how many work categories covered | Weekly | Logs or lightweight self-report |
| 3 | Qual | Work engineers want to hand to AI | Unmet demand — where pain exists today | Monthly | 1-question survey |
| 4 | Qual | Missing agents engineers need | Opportunity roadmap — where to invest next | Monthly | 1-question survey |
| 5 | Qual | AI agent adoption (4 dimensions) | Maturity — current use, planned use, build needs, learning gaps | Quarterly | 4-question survey |
The full document — background, pain points, metrics, and implementation timeline.
Our engineering team has achieved strong AI tool adoption — over 90% of engineers use Claude Code daily, with supplementary use of GitHub Copilot. Despite this high adoption rate, we lack a structured way to answer a fundamental question: How much productivity has the team gained from AI, and where should we invest next?
This document proposes a measurement framework grounded in real engineering workflows — covering not only code development, but the full spectrum of daily engineering work.
Enable leadership to quantify how much productivity the team has gained from AI, see where adoption is deep versus shallow, and decide where to invest next.
A common assumption is that AI productivity = faster code writing. In practice, writing code represents only a fraction of an engineer's week. The majority of engineering time is consumed by operational and collaborative work that AI code-completion tools do not address.
To measure AI productivity accurately, we must first understand where engineering time actually goes.
Current state: This is where AI adoption is strongest. Engineers use Claude Code and GitHub Copilot for code generation, refactoring, test writing, and scaffolding.
Pain points:
AI opportunity: Largely addressed by current tools. Remaining gains come from deeper delegation — giving AI agents multi-step implementation tasks rather than line-by-line completion.
Current state: Engineers spend significant time on incident triage, diagnosis, mitigation, and postmortems. Much of this is repetitive investigation: pulling logs, checking dashboards, and correlating with known issues.
Pain points:
AI opportunity: High impact, largely untapped. AI agents could auto-triage incoming ICMs, pull relevant logs and metrics, suggest mitigation steps based on historical patterns, and draft postmortem summaries. This is where the biggest productivity unlock exists for operations-heavy teams.
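To make the shape of such an agent concrete, here is a minimal sketch of an auto-triage pipeline. Every integration in it is hypothetical: `fetch_logs`, `find_similar_incidents`, and `llm_summarize` are stand-ins for a team's real log store, incident-history index, and LLM gateway, stubbed with canned data so the sketch runs on its own.

```python
from dataclasses import dataclass

# Hypothetical integrations, stubbed so the sketch is self-contained.
# A real pipeline would call the team's log store, incident-history
# index, and LLM gateway instead.
def fetch_logs(incident_id: str) -> str:
    return f"ERROR timeout calling downstream service ({incident_id})"

def find_similar_incidents(logs: str) -> list[dict]:
    return [{"id": "ICM-101", "mitigation": "Restart the connection pool"}]

def llm_summarize(prompt: str) -> str:
    return "Probable cause: downstream dependency timing out; see ICM-101."

@dataclass
class TriageReport:
    incident_id: str
    probable_cause: str
    suggested_mitigations: list[str]
    draft_postmortem: str

def triage_incident(incident_id: str) -> TriageReport:
    logs = fetch_logs(incident_id)                    # pull relevant logs/metrics
    history = find_similar_incidents(logs)            # correlate with known issues
    cause = llm_summarize(f"Logs:\n{logs}\nSimilar incidents:\n{history}")
    mitigations = [h["mitigation"] for h in history]  # reuse what worked before
    postmortem = llm_summarize(f"Draft a postmortem. Logs:\n{logs}\nCause: {cause}")
    return TriageReport(incident_id, cause, mitigations, postmortem)

print(triage_incident("ICM-204"))
```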
Current state: Keeping services healthy is a constant background tax: patching, certificate rotations, watching for configuration drift, and capacity planning.
Pain points:
AI opportunity: Many maintenance tasks are rule-based and repetitive — ideal candidates for AI automation. Agents could handle routine patching workflows, flag configuration drift, and prepare compliance documentation.
Current state: Code review is essential but time-consuming. Reviewers context-switch, read through large diffs, and often catch the same categories of issues repeatedly.
Pain points:
AI opportunity: AI can handle first-pass reviews (style, patterns, security, common bugs), allowing human reviewers to focus on architecture, design, and business logic. AI can also generate codebase documentation and onboarding guides.
Current state: Meetings, emails, Teams threads, and status updates consume a large portion of engineering time. Much of this is context-sharing rather than decision-making.
Pain points:
AI opportunity: AI can summarize threads, extract action items, draft status reports, and prepare meeting context so meetings are shorter and more decision-focused. This area is underexplored but high-potential.
To build a framework that is both actionable and sustainable, we apply a structured approach across two dimensions: quantitative metrics collected automatically, and qualitative signals reported by engineers themselves.
Every type of engineering work can become more productive with AI. This framework measures adoption and impact without judging the value of any individual's work. It provides both a current-state dashboard and a forward-looking investment roadmap.
What it measures: The volume and complexity of work engineers delegate to AI agents.
Why it matters: Token consumption is tracked automatically, requires no self-reporting, and scales with task complexity. An engineer fixing a typo consumes minimal tokens. An engineer delegating a full ICM investigation, building a feature end-to-end, or generating a design document consumes significantly more.
Collection method: Aggregate from AI platform billing/usage data (Anthropic API, internal AI gateway).
Frequency: Daily collection, weekly reporting.
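As an illustration of the weekly report, the sketch below sums per-engineer token totals from a usage export. The CSV layout (`engineer,date,tokens` columns) is an assumption standing in for whatever the billing API or internal gateway actually emits.

```python
import csv
from collections import defaultdict

def weekly_tokens_per_engineer(path: str) -> dict[str, int]:
    """Sum token usage per engineer from a usage export.

    Assumes a CSV with engineer,date,tokens columns -- a placeholder
    for the real billing/usage export format.
    """
    totals: dict[str, int] = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["engineer"]] += int(row["tokens"])
    return dict(totals)

# Weekly report: heaviest AI delegators first.
report = weekly_tokens_per_engineer("usage_export.csv")
for engineer, tokens in sorted(report.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{engineer}: {tokens:,} tokens")
```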
Action items:
What it measures: How many different categories of work each engineer covers with AI tools — coding, debugging, code review, testing, documentation, planning, security review, ICM triage, etc.
Why it matters: An engineer using AI only for code completion has narrow integration. An engineer using AI across coding, debugging, code review, documentation, testing, and planning has fundamentally shifted their workflow.
| Token Consumption | Skills Breadth | Interpretation |
|---|---|---|
| High | Few | Deep but narrow — coding only |
| Low | Many | Experimenting but not relying on AI |
| High | Many | Truly AI-native workflow |
| Low | Few | Early adoption — room to grow |
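The matrix maps directly to a classification rule. A minimal sketch, assuming each engineer's weekly token total and distinct-skill count are already computed; the cutoffs are illustrative and would need calibrating against real usage data.

```python
def classify(tokens: int, skills: int,
             token_cutoff: int = 500_000,   # illustrative thresholds;
             skill_cutoff: int = 4) -> str:  # calibrate on real usage data
    """Place an engineer in a quadrant of the consumption x breadth matrix."""
    deep, broad = tokens >= token_cutoff, skills >= skill_cutoff
    if deep and broad:
        return "Truly AI-native workflow"
    if deep:
        return "Deep but narrow -- coding only"
    if broad:
        return "Experimenting but not relying on AI"
    return "Early adoption -- room to grow"

print(classify(tokens=820_000, skills=2))  # -> Deep but narrow -- coding only
```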
Collection method: AI tool plugin/skill invocation logs, or lightweight weekly self-report.
Frequency: Weekly.
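For the log-based path, here is a sketch of computing breadth from invocation records. The record shape (`engineer` and `skill` fields) is assumed; real plugin or skill-invocation logs would need mapping into it.

```python
from collections import defaultdict

def skills_breadth(invocations: list[dict]) -> dict[str, int]:
    """Count distinct AI skill categories each engineer used this week."""
    seen: dict[str, set[str]] = defaultdict(set)
    for rec in invocations:  # each record assumed to carry engineer + skill
        seen[rec["engineer"]].add(rec["skill"])
    return {eng: len(skills) for eng, skills in seen.items()}

logs = [
    {"engineer": "ana", "skill": "coding"},
    {"engineer": "ana", "skill": "code-review"},
    {"engineer": "ana", "skill": "icm-triage"},
    {"engineer": "ben", "skill": "coding"},
]
print(skills_breadth(logs))  # -> {'ana': 3, 'ben': 1}
```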
Action items:
What it measures: The types of repetitive or tedious work engineers wish they could delegate to AI but currently cannot.
Why it matters: This surfaces real pain points from the people doing the work. It captures demand that usage metrics miss — the work that isn't being helped by AI yet, precisely because the tools don't exist.
Survey question: "What repetitive work do you wish you could hand to an AI agent?"
Collection: Anonymous monthly pulse survey (1 question, free-text). Frequency: Monthly.
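One lightweight way to turn the free-text answers into a ranked demand list is keyword theming, sketched below. The theme-to-keyword map is illustrative; a real analysis would refine it or swap in LLM-assisted clustering.

```python
from collections import Counter

# Illustrative themes drawn from the work categories above.
THEMES = {
    "incident response": ["icm", "incident", "triage", "oncall"],
    "code review": ["review", "diff", "reviewer"],
    "maintenance": ["patch", "cert", "config", "capacity"],
    "communication": ["meeting", "status", "email", "teams"],
}

def theme_counts(responses: list[str]) -> Counter:
    """Tally which pain-point themes the survey responses mention."""
    counts: Counter = Counter()
    for text in responses:
        lowered = text.lower()
        for theme, keywords in THEMES.items():
            if any(k in lowered for k in keywords):
                counts[theme] += 1
    return counts

print(theme_counts([
    "Pull logs automatically for every ICM I get assigned",
    "Drafting my weekly status email",
]).most_common())
```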
Action items:
What it measures: Specific AI agents or skills that don't exist yet but would save engineers the most time.
Why it matters: This transforms the measurement framework from a backward-looking dashboard into a forward-looking investment plan. It gives leadership a concrete, engineer-driven list of what to build or adopt next.
Survey question: "What AI agent or skill doesn't exist yet but would save you the most time?"
Collection: Anonymous monthly pulse survey (1 question, free-text). Frequency: Monthly.
Action items:
What it measures: How engineers adopt AI agents across four stages — from daily usage to building custom agents — revealing the team's overall AI maturity trajectory.
Why it matters: Tool adoption is not binary. Engineers progress through stages: using existing agents, identifying new use cases, building custom agents, and deepening technical knowledge.
| Dimension | Survey Question |
|---|---|
| Current usage | "Which AI agents do you use in your daily work? How frequently?" |
| Planned adoption | "Which AI agents do you plan to start using in the near future?" |
| Custom agent needs | "What custom AI agents would you want to develop, and for what workflows?" |
| Learning interests | "What technical areas about AI agents do you want to learn more about?" |
Collection: Survey covering the four dimensions above. Frequency: Quarterly.
Action items: