Per-Agent KPIs Need a Work Record First

Kate, Founder / CEO

The phrase "per-agent KPIs" lands quickly because everyone already knows the shape of the problem.

If a person joins your company, you know how to reason about their work. They have a role. They have a manager. They have a trail of tasks, decisions, output, feedback, cost, and context. You may not have a perfect performance review process, but the basic unit is clear: one worker, one record.

AI agents break that assumption.

Three agents can work through the same OAuth account. Five agents can share the same provider bill. A coding agent, a research agent, and a content agent can all leave traces in Linear, GitHub, Notion, Vercel, and your model dashboard, but none of those tools knows that the traces belong to different workers.

So when a founder asks the obvious question - "which agents are earning their keep?" - the answer is usually a shrug.

Not because KPIs are impossible. Because the record underneath them does not exist yet.

The metric is only as good as the identity beneath it

A KPI sounds objective. Cost per output. Accepted tasks. Review pass rate. Failure rate. Time to resolution. Escalations. Reliability. Output volume. These are useful signals.

But a KPI without attribution is theatre.

If every agent acts as "Kate" in Linear, "Kate" in GitHub, and one blended line item in the Anthropic or OpenAI bill, then the dashboard can only tell you what the account did. It cannot tell you what Atlas did, what Nova did, or whether the content agent is quietly burning money while the coding agent is doing all the useful work.
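To make the gap concrete, here is a minimal sketch in Python. The event fields (`account`, `agent`, `cost_usd`, `accepted`), the `content-agent` label, and every number are hypothetical, made up for illustration. This is not Cockpit's schema or real data. It only shows how the same activity looks when grouped by the shared account versus by the agent that did the work.

```python
from collections import defaultdict

# Hypothetical activity events: three agents acting through one shared account.
EVENTS = [
    {"account": "kate", "agent": "atlas", "cost_usd": 4.0, "accepted": True},
    {"account": "kate", "agent": "atlas", "cost_usd": 2.0, "accepted": True},
    {"account": "kate", "agent": "nova", "cost_usd": 3.0, "accepted": True},
    {"account": "kate", "agent": "content-agent", "cost_usd": 9.0, "accepted": False},
]


def cost_per_accepted_output(events, key):
    """Group events by `key` and return cost per accepted output per group.

    A group with spend but no accepted output maps to None.
    """
    cost = defaultdict(float)
    accepted = defaultdict(int)
    for event in events:
        group = event[key]
        cost[group] += event["cost_usd"]
        accepted[group] += int(event["accepted"])
    return {g: (cost[g] / accepted[g] if accepted[g] else None) for g in cost}


# Grouped by account: one blended figure for "kate".
by_account = cost_per_accepted_output(EVENTS, "account")

# Grouped by agent: only possible when each event carries an agent identity.
by_agent = cost_per_accepted_output(EVENTS, "agent")
```

Grouped by account, the sketch produces a single blended figure for "kate". Grouped by agent, the same events separate into per-agent figures, and the agent with spend but no accepted output stands out. The second view only exists if each event was attributed to an agent when it was recorded.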

That is why Cockpit starts with the work record.

The work record is not glamorous. It is the boring substrate: agent identity, owner, role, runtime, connected tools, activity history, attribution, and review context. It is the place where future cost, output, acceptance, and reliability metrics can land without becoming a spreadsheet of guesses.
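As a rough illustration of that substrate, the fields listed above could be sketched as a small Python data structure. The class names, field names, and the `log` method are assumptions made for this example, not Cockpit's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ActivityEvent:
    """One attributed action taken by an agent in a connected tool."""
    tool: str                          # e.g. "linear" or "github"
    action: str                        # hypothetical action label
    occurred_at: datetime
    review_note: Optional[str] = None  # review context, if a human left any


@dataclass
class WorkRecord:
    """Hypothetical per-agent record: identity first, metrics later."""
    agent_id: str
    owner: str
    role: str
    runtime: str
    connected_tools: list = field(default_factory=list)
    activity: list = field(default_factory=list)

    def log(self, tool, action, review_note=None):
        """Append an event to this agent's history, refusing unknown tools."""
        if tool not in self.connected_tools:
            raise ValueError(f"{self.agent_id} is not connected to {tool}")
        event = ActivityEvent(tool, action, datetime.now(timezone.utc), review_note)
        self.activity.append(event)
        return event


# Example usage with made-up values.
record = WorkRecord(
    agent_id="atlas",
    owner="kate",
    role="coding",
    runtime="example-runtime",
    connected_tools=["linear"],
)
record.log("linear", "issue.closed", review_note="accepted")
```

Because every event is stored on one agent's record, later metrics such as cost or acceptance can be computed per agent instead of being inferred from a shared account.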

What is live today, and what comes next

Today, Cockpit gives each agent a record and an attribution path. Linear is live. GitHub and Notion are in beta behind the same bridge model. Manual burn tracking is live so teams can start recording what their AI workforce costs.

Provider-level billing attribution is next. Cost per output is next. Acceptance and reliability scores are next. The KPI layer is the destination, but the work record is the foundation.

That distinction matters because the market is going to fill with dashboards that look like they are grading agents.

Some will be tracing prompts. Some will be measuring model latency. Some will be showing aggregate usage. Those are useful, but they do not answer the operator's question unless the system can tie the work back to the specific agent responsible for it.

You cannot performance-review a fog.

Why startups feel this first

Startups are the first customers for this because they are the first teams where the AI workforce can outnumber the human one.

A big company can absorb ambiguity for a while, because its AI work is a pilot, a procurement line, a sandbox. A startup cannot. If two humans are running ten agents, the difference between a useful agent and an expensive loop is the difference between leverage and chaos.

The founder does not need a philosophical answer to "are agents employees?"

They need an operating answer to "who did what, what did it cost, and should I give this agent more scope?"

That is the Cockpit thesis: per-agent KPIs are not a reporting feature. They are the management interface for companies where the workforce is partly autonomous.

The record comes first. The review follows.
