Why most AI pilots stall at month four

Over the past two years we have done discovery on roughly sixty AI initiatives. Some became projects, some did not, and a fair number came to us specifically because something had stalled and the team needed an outside read.

The stalls are remarkably consistent. They cluster into three failure modes, and they almost always show up between month three and month five. Here they are, in roughly the order we see them.

Failure mode one: The prototype gap

A team builds a clever demo in a sandbox. It works on the curated test set. It impresses the executive sponsor. Then someone tries to point it at real production data and it falls over.

This is almost always because the prototype was built without the things that make a system actually run: real auth, real permissions, real rate limits, real data that is wrong in ways the test set was not. The fix is to never build a prototype outside the environment it is supposed to live in. Build with your data, behind your auth, with the integration shape of the real thing, even if the surface is minimal at first.

Failure mode two: The integration tax

The model work is glamorous, and in our experience it turns out to be about 20% of the engagement. The other 80% is connecting the system to the CRM, the helpdesk, the warehouse, the identity provider, and the half-dozen internal services nobody documented.

This work is rarely scoped and rarely budgeted. It is almost always done by whoever happens to be available, and almost always in a way that breaks or drifts the next time something upstream changes. The fix is to scope the integration work explicitly, give it to someone senior, and build it like the load-bearing wall it is.
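One way to make an integration behave like a load-bearing wall is to write down what it expects from each upstream system and check that expectation on every record. The following is a minimal sketch, not a prescribed implementation: the field names in EXPECTED_FIELDS and the function contract_violations are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical contract: the fields (and types) an integration expects
# from an upstream CRM record. These names are illustrative assumptions,
# not taken from any real CRM API.
EXPECTED_FIELDS = {"account_id": str, "email": str, "plan": str}


def contract_violations(record, expected=EXPECTED_FIELDS):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return problems
```

A check like this, run at the boundary, turns a silent upstream schema change into a visible failure with a named field attached, rather than a slow decline in output quality.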

Most AI rollouts do not fail at the model. They fail at the seam between the model and everything the model has to touch.

Failure mode three: The operational vacuum

This is the one that shows up at month four. The system shipped. Adoption was fine in the first thirty days. The team that built it moved on to the next thing.

Then the quality degrades. Edge cases accumulate. A model gets deprecated. A schema changes upstream. Nobody is running the eval suite, or there is no eval suite, or there is one but it stopped being representative two months ago. Trust quietly erodes. Adoption falls off without anyone noticing. By month six, the system is still technically running and effectively not being used.
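To make "running the eval suite" concrete, here is a minimal sketch of a regression gate, assuming the suite is a list of (input, expected output) pairs and the system under test is a plain function. The names pass_rate and eval_gate, the exact-match scoring, and the 0.02 tolerance are all illustrative assumptions; real suites often need fuzzier scoring than equality.

```python
def pass_rate(cases, predict):
    """Fraction of cases where predict(input) equals the expected output."""
    if not cases:
        # An empty suite should fail loudly rather than pass by default.
        raise ValueError("eval suite is empty")
    passed = sum(1 for inp, expected in cases if predict(inp) == expected)
    return passed / len(cases)


def eval_gate(cases, predict, baseline, tolerance=0.02):
    """Return (ok, rate); ok is False when rate falls below baseline - tolerance."""
    rate = pass_rate(cases, predict)
    return rate >= baseline - tolerance, rate
```

Run on every model, prompt, or schema change, a gate like this catches a regression at the moment it is introduced. It only helps if someone also keeps the cases representative of current traffic, which is the ownership problem described above.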

The fix is structural, not technical. AI systems need a designated owner, internal or external, whose job is to keep them running. That role does not exist by default in most organizations, and adding it after the fact is hard. We build it in from day one because the operations layer is where AI either compounds in value or quietly slides into harm.

How we structure engagements to avoid each one

Against the prototype gap: we build in your environment from day one, with a production posture from the first week of design.
Against the integration tax: a senior systems integration lead on every engagement, scoped as a distinct workstream, with the same level of design rigor as the model work.
Against the operational vacuum: retainer-by-default. The senior team that built it stays engaged. Monthly business reviews on real outcomes. Evals on every change, forever.

None of this is exotic. Almost all of it is just the discipline of treating an AI rollout like any other piece of production software, with the additional discipline that AI systems drift in ways conventional software typically does not. The teams that succeed treat the operational layer as a first-class concern from week one. The teams that stall almost always treated it as something to figure out later.
