AI Metric

19 May 2026

What a good AI pilot looks like

Most AI pilots in construction do not fail because the technology was wrong. They fail because the pilot was never designed to succeed: the wrong workflow, no baseline measurement, no owner, and no agreed definition of what "worked" would even mean.

Here is the structure we use. It fits in roughly six weeks, costs little, and ends with a number and a decision instead of a shrug.

Week zero: pick the right workflow

The pilot lives or dies here. The right candidate workflow has four properties:

  • High volume. It happens weekly or daily, so six weeks generates real evidence.
  • Real pain. The people doing it actively resent it. Adoption is free when you remove work people hate.
  • Recoverable mistakes. An error is caught in review, not discovered in a final account.
  • Measurable today. You can state, now, how long it takes and how often it goes wrong.

Monthly reporting, turning meeting minutes into action lists, tender summarisation, inbox triage and site diary compilation all qualify. Anything touching contractual notices, payments or safety decisions does not belong in a first pilot.

Set the baseline before you start

If you do not measure the before, you cannot prove the after. Keep it light: two weeks of honest numbers. How many hours does the workflow take? Who does it? What does it delay? What do errors cost when they happen? Write it down. This is the single most skipped step, and skipping it is why so many pilots end in vibes instead of verdicts.

Run it with rules

Six weeks, one team, one workflow. A named owner inside the business, not the vendor and not us. Every output reviewed by a person while trust is being earned, with a simple log of what needed correcting. Clear data rules agreed up front, so nobody is improvising with sensitive information.

Then leave the process alone long enough to learn. Fiddling with the setup every three days resets the experiment each time.

Judge it like a commercial decision

At the end you want three numbers and an honest conversation:

  1. Hours. Baseline time versus pilot time, for the same output quality.
  2. Quality. Error and correction rates, from the review log, trending over the six weeks.
  3. Adoption. Did the team keep using it in week six without being chased? This is the most honest signal you will get. People do not voluntarily keep tools that waste their time.
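To make the three numbers concrete, here is a minimal Python sketch of scoring a pilot from a weekly review log. It is illustrative only: the `WeekLog` record, its fields, the `scorecard` helper and every figure in the usage lines are hypothetical. The hours comparison assumes output quality was held constant, as the hours measure requires.

```python
from dataclasses import dataclass


@dataclass
class WeekLog:
    """One week of a hypothetical pilot review log (field names are illustrative)."""
    week: int
    hours_spent: float      # hours the team spent producing this week's outputs
    outputs: int            # outputs produced and reviewed this week (assumed > 0)
    corrections: int        # outputs that needed correcting in review
    used_unprompted: bool   # did the team use the tool without being chased?


def scorecard(baseline_hours_per_output: float, log: list[WeekLog]) -> dict:
    """Summarise a pilot as hours, quality and adoption (hypothetical helper)."""
    weeks = sorted(log, key=lambda w: w.week)  # keep the trend in week order

    # 1. Hours: baseline time per output versus pilot time per output.
    pilot_hours_per_output = (
        sum(w.hours_spent for w in weeks) / sum(w.outputs for w in weeks)
    )

    # 2. Quality: share of outputs needing correction, week by week.
    rates = [w.corrections / w.outputs for w in weeks]

    # 3. Adoption: was the tool still used unprompted in the final week?
    return {
        "hours_saved_per_output": baseline_hours_per_output - pilot_hours_per_output,
        "correction_rate_by_week": rates,
        "correction_trend_improving": rates[-1] < rates[0],
        "adopted_in_final_week": weeks[-1].used_unprompted,
    }


# Illustrative figures only: a 3-hour-per-report baseline and six invented pilot weeks.
log = [
    WeekLog(1, 10, 4, 2, True),
    WeekLog(2, 9, 4, 2, True),
    WeekLog(3, 8, 4, 1, True),
    WeekLog(4, 8, 4, 1, True),
    WeekLog(5, 7, 4, 1, True),
    WeekLog(6, 6, 4, 0, True),
]
result = scorecard(3.0, log)
print(result)
```

Adoption is read from the final week only, because the question is whether the team was still using the tool unprompted at the end, not whether they used it on average.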

Then decide like you would on plant: keep and scale, fix one specific weakness and re-run, or kill it and write down why. A cleanly killed pilot is a success. It cost you six weeks and taught you something true about your business. The expensive failure is the pilot that limps on for a year because nobody defined what failure looked like.

The multiplier

One well-run pilot does more than automate one workflow. It produces the playbook: how your data behaves, what your review process needs, how your people build trust in these tools. The second workflow rolls out in half the time, and the third is routine.

That is the actual goal. Not a demo that impresses a steering group, but a repeatable way for your business to turn AI capability into measured hours, one workflow at a time.

AI Metric is a construction-native AI consultancy. If your team is spending more time operating software than doing their job, get in touch or book a call.