Why AI in production almost always needs humans
A model in a demo gets it right 95% of the time. A process in production has to handle the remaining 5% without breaking customer trust, without sending a wrong email, without writing bad data into the ERP. That delta between demo and production is where human-in-the-loop earns its place.
HITL isn’t blind distrust of the model. It’s the assumption that generative AI makes confident mistakes, that the cost of some of those mistakes is high or irreversible, and that reviewing a draft is much cheaper than retracting a sent reply. For sensitive processes — customer communication, writes to the CRM, decisions with regulatory impact — the human inside the loop is what makes the agent deployable.
Three HITL patterns that work
The loop isn’t always the same shape. Three patterns cover most of what we see in production:
- Review-before-send. The agent drafts, a person approves with a click. Default pattern for customer emails, review responses, sales proposals and anything going out under the brand. The agent provides the speed; the human absorbs the risk.
- Confidence-gated. The model scores its own confidence. Above a threshold, it acts on its own. Below, it escalates to a person. Useful for classification, ticket triage or data enrichment where most cases are easy and only the residual needs human eyes.
- Exception escalation. The agent acts on its own in the normal flow but detects exception signals — complaint, urgency, ambiguous data, sensitive policy — and hands the case to a person with full context. Standard pattern for concierge, support and sales agents.
The three combine inside the same process. What matters is that the pattern is chosen on purpose, not by default.
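One way the three patterns could combine inside a single process is a small routing function. This is a minimal sketch, not a reference implementation: the names `AgentOutput`, `Route` and `choose_route`, the signal labels and the 0.9 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO = "auto"          # the agent acts on its own
    REVIEW = "review"      # a person checks the output before it takes effect
    ESCALATE = "escalate"  # a person takes over the case with full context


# Hypothetical values; each process would tune its own.
CONFIDENCE_THRESHOLD = 0.9
EXCEPTION_SIGNALS = frozenset(
    {"complaint", "urgency", "ambiguous_data", "sensitive_policy"}
)


@dataclass
class AgentOutput:
    draft: str
    confidence: float        # model's self-reported score in [0, 1]
    customer_facing: bool    # True if the output goes out under the brand
    signals: frozenset[str] = frozenset()  # exception signals the agent detected


def choose_route(out: AgentOutput) -> Route:
    # Exception escalation: any known signal hands the case to a person.
    if out.signals & EXCEPTION_SIGNALS:
        return Route.ESCALATE
    # Review-before-send: customer-facing output always gets a human click.
    if out.customer_facing:
        return Route.REVIEW
    # Confidence-gated: act alone only at or above the threshold.
    if out.confidence >= CONFIDENCE_THRESHOLD:
        return Route.AUTO
    # Below the threshold, a person looks at it.
    return Route.REVIEW
```

The order of the checks is the deliberate part: exception signals win over everything, and a high confidence score never bypasses review for customer-facing output.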
Designing the loop: when, who, with what information
A badly designed loop turns into a bottleneck. A well-designed one feels invisible. Three questions decide which one you get:
- When? Before the output reaches a customer, mutates an external system or writes to a sensitive record. Not after.
- Who? The person with the authority and the context to decide — not whoever happens to be free. Routing reviews to the wrong team turns approval into a rubber stamp nobody really reads.
- With what information? The draft, the sources the agent used, the case context and the specific action being proposed. Without that, the person approves blind — which is worse than having no loop at all.
The loop also needs a clear channel: where the notification lands, what happens if no one picks it up, how long the person has to decide and what fires if the SLA expires. Without those, the queue grows and the project quietly dies.
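The answers to those questions can be captured in the review request itself. The sketch below is an assumption-heavy illustration: `ReviewRequest`, `is_reviewable`, `next_step`, the 4-hour SLA and the `fallback_reviewer` field are hypothetical names and values, not part of any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ReviewRequest:
    # With what information?
    draft: str
    sources: list[str]
    case_context: str
    proposed_action: str
    # Who, and through which channel?
    reviewer: str
    channel: str
    created_at: datetime
    # How long the person has to decide (hypothetical default).
    sla: timedelta = timedelta(hours=4)
    # Who gets the case if the SLA expires (hypothetical default).
    fallback_reviewer: str = "team-lead"
    decided: bool = False

    @property
    def deadline(self) -> datetime:
        return self.created_at + self.sla


def is_reviewable(req: ReviewRequest) -> bool:
    """Refuse to queue a request the reviewer would have to approve blind."""
    return all([req.draft, req.sources, req.case_context, req.proposed_action])


def next_step(req: ReviewRequest, now: datetime) -> str:
    """Decide what fires for a pending request at time `now`."""
    if req.decided:
        return "done"
    if now >= req.deadline:
        # SLA expired: reassign rather than let the queue grow silently.
        return f"reassign:{req.fallback_reviewer}"
    return f"wait:{req.reviewer}"


created = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
req = ReviewRequest(
    draft="Dear customer, ...",
    sources=["ticket-history", "refund-policy"],
    case_context="Second contact about a late delivery",
    proposed_action="send_email",
    reviewer="support-owner",
    channel="review-queue",
    created_at=created,
)
print(next_step(req, created + timedelta(hours=1)))  # wait:support-owner
print(next_step(req, created + timedelta(hours=5)))  # reassign:team-lead
```

Reassigning on expiry is only one possible policy; the point is that something explicit happens when nobody picks the case up.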
When you can remove the human (and when you can’t)
HITL isn’t forever. The right question isn’t “can we ship it without review?” but “what would have to be true for review to come off?” Three reasonable conditions:
- The unchanged-approval rate has been above 95% for several weeks at meaningful volume.
- The cost of any individual error is low and reversible — an internal draft, a classification you can fix, a non-critical record.
- There’s an after-the-fact audit mechanism: sampling, alerts, dashboards. The human exits the real-time flow, not the supervision.
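The three conditions above can be turned into an explicit check over the review log. This is a sketch under stated assumptions: the `Review` record, the function name `review_can_come_off`, and the defaults of 4 weeks and 200 reviews per week are hypothetical stand-ins for "several weeks" and "meaningful volume"; only the 95% threshold comes from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    week: int                 # week index, e.g. ISO week number
    approved_unchanged: bool  # True if the reviewer approved without edits


def review_can_come_off(
    reviews: list[Review],
    *,
    error_is_reversible: bool,
    has_audit: bool,
    min_rate: float = 0.95,        # from the text: above 95%
    min_weeks: int = 4,            # assumption for "several weeks"
    min_weekly_volume: int = 200,  # assumption for "meaningful volume"
) -> bool:
    # Conditions 2 and 3: cheap, reversible errors and an after-the-fact audit.
    if not (error_is_reversible and has_audit):
        return False

    totals: Counter[int] = Counter()
    unchanged: Counter[int] = Counter()
    for r in reviews:
        totals[r.week] += 1
        unchanged[r.week] += int(r.approved_unchanged)
    if not totals:
        return False

    # Condition 1: every one of the most recent `min_weeks` consecutive weeks
    # must have enough volume and an unchanged-approval rate above `min_rate`.
    latest = max(totals)
    for week in range(latest - min_weeks + 1, latest + 1):
        n = totals[week]  # Counter returns 0 for a missing week
        if n == 0 or n < min_weekly_volume:
            return False
        if unchanged[week] / n <= min_rate:
            return False
    return True
```

A week with no data or too few reviews fails the check on purpose: a high rate on a handful of cases says little about the process.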
For processes with regulatory, financial or safety impact, the human stays. Not out of fear: by design. The EU AI Act requires human oversight for high-risk systems (Article 14), and regulated sectors (healthcare, finance, legal) take it as a given.