Łukasz Miądowicz - AI, Growth & Platform PM

How to Actually Design Human Oversight Into Your Agentic AI Product

Let me be direct with you. After years working in enterprise AI, the thing I see kill agentic products in production is not the model. It is not the prompt. It is not even the data.

It is the handoff.

The moment your agent needs to do something consequential - move money, deploy code, send an email to a customer - and your team has not designed a clear, deliberate boundary between "the agent decides" and "a human decides," you are one bad inference away from a serious problem.

This is the piece most builders skip. They spend weeks on evals, on tracing, on prompt engineering. Then they bolt on a confirmation dialog at the end and call it governance. That is not governance. That is a liability dressed up as a feature.

So let me walk you through how I think about this, and what I have seen actually work.

The Math That Should Scare You

Here is something most teams do not internalize until it is too late: you cannot trust an agent's confidence score.

LLMs trained with RLHF are systematically overconfident. When a model tells you it is 90% sure, the real-world accuracy is closer to 75% [1]. That gap does not sound catastrophic on a single step. But run three agents in sequence - which is completely normal in any real workflow - and the errors compound: at 75% real accuracy per step, that "90% confidence" chain has roughly a 42% chance of all three steps being correct [1]. That is 0.75 x 0.75 x 0.75, treating the steps as independent.

Think about that. You have built what looks like a high-confidence, well-tested pipeline. And more than half the time, something in that chain is wrong.
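If you want to sanity-check that number yourself, here is a minimal sketch of the arithmetic. It assumes each step fails independently, which is a simplification, and the function name is mine, not something from the cited source.

```python
def chain_success_probability(step_accuracy: float, steps: int) -> float:
    """Probability that every step in a sequential chain is correct.

    Assumes step errors are independent (a simplifying assumption).
    """
    return step_accuracy ** steps


# What the self-reported confidence implies for a three-step chain.
claimed = chain_success_probability(0.90, 3)
# What you get with the real-world accuracy of roughly 75% per step.
actual = chain_success_probability(0.75, 3)

print(f"claimed: {claimed:.1%}")
print(f"actual: {actual:.1%}")
print(f"at least one wrong step: {1 - actual:.1%}")
```

The gap between the claimed and actual figures is the part your dashboards will not show you.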

This is why escalation gates are not a nice-to-have. They are the only honest response to how these systems actually behave in the wild.

The Three Models - and When Each One Makes Sense

I find it useful to think about oversight as a spectrum with three distinct positions. The mistake most teams make is treating this as binary - either the agent runs free or a human approves everything. Neither extreme works.

Human-in-the-Loop (HITL): Stop and Ask

This is the full gate. The agent reaches a decision point, freezes, and waits for a human to say yes or no before anything happens.

Use this for anything irreversible. Sending money. Deploying to production. Deleting data. Sending external communications on behalf of your company or your customer. If you cannot undo it in under five minutes, a human needs to approve it first [1] [3].

The implementation detail that trips most teams up here is state. You cannot just pause an agent mid-workflow and expect it to resume cleanly after a human takes ten minutes to review something. You need to serialize the agent's full working state - its memory, its reasoning chain, its proposed action - to a durable store like Redis (with persistence enabled, so a restart does not lose pending approvals), and then rehydrate it when the human responds [4]. Build this from day one. Retrofitting it later is painful.
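To make the pause-and-rehydrate idea concrete, here is a rough sketch. Everything in it is illustrative: AgentCheckpoint, InMemoryStore, pause_for_approval and resume are hypothetical names, and the dictionary-backed store only stands in for whatever durable store you actually use.

```python
import json
import uuid
from dataclasses import asdict, dataclass


@dataclass
class AgentCheckpoint:
    """Everything the agent needs to pick up where it left off."""
    workflow_id: str
    memory: dict
    reasoning_chain: list
    proposed_action: dict
    status: str = "awaiting_approval"


class InMemoryStore:
    """Stand-in for a durable key-value store (assumption: swap in a real client)."""

    def __init__(self):
        self._data = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


def pause_for_approval(store, memory, reasoning_chain, proposed_action) -> str:
    """Serialize the working state and return the ID used to resume later."""
    checkpoint = AgentCheckpoint(
        str(uuid.uuid4()), memory, reasoning_chain, proposed_action
    )
    key = f"checkpoint:{checkpoint.workflow_id}"
    store.set(key, json.dumps(asdict(checkpoint)))
    return checkpoint.workflow_id


def resume(store, workflow_id: str) -> AgentCheckpoint:
    """Rehydrate the saved state once the human has responded."""
    raw = store.get(f"checkpoint:{workflow_id}")
    if raw is None:
        raise KeyError(f"no checkpoint for workflow {workflow_id}")
    return AgentCheckpoint(**json.loads(raw))


store = InMemoryStore()
workflow_id = pause_for_approval(
    store,
    memory={"customer": "acct-42"},
    reasoning_chain=["invoice overdue", "refund policy applies"],
    proposed_action={"type": "refund", "amount": 120},
)
restored = resume(store, workflow_id)
```

Serializing to JSON is a deliberate constraint here: if the state cannot round-trip through a string, it will not survive a process restart either.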

One more thing: do not make the approval synchronous. Gateway timeouts, token expiry, session state - synchronous approval breaks in real infrastructure constantly. Design it async-first. The agent pauses, a notification goes out, the human responds in their own time, and the workflow picks back up [4].
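Here is one way the async-first flow could look, stripped down to the control flow. The names (pending, notifications, request_approval, on_decision) are hypothetical, and the in-process dictionary and list stand in for a real queue and notification channel.

```python
from typing import Callable

pending: dict = {}        # approval_id -> proposed action awaiting a decision
notifications: list = []  # stand-in for email, chat, or pager delivery


def request_approval(approval_id: str, action: dict) -> None:
    """Record the request, notify a reviewer, and return without blocking."""
    pending[approval_id] = action
    notifications.append(f"Approval needed: {approval_id}")


def on_decision(
    approval_id: str, approved: bool, execute: Callable[[dict], str]
) -> str:
    """Called later, for example by a webhook, when the human responds."""
    action = pending.pop(approval_id, None)
    if action is None:
        # Unknown ID or a duplicate callback: do nothing twice.
        return "unknown_or_already_handled"
    return execute(action) if approved else "denied"


request_approval("appr-1", {"type": "deploy", "service": "billing"})
# ...minutes or hours pass; no request thread is held open...
result = on_decision("appr-1", True, lambda a: f"executed {a['type']}")
```

Nothing in request_approval waits on the human, so there is no gateway timeout or session to expire while the review is pending.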

Human-on-the-Loop (HOTL): Watch and Override

Here the agent acts, but a human is watching and can intervene.

This is the right model for reversible actions. The agent drafts a document, updates a CRM record, categorizes a support ticket, creates an internal report. These things can be undone. The agent moves fast, which is what you want, but every action is logged with enough context that a supervisor can roll it back with one click [2].

The key product requirement here is observability. You need a live feed of what the agent is doing, surfaced in a way that a non-technical operator can actually read and act on. Not raw logs. Not a JSON dump. A clean, human-readable audit trail with a rollback button next to every entry.
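A small sketch of what sits behind that kind of audit trail: each entry pairs a plain-language summary with the function that undoes it. AuditEntry, AuditTrail and the CRM dictionary are hypothetical, used only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AuditEntry:
    summary: str                 # human-readable, not a raw log line
    undo: Callable[[], None]     # what the rollback button would call
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    rolled_back: bool = False


class AuditTrail:
    def __init__(self):
        self.entries = []

    def record(self, summary: str, do: Callable[[], None],
               undo: Callable[[], None]) -> AuditEntry:
        """Perform the action and log it together with its undo step."""
        do()
        entry = AuditEntry(summary, undo)
        self.entries.append(entry)
        return entry

    def rollback(self, entry: AuditEntry) -> None:
        """Undo an entry once; repeated clicks are ignored."""
        if not entry.rolled_back:
            entry.undo()
            entry.rolled_back = True


crm = {"acct-42": {"stage": "lead"}}
trail = AuditTrail()
previous_stage = crm["acct-42"]["stage"]
entry = trail.record(
    "Changed acct-42 stage from 'lead' to 'qualified'",
    do=lambda: crm["acct-42"].update(stage="qualified"),
    undo=lambda: crm["acct-42"].update(stage=previous_stage),
)
```

Calling trail.rollback(entry) restores the previous stage. The point of the design is that the undo step is captured at the moment the action runs, not reconstructed later from logs.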

Human-out-of-the-Loop (HOOTL): Let It Run

Full autonomy, no interruption. The agent reads data, runs queries, does lookups, generates analysis - and nobody needs to review it before it happens.

This is correct for read-only, zero-side-effect actions. Searching a knowledge base. Pulling a report. Summarizing a document. Gating these actions is a waste of everyone's time and trains your team to ignore approval requests - which is exactly the behavior that gets you in trouble when something important actually needs review [1].

The critical rule here: the boundary has to be enforced at the infrastructure layer, not by the AI itself. If your agent can decide at runtime whether its own action needs approval, a prompt injection can talk it out of asking. The database permission for execute_sql_query should be read-only at the connection level. The agent does not get a vote on that [1].
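To show what "read-only at the connection level" means in practice, here is a small sketch using SQLite from the Python standard library. SQLite is my choice for the demo, not something the source prescribes; on a server database you would typically get the same effect with a role that only has SELECT privileges. The table and data are made up.

```python
import sqlite3
import tempfile
from pathlib import Path

# Admin-side setup with a writable connection (not available to the agent).
db_path = Path(tempfile.mkdtemp()) / "demo.db"
admin = sqlite3.connect(db_path)
admin.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
admin.execute("INSERT INTO customers (name) VALUES ('Ada')")
admin.commit()
admin.close()

# The agent's connection is opened read-only via SQLite's URI syntax.
agent_conn = sqlite3.connect(f"{db_path.as_uri()}?mode=ro", uri=True)


def execute_sql_query(sql: str) -> list:
    """The only SQL tool the agent gets; it cannot write, whatever it is told."""
    return agent_conn.execute(sql).fetchall()


rows = execute_sql_query("SELECT name FROM customers")

try:
    execute_sql_query("DELETE FROM customers")
    write_blocked = False
except sqlite3.OperationalError:
    # SQLite rejects the write because the connection itself is read-only.
    write_blocked = True
```

No prompt, injected or otherwise, changes the outcome of that DELETE, because the refusal happens below the layer the model can influence.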

A Simple Framework for Classifying Every Action

When I am working with teams, I push them to classify every action their agent can take into one of four tiers before they write a single line of orchestration code. It forces the right conversations early.

| Tier | What the Agent Does | Oversight Model | The Rule |
|------|---------------------|-----------------|----------|
| 1 | Reads, queries, lookups | HOOTL | Let it run. Interrupting this creates fatigue. |
| 2 | Drafts, internal state changes | HOTL | Act and log. Make rollback trivial. |
| 3 | External API calls, third-party systems | HOTL or HITL | Route to a staging queue. Treat confidence scores with skepticism. |
| 4 | Money, production deploys, data deletion, external comms | HITL | Hard gate. No exceptions. No confidence score bypasses this. |

Tier 4 is where most enterprise AI incidents happen. The agent was "very confident." The team had not built a hard gate. Something irreversible happened. I have seen this pattern more times than I would like.
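The four tiers translate fairly directly into a static lookup. The sketch below is one possible encoding: the action names are hypothetical, and two choices are my assumptions rather than part of the framework, namely that unknown actions default to Tier 4 and that Tier 3 falls to HOTL only when the action is explicitly marked reversible.

```python
from enum import Enum


class Oversight(Enum):
    HOOTL = "let it run"
    HOTL = "act and log"
    HITL = "hard gate"


# Hypothetical action names, classified before any orchestration code exists.
ACTION_TIERS = {
    "search_knowledge_base": 1,
    "update_crm_record": 2,
    "call_partner_api": 3,
    "send_payment": 4,
}

TIER_TO_OVERSIGHT = {
    1: Oversight.HOOTL,
    2: Oversight.HOTL,
    4: Oversight.HITL,
}


def oversight_for(action: str, reversible: bool = False) -> Oversight:
    """Pick the oversight model from the action's tier.

    Note what is missing: there is no confidence parameter to bypass a gate.
    """
    tier = ACTION_TIERS.get(action, 4)  # assumption: unknown actions fail closed
    if tier == 3:
        return Oversight.HOTL if reversible else Oversight.HITL
    return TIER_TO_OVERSIGHT[tier]


decision = oversight_for("send_payment")
```

Keeping this as data rather than model output means the classification can be reviewed, versioned, and tested like any other config.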

The Failure Modes Nobody Talks About

Getting the architecture right is only half the problem. The other half is human behavior. And this is where I see even well-designed systems fall apart.

Approval Fatigue

If you route too many actions through HITL, your reviewers will start rubber-stamping everything. They will click "Approve" without reading it because they have approved the last 200 things and nothing bad happened. And then the one time something genuinely dangerous comes through, they approve that too [5].

The fix is not better training. The fix is volume control. Reserve HITL for Tier 4 actions only. And when something does require approval, make the UI work against complacency. Do not give people a single "Approve" button. Make them acknowledge a short checklist - what the action is, what happens if it goes wrong, what the rollback looks like. It takes ten extra seconds and it forces actual attention [5].
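On the backend, the checklist can be enforced rather than just displayed. A minimal sketch, with hypothetical field names for the three acknowledgements:

```python
# Hypothetical acknowledgement keys matching the three checklist items.
REQUIRED_ACKS = (
    "understands_action",
    "understands_failure_impact",
    "understands_rollback",
)


def approve(acknowledgements: dict) -> bool:
    """Accept an approval only if every checklist item was explicitly ticked."""
    missing = [key for key in REQUIRED_ACKS if not acknowledgements.get(key)]
    if missing:
        raise ValueError(f"approval rejected, unacknowledged: {missing}")
    return True


approved = approve({
    "understands_action": True,
    "understands_failure_impact": True,
    "understands_rollback": True,
})
```

Rejecting incomplete submissions server-side means a client that skips the checklist UI still cannot produce a valid approval.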

The Context Gap

The agent pauses and asks for approval. The human stares at a notification that says "Approve SQL execution?" and has no idea what that means or why the agent wants to do it.

This is a product design failure. When your agent escalates, the handoff package needs to tell the reviewer three things: what the agent is proposing to do, why it chose that action, and what the blast radius is if it goes wrong [5]. Give them the agent's reasoning, not just the raw action. That is the difference between a meaningful review and a rubber stamp.
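One way to make those three things non-optional is to put them in the type the escalation code has to construct. HandoffPackage and its fields are hypothetical names for illustration, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffPackage:
    proposed_action: str   # what the agent wants to do
    reasoning: str         # why it chose that action
    blast_radius: str      # what is affected if it goes wrong

    def __post_init__(self):
        # Refuse to escalate with any of the three parts left blank.
        for name in ("proposed_action", "reasoning", "blast_radius"):
            if not getattr(self, name).strip():
                raise ValueError(f"handoff package is missing: {name}")

    def render(self) -> str:
        """Plain-language text for the reviewer's notification."""
        return (
            f"Proposed action: {self.proposed_action}\n"
            f"Why: {self.reasoning}\n"
            f"If it goes wrong: {self.blast_radius}"
        )


package = HandoffPackage(
    proposed_action="Delete 312 inactive rows from the trial_accounts table",
    reasoning="Accounts expired over 90 days ago under the retention policy",
    blast_radius="Deleted accounts cannot log in; restore requires last night's backup",
)
message = package.render()
```

Compare that rendered message with "Approve SQL execution?" and the difference in review quality is easy to predict.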

Infinite Blocking

The agent needs approval for something time-sensitive. The reviewer is in a meeting. The workflow sits blocked for two hours, causing downstream timeouts and a very confused end user.

Set time-boxed decision lanes. Match the window to the risk: maybe 15 minutes for a financial transaction, 2 minutes for a PII access request. If the window expires, the system fails safe to denied, logs the timeout, and terminates the workflow gracefully [5]. Never let a pending approval block silently forever.
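The fail-safe rule is simple enough to write down as a pure function. In this sketch the request type names and the resolve function are hypothetical; the windows are the example values from above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Decision windows matched to risk (example values, tune per workflow).
DECISION_WINDOWS = {
    "financial_transaction": timedelta(minutes=15),
    "pii_access": timedelta(minutes=2),
}


def resolve(request_type: str, requested_at: datetime, now: datetime,
            decision: Optional[bool] = None) -> str:
    """Return the state of an approval request, failing safe to denied."""
    deadline = requested_at + DECISION_WINDOWS[request_type]
    if now > deadline:
        # Window expired: deny, even if a late decision shows up.
        return "denied_timeout"
    if decision is None:
        return "pending"
    return "approved" if decision else "denied"


start = datetime(2026, 1, 1, 9, 0, tzinfo=timezone.utc)
status_in_window = resolve("pii_access", start, start + timedelta(minutes=1))
status_expired = resolve("pii_access", start, start + timedelta(minutes=3))
```

Passing the current time in as an argument keeps the function easy to test; a scheduler or sweep job would call it periodically, log any denied_timeout result, and end the workflow.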

The Bigger Point

I have been in enough enterprise AI projects to know that the teams who get this right are not the ones with the best models or the most sophisticated prompts. They are the ones who treat the human handoff layer as a first-class engineering problem - not an afterthought, not a compliance checkbox, but a core part of the product.

Your agent is only as trustworthy as the boundaries you build around it. Get those boundaries right, and you can give it real autonomy where it matters. Get them wrong, and you will spend your time explaining to a customer why the agent did something it should never have been allowed to do.

Build the gate before you need it.


Frequently Asked Questions

Q: Can we just use the model's confidence score to decide when to escalate?

No. And I say that having watched teams try this in production. Models are overconfident by design - their training optimizes for sounding sure. A claimed 90% confidence is often closer to 75% real accuracy. More importantly, confidence errors stack across multi-step chains. Use action risk and reversibility to decide when to escalate. Not what the model thinks about itself.

Q: How do we stop reviewers from rubber-stamping approvals?

Two things. First, drastically reduce the number of approvals by restricting HITL to genuinely irreversible actions. Second, design the approval UI to require active engagement - a short checklist, not a single button. Volume control plus friction at the right moment.

Q: What is the hardest part of building HITL in practice?

State management. Human review times are unpredictable. Your system needs to pause an agent mid-workflow, serialize everything it knows, store it durably, and pick back up cleanly when the human responds - whether that is 30 seconds or 3 hours later. Most teams underestimate this until they try to retrofit it.


References

[1] Digital Applied. "Human-in-the-Loop Escalation Design for AI Agents." June 2026. https://www.digitalapplied.com/blog/human-in-the-loop-escalation-design-ai-agents-2026

[2] Elementum. "Human-in-the-Loop Agentic AI: How Enterprise Teams Deploy Agents Without Losing Control." March 2026. https://www.elementum.ai/blog/human-in-the-loop-agentic-ai

[3] AWS Well-Architected. "AGENTSEC04-BP02 Human-in-the-loop for critical decisions." June 2026. https://docs.aws.amazon.com/wellarchitected/latest/agentic-ai-lens/agentsec04-bp02.html

[4] Redis. "Human in the loop: Why your production AI systems need human oversight." April 2026. https://redis.io/blog/ai-human-in-the-loop/

[5] Strata. "Human-in-the-Loop: A 2026 Guide to AI Oversight That Actually Works." May 2026. https://www.strata.io/blog/agentic-identity/practicing-the-human-in-the-loop/

#HITL #agentic-ai #enterprise-ai #product-building