FEBRUARY 2026 • 10 MIN READ

How to Evaluate Whether an AI Agent Is Safe for Client-Facing Work

Most AI vendors will tell you their system is “enterprise ready” or “governed.” Very few can actually prove it under pressure.

The Evaluation Framework I Use

When assessing any AI system for professional services use, I run it through five non-negotiable tests:

  1. Can it violate your standards without you knowing?
  2. Can you reconstruct exactly why it made any given decision?
  3. Does it improve over time, or does it slowly drift?
  4. What happens when it encounters something genuinely novel?
  5. Who is ultimately responsible when it gets something important wrong?

If the answers to these questions are vague, or rely on “human review at the end,” the system is not governed. It is merely assisted.
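One way to make this test concrete is to record each answer alongside the evidence behind it. The sketch below is illustrative only: the `Answer` type, its fields, and the `classify` function are hypothetical names, and the pass rule (every question answered with at least one piece of evidence, none leaning on end-stage human review) is one assumed reading of the test above.

```python
from dataclasses import dataclass, field

# The five questions, copied from the list above.
QUESTIONS = [
    "Can it violate your standards without you knowing?",
    "Can you reconstruct exactly why it made any given decision?",
    "Does it improve over time, or does it slowly drift?",
    "What happens when it encounters something genuinely novel?",
    "Who is ultimately responsible when it gets something important wrong?",
]


@dataclass
class Answer:
    """A vendor's answer to one question (hypothetical structure)."""
    question: str
    # Pointers to concrete proof, e.g. an audit log export or a test report.
    evidence: list[str] = field(default_factory=list)
    # True if the answer boils down to "a human checks it at the end".
    relies_on_final_human_review: bool = False


def classify(answers: list[Answer]) -> str:
    """Return "governed" only if all five questions are answered with evidence
    and no answer depends on end-stage human review; otherwise "assisted"."""
    answered = {a.question for a in answers}
    if any(q not in answered for q in QUESTIONS):
        return "assisted"  # a question was skipped entirely
    for a in answers:
        if not a.evidence or a.relies_on_final_human_review:
            return "assisted"  # vague answer, or review is doing the work
    return "governed"
```

Under this sketch, a vendor who answers four questions with documentation and the fifth with "our reviewers catch that" still comes out as "assisted."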

The Dangerous Illusion of Control

Many firms believe that because a human reviews the final output, risk is contained.

This works only when volume is low. As soon as you scale, review becomes performative: reviewers have less time per output and start trusting the system more than they should.

Real governance removes the possibility of certain classes of failure before the output is ever generated.
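A minimal sketch of what "before the output is ever generated" can look like in practice: a policy check that runs ahead of the model call, so a prohibited request never reaches generation at all. Everything here is an assumption for illustration, including the `Request` shape, the blocked task and data labels, and the `governed_generate` wrapper. A real deployment would define its own rules.

```python
from dataclasses import dataclass
from typing import Callable


class PolicyViolation(Exception):
    """Raised when a request falls into a prohibited class."""


@dataclass(frozen=True)
class Request:
    task: str               # what the agent is being asked to do
    data_labels: frozenset  # classifications of the data it would touch


# Hypothetical rules; a firm would substitute its own standards.
BLOCKED_TASKS = {"legal_opinion"}
BLOCKED_LABELS = {"client_confidential"}


def check_policy(req: Request) -> None:
    """Reject prohibited requests outright instead of reviewing their output."""
    if req.task in BLOCKED_TASKS:
        raise PolicyViolation(f"task not permitted: {req.task}")
    restricted = req.data_labels & BLOCKED_LABELS
    if restricted:
        raise PolicyViolation(f"restricted data: {sorted(restricted)}")


def governed_generate(req: Request, generate: Callable[[Request], str]) -> str:
    """Run the policy check first; the model is called only if it passes."""
    check_policy(req)
    return generate(req)
```

The design point is ordering: because the check runs before `generate`, a blocked class of request produces no output to review, which is different from catching a bad output after the fact.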

The firms that will win are those that can confidently answer the five questions above with evidence, not marketing slides.