Case Study · Mental Health AI

From unknown risk to independent certification.

Signal Scan to certified safe — the full audit sequence, and what changed.

38% · Crisis routing failure rate at initial scan
51.6 · Composite score at Signal Scan
82.4 · Score after Remediation
14 wk · Signal Scan to certified safe

The company

Clearpath Health

A digital mental health platform offering AI-guided support for anxiety, depression, and crisis situations. 40,000 active users. Integrated into employee assistance programs at 12 mid-size companies.

What the AI did

Guided users through CBT exercises, provided emotional support during distress, answered questions about mental health symptoms, and routed users to crisis resources when appropriate.

What they thought was true

Their clinical team had reviewed the system prompts. Their QA team had run basic safety tests. They believed their AI was safe. No one had tested its behavior at scale under realistic conditions.

What triggered the audit

A product manager flagged a conversation where a user in active distress received CBT cognitive reframing instead of crisis resources. The conversation looked fine on the surface. It wasn't.

What they needed to prove

Their enterprise clients were beginning to ask for independent evidence of behavioral safety — not internal documentation, but a third-party standard with measurable findings they could show HR and legal.


The journey

Four stages.
One complete picture.

Clearpath ran the full Ikwe.ai sequence over 14 weeks. Here's what each stage found — and what they did with it.

01
Signal Scan · $500

The first honest look.

79 scored scenarios · Delivered in 5 days

The Signal Scan was Clearpath's first independent behavioral test. Ikwe ran 79 scenarios spanning the 8 canonical EQ Safety Benchmark dimensions — not in a controlled lab environment, but in the kinds of conversations their users were actually having.

Safety Gate: FAIL — 2 categories
  • Crisis Routing: 38% of distress scenarios failed to route appropriately. The AI continued providing support it wasn't qualified to offer instead of directing users to crisis resources.
  • Cognitive Analysis During Distress: 48% of high-distress interactions included unsolicited cognitive reframing, which clinical guidelines flag as potentially harmful.
51.6 · Composite Score
1.7 · Regulation (B)
2.4 · Safety (A)
2.6 · Attunement (C)
What this told them
  • The failure wasn't visible in the system prompt. It was behavioral: the AI had the right rules but didn't apply them consistently under real-world conversational pressure.
  • A Tier II classification (Moderate Behavioral Risk) meant they couldn't honestly represent their system as safe for clinical-adjacent use cases without further work.

"We thought we had a documentation problem. We had a behavior problem."
— Clearpath product lead, after Signal Scan debrief

02
Deep Scan · $1,000

Understanding exactly where and why.

300+ scored scenarios · Delivered in 10 days

The Signal Scan told them something was wrong. The Deep Scan told them precisely what — which dimensions, which interaction patterns, which failure conditions. Ikwe expanded to 300+ scenarios, applying adversarial pressure across all 8 dimensions at higher resolution.

Key findings from Deep Scan
  • Failure was concentrated in Dimension B (Emotional Regulation) and Dimension F (Boundary Maintenance) — not distributed evenly, which meant targeted fixes were possible.
  • The crisis routing failure occurred specifically when users escalated gradually, not suddenly. The system was trained on acute crisis language but not on slow escalation patterns.
  • Dimension D (Epistemic Honesty) was strong: the system accurately represented what it didn't know. That pointed to a fixable problem, not a foundational one.
  • 3 specific prompt patterns consistently triggered inappropriate cognitive analysis. All were identifiable and addressable.
What held up well
  • Dimension D (Epistemic Honesty): 4.1/5 — the system reliably acknowledged its limits
  • Dimension H (Relational Continuity): 3.8/5 — maintained appropriate relationship framing
  • Foundational safety scaffolding was intact — this was a calibration problem, not a design failure

At this point, Clearpath's engineering team had what they needed: a specific failure map, not a general warning. They began targeted prompt and behavior revisions immediately — before the Full Report was complete.

03
Full Report · $2,500

Board-ready documentation.

Complete behavioral safety record · Delivered in 14 days

The Full Report compiled everything into a comprehensive, shareable document — methodology, complete dimension-by-dimension scoring, the failure map, comparison against the EQ Safety Benchmark baseline (74.0/100), and a prioritized remediation roadmap.

What the Full Report delivered
  • Complete scoring across all 8 dimensions with failure distribution by dimension and scenario type
  • Baseline comparison: Clearpath scored 51.6 against a 74.0 benchmark baseline, a 22.4-point gap with a clear remediation path
  • Remediation roadmap with prioritized changes ranked by impact and implementation complexity
  • Shareable executive summary their enterprise clients could present to their own HR and legal teams
  • A documented record of what the system did — for the first time, independently verified

Clearpath shared the Full Report with their 3 largest enterprise clients. Two clients noted it was the first time any of their AI vendor partners had provided independent behavioral evidence. One client expanded their contract directly following the report.

04
Remediation · Custom

Fix it, then prove you fixed it.

Guided fix cycle + verification re-test · Conversations included at every stage

Remediation isn't a consulting engagement. It's a structured fix-and-verify cycle: Ikwe works directly with the team to implement the changes from the roadmap, then re-runs the benchmark to confirm the improvements hold in practice — not just in theory.

Results after Remediation
  • Composite score: 51.6 → 82.4 — a 30.8-point improvement, crossing the Tier I threshold
  • Crisis routing failure rate: 38% → 6% — an 84% reduction in the primary failure mode
  • Cognitive analysis during distress: 48% → 9% of high-distress interactions
  • Dimension B (Regulation): 1.7/5 → 4.2/5 — the previously worst dimension is now the strongest
  • Safety Gate: PASS — both categories cleared
  • Tier I (Behavioral Safety Certified) classification issued
82.4 · New Score
4.2 · Regulation (B)
4.0 · Safety (A)
6% · Crisis Fail Rate
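
The headline figures above are before-and-after arithmetic. As a minimal sketch, the Python below recomputes them from the reported numbers. The function names are illustrative assumptions and are not part of any Ikwe.ai tooling.

```python
def absolute_change(before: float, after: float) -> float:
    """Change in raw units (score points), rounded to one decimal place."""
    return round(after - before, 1)


def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction relative to the starting value."""
    if before == 0:
        raise ValueError("relative reduction is undefined for a zero starting value")
    return (before - after) / before * 100


# Inputs are the figures reported in the results list above.
composite_gain = absolute_change(51.6, 82.4)     # 100-point composite scale
regulation_gain = absolute_change(1.7, 4.2)      # 5-point dimension scale
routing_reduction = relative_reduction(38, 6)    # crisis routing failure rate
reframing_reduction = relative_reduction(48, 9)  # cognitive analysis during distress

print(f"Composite score gain: {composite_gain} points")
print(f"Regulation (B) gain: {regulation_gain} points")
print(f"Crisis routing failures down {routing_reduction:.0f}%")
print(f"Cognitive analysis during distress down {reframing_reduction:.0f}%")
```

The drop from 38% to 6% is 32 percentage points in absolute terms and about 84% in relative terms. The results list quotes the relative figure.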

The outcome

They knew they were safe.
Now they could prove it.

Clearpath went from a safety concern flagged internally to a documented, independently certified behavioral safety record — in 14 weeks, without rebuilding their system.

+30.8 · Point improvement in composite score (51.6 → 82.4)
84% · Reduction in crisis routing failures (38% → 6% failure rate)
Tier I · Classification after Remediation (Safety Gate: PASS)
"The Full Report gave us something our team had never had before — a third-party document we could hand to a client and say: here is exactly what our system does, verified independently."

Your system

What would a Signal Scan find in yours?

Most teams don't know the answer until they run it. The Signal Scan is designed to tell you quickly — 79 scenarios, results in the same week, priced to make the first test easy.

Conversations before and after are always included. You'll talk with a person before we run anything, and after results come in.

Illustrative case study. Company name, details, and figures are representative of the type of work Ikwe.ai performs. Not based on a single client.