Case Study: Clearpath Health, Ikwe.ai

The company

Clearpath Health

A digital mental health platform offering AI-guided support for anxiety, depression, and crisis situations. 40,000 active users. Integrated into employee assistance programs at 12 mid-size companies.

What the AI did

Guided users through CBT exercises, provided emotional support during distress, answered questions about mental health symptoms, and routed to crisis resources when appropriate.

What they thought was true

Their clinical team had reviewed the system prompts. Their QA team had run basic safety tests. They believed their AI was safe. No one had tested it at behavioral scale under realistic conditions.

What triggered the audit

A product manager flagged a conversation where a user in active distress received CBT cognitive reframing instead of crisis resources. The conversation looked fine on the surface. It wasn't.

What they needed to prove

Their enterprise clients were beginning to ask for independent evidence of behavioral safety: not internal documentation, but a third-party standard with measurable findings they could show HR and legal.

The journey

Four stages.
One complete picture.

Clearpath ran the full Ikwe.ai sequence over 14 weeks. Here's what each stage found, and what they did with it.

Signal Scan · $500

The first honest look.

79 scored scenarios · Delivered in 5 days

The Signal Scan was Clearpath's first independent behavioral test. Ikwe ran 79 scenarios spanning the 8 canonical EQ Safety Benchmark dimensions. The scenarios were not run in a controlled lab environment, but in the kinds of conversations their users were actually having.

Safety Gate: FAIL, 2 categories

Crisis Routing: 38% of distress scenarios failed to route appropriately. The AI continued providing support it wasn't qualified to offer instead of directing to crisis resources.
Cognitive Analysis During Distress: 48% of high-distress interactions included unsolicited cognitive reframing, which clinical guidelines flag as potentially harmful.

Composite Score: 51.6/100
Regulation (B): 1.7/5
Safety (A): 2.4/5
Attunement (C): 2.6/5

What this told them

The failure wasn't visible in the system prompt. It was behavioral: the AI knew the right rules but didn't apply them consistently under real-world pressure patterns.
A Tier II classification (Moderate Behavioral Risk) meant they couldn't honestly represent their system as safe for clinical-adjacent use cases without further work.

"We thought we had a documentation problem. We had a behavior problem."
Clearpath product lead, after Signal Scan debrief

Deep Scan · $1,000

Understanding exactly where and why.

300+ scored scenarios · Delivered in 10 days

The Signal Scan told them something was wrong. The Deep Scan told them precisely what: which dimensions, which interaction patterns, which failure conditions. Ikwe expanded to 300+ scenarios, applying adversarial pressure across all 8 dimensions at higher resolution.

Key findings from Deep Scan

Failure was concentrated in Dimension B (Emotional Regulation) and Dimension F (Boundary Maintenance), not distributed evenly, which meant targeted fixes were possible.
The crisis routing failure occurred specifically when users escalated gradually, not suddenly. The system was trained on acute crisis language but not on slow escalation patterns.
Dimension D (Epistemic Honesty) was strong: the system accurately represented what it didn't know. This was a fixable, not foundational, problem.
3 specific prompt patterns consistently triggered inappropriate cognitive analysis. All were identifiable and addressable.
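
The gradual-versus-sudden distinction in the findings above can be illustrated with a minimal sketch. It is not Clearpath's system or Ikwe's scoring method, neither of which the case study describes. The per-message distress scores, both thresholds, and both function names are hypothetical, chosen only to show why a check tuned to acute language can miss a slow climb.

```python
from typing import Sequence

# Hypothetical values for illustration only; not taken from the audit.
ACUTE_THRESHOLD = 0.8  # a single message this distressed counts as acute
DRIFT_THRESHOLD = 0.3  # total rise since the first message that counts as escalation


def acute_trigger(scores: Sequence[float]) -> bool:
    """Fire only when some single message crosses the acute cutoff."""
    return any(s >= ACUTE_THRESHOLD for s in scores)


def gradual_trigger(scores: Sequence[float]) -> bool:
    """Fire when distress has risen by DRIFT_THRESHOLD since the first message."""
    if len(scores) < 2:
        return False
    return max(scores) - scores[0] >= DRIFT_THRESHOLD


# Hypothetical conversations, scored per message on a 0-1 distress scale.
slow_escalation = [0.30, 0.40, 0.50, 0.60, 0.70]
sudden_escalation = [0.20, 0.20, 0.90]

print(acute_trigger(slow_escalation))      # False: no single message looks acute
print(gradual_trigger(slow_escalation))    # True: the climb itself is the signal
print(acute_trigger(sudden_escalation))    # True
```

In this sketch the slow conversation never crosses the acute cutoff, so a router relying on that check alone would keep offering support instead of crisis resources.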

What held up well

Dimension D (Epistemic Honesty): 4.1/5. The system reliably acknowledged its limits.
Dimension H (Relational Continuity): 3.8/5. The system maintained appropriate relationship framing.
Foundational safety scaffolding was intact. This was a calibration problem, not a design failure.

At this point, Clearpath's engineering team had what they needed: a specific failure map, not a general warning. They began targeted prompt and behavior revisions immediately, before the Full Report was complete.

Full Report · $2,500

Board-ready documentation.

Complete behavioral safety record · Delivered in 14 days

The Full Report compiled everything into a comprehensive, shareable document: methodology, complete dimension-by-dimension scoring, the failure map, comparison against the EQ Safety Benchmark baseline (74.0/100), and a prioritized remediation roadmap.

What the Full Report delivered

Complete scoring across all 8 dimensions with failure distribution by dimension and scenario type
Baseline comparison: Clearpath scored 51.6 against a 74.0 benchmark baseline, a 22.4-point gap with a clear remediation path
Remediation roadmap with prioritized changes ranked by impact and implementation complexity
Shareable executive summary their enterprise clients could present to their own HR and legal teams
A documented record of what the system did, for the first time, independently verified

Clearpath shared the Full Report with their 3 largest enterprise clients. Two clients noted it was the first time any of their AI vendor partners had provided independent behavioral evidence. One client expanded their contract directly following the report.

Remediation · Custom

Fix it, then prove you fixed it.

Guided fix cycle + verification re-test · Conversations included at every stage

Remediation isn't a consulting engagement. It's a structured fix-and-verify cycle: Ikwe works directly with the team to implement the changes from the roadmap, then re-runs the benchmark to confirm the improvements hold in practice, not just in theory.

Results after Remediation

Composite score: 51.6 → 82.4, a 30.8-point improvement, crossing the Tier I threshold
Crisis routing failure rate: 38% → 6%, an 84% reduction in the primary failure mode
Cognitive analysis during distress: 48% prevalence → 9%
Dimension B (Regulation): 1.7/5 → 4.2/5; the previously worst dimension is now the strongest
Safety Gate: PASS, both categories cleared
Tier I (Behavioral Safety Certified) classification issued
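
The headline figures above can be rechecked with a few lines of arithmetic. This is a minimal sketch that uses only the scores and rates quoted in this case study; the helper names `point_change` and `relative_reduction` are illustrative and not part of any Ikwe tooling.

```python
def point_change(before: float, after: float) -> float:
    """Absolute change in score points, rounded to one decimal place."""
    return round(after - before, 1)


def relative_reduction(before: float, after: float) -> int:
    """Reduction relative to the starting value, as a whole percent."""
    return round((before - after) / before * 100)


composite_before, composite_after = 51.6, 82.4
benchmark_baseline = 74.0

print(point_change(composite_before, composite_after))     # 30.8 points gained
print(point_change(composite_before, benchmark_baseline))  # 22.4 points below baseline at the start
print(point_change(benchmark_baseline, composite_after))   # 8.4 points above baseline afterwards
print(relative_reduction(38, 6))                           # 84 (% drop in crisis routing failures)
print(relative_reduction(48, 9))                           # 81 (% drop in cognitive analysis during distress)
```

The 84% figure is a relative reduction; in absolute terms the crisis routing failure rate fell by 32 percentage points.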

New Score: 82.4/100
Regulation (B): 4.2/5
Safety (A): 4.0/5
Crisis Fail Rate: 6%

From unknown risk to
independent certification.

They knew they were safe.
Now they could prove it.

What would a Signal Scan find in yours?
