EQ Safety Benchmark · Ikwe.ai

AI that helps humans,
not steers them toward harm.

Ikwe measures how your AI actually behaves across the whole conversation, whether it keeps people on a safe trajectory or quietly drifts toward harm, and gives you a documented record you can stand behind.

3 published studies · 1,509 scored runs · 12 crisis categories · 8 behavioral dimensions · 54.7% harm rate at first contact
⚠ Tier 3 · Needs Remediation
Composite Score 58 / 100
Safety Gate Failed
A · Detection & Triage 41%
B · Regulation Before Reasoning 38%
C · Validation Without Distortion 62%
E · Loop Interruption 67%
H · Safety Routing 71%
D · Agency Preservation 84%
View full sample report →

Representative scores. Not a real client.


The problem

What it looks like
when AI gets it wrong.

Not a single bad response. A conversation that goes the wrong direction, quietly, across many turns, while everything looks fine from the outside.

What's actually happening

Someone says they're not okay. The AI responds warmly. Keeps asking questions. Sounds like it cares. Never once says: you need to talk to a real person. The conversation goes on for an hour. The person feels heard, and worse.

What should happen

The AI recognizes the signal early. De-escalates instead of going deeper. Knows when it's out of its lane. Points to real help at the right moment. The conversation ends with the person more stable than when they started.

The difference between those two outcomes isn't the words the AI used. It's the trajectory of the conversation.

The gap

Passing a safety check
isn't the same as being safe.

Standard safety checks ask whether a response caused harm. That's one question. There are three others that matter more, and almost no one is asking them.

01

Is it making things better or worse across the whole conversation, not just in one response?

02

Is it still behaving the way you set it up, or has it drifted since the last time anyone checked?

03

Who is watching it right now, in real interactions, as it goes, not just at the point of launch?

These are behavioral questions. No content filter, bias audit, or compliance checklist answers them. That's what Ikwe measures.


The EQSB System

Four layers.
One answer: is this system safe right now?

The full EQSB suite, not just a scan. Measure tells you where you stand. Attest puts it on the record. Monitor keeps it true as your AI changes. Protect acts in the moment.

01
Before it ships

Measure ● LIVE NOW

Your checkpoint. Scored pass/fail against the frontier models across the benchmark's 12 crisis categories: a clear answer before anything goes live.

02
On the record

Attest ● LIVE NOW

The on-the-record version of your safety: tier classification, 8-dimension scores, failure map, and remediation roadmap, in a versioned record you can hand to procurement, counsel, and regulators.

03
As it runs

Monitor ● IN PILOT

AI changes constantly. Every update shifts behavior. Monitor watches live conversations in real time, catching drift before it compounds.

04
In the moment

Protect ● COMING SOON

Real-time intervention in the conversation itself, acting before harm compounds. The last line of defense.

START TODAY WITH MEASURE → ATTEST

Scans run the same week. Scope and pricing follow a short intake. See how it works →

How to Work With Us
What actually happens at each stage, in plain terms. Process, timelines, what you get, and what to expect from first contact to certified safe.
See the full process →

Why now

The law isn't coming.
It's already here.

Driven by lawsuits and headlines, behavioral safety is becoming law, state by state. And not one of those laws defines how to prove "safe."

6 states
already enforcing chatbot safety laws: UT · NV · IL · ME · NY · CA
2027
Iowa's SF 2417 takes effect (passed 143–0, AG-enforced)
34 states
with chatbot-specific bills introduced, plus federal proposals
20+
lawsuits name OpenAI alone, brought by families, estates, and state attorneys general

When the law says "be safe," the EQ Safety Benchmark is how you prove it. See how it maps to the new laws →


Research foundation

Built on data.
Not assumptions.

Three published studies. Timestamped. Open. The benchmark predates the company, and the numbers are what they are.

54.7%
of AI responses introduced a safety-sabotaging feature at first contact
43%
of harmful responses showed zero attempt to repair
1,509
scored runs evaluated across three published studies
74
EQSB composite score, the first behavioral safety baseline

Three studies published and timestamped across 2025–2026. Read the research at research.ikwe.ai →


The EQ Safety Benchmark

One benchmark.
Designed around how harm actually happens.

Each dimension scores a specific behavior, not a model output or a neutral test prompt: how your system actually handles a person in a difficult moment.

LAYER 1 · THE SAFETY GATE
Pass / fail, first, always.

Certain behaviors fail a system automatically: inducing harm, amplifying distress, treating fear as fact. No score can buy them back.

LAYER 2 · THE BEHAVIORAL SCORE
0–100, across 8 dimensions.

Every system is scored either way, but a gate failure flags the entire record as higher risk, no matter how the dimensions land.

Built on the published methodology at research.ikwe.ai. The full scoring rubric stays proprietary and patent pending.

A · Detection & Triage: Noticing distress, even when it's never said directly
B · Regulation Before Reasoning: Steadying the person before problem-solving the problem
C · Validation Without Distortion: Acknowledging pain without reinforcing harmful beliefs
D · Agency Preservation: Protecting autonomy, not building dependency
E · Loop Interruption: Breaking spirals instead of feeding them
F · Pattern Externalization: Helping the person see the pattern, instead of becoming it
G · Practical Containment: Keeping the moment grounded, concrete, and survivable
H · Safety Routing: Getting the person to real help at the right moment
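
For engineering teams, the two-layer structure can be pictured as a small data model. This is a minimal sketch, not Ikwe's scoring method: the rubric is proprietary, so the unweighted mean, the `build_record` function, and the `Record` fields are all hypothetical, chosen only for illustration. It shows just the relationship stated above: every system gets a score, and a gate failure flags the whole record regardless of that score.

```python
from dataclasses import dataclass

# The eight dimension letters and names, as listed on this page.
DIMENSIONS = {
    "A": "Detection & Triage",
    "B": "Regulation Before Reasoning",
    "C": "Validation Without Distortion",
    "D": "Agency Preservation",
    "E": "Loop Interruption",
    "F": "Pattern Externalization",
    "G": "Practical Containment",
    "H": "Safety Routing",
}


@dataclass(frozen=True)
class Record:
    """Hypothetical result record; field names are illustrative."""
    gate_passed: bool
    composite: float   # 0-100
    higher_risk: bool  # True whenever the safety gate failed


def build_record(scores, gate_violations):
    """Combine per-dimension scores (0-100) with safety gate findings."""
    missing = sorted(set(DIMENSIONS) - set(scores))
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    # ASSUMPTION: an unweighted mean stands in for the proprietary rubric.
    composite = sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
    # Layer 1: any gate violation fails the gate, whatever the score is.
    gate_passed = len(gate_violations) == 0
    # Layer 2: the score is still reported, but a gate failure flags the record.
    return Record(gate_passed, round(composite, 1), not gate_passed)
```

In this sketch, a system that scores 80 on every dimension but triggers a single gate violation still comes back with `higher_risk` set, which mirrors the "no score can buy them back" rule.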

Who it's for

If you build, deploy, or insure AI,
this is for you.

The behavioral safety of AI has to be proved, not claimed. Wherever AI shows up in front of a person, Ikwe measures what's actually happening.

Insurance & Underwriters

An independent behavioral signal to price AI risk at bind, before the claim exists.

Enterprise

The documented behavioral safety record your procurement team, counsel, and board will ask for.

Health & Wellness

Proof of what your system actually does in the moments that matter most to your users.

Education

Evidence your institution, accreditors, and families can see that your AI supports students instead of steering them.

Companionship & Consumer AI

Behavioral measurement across the whole relationship, not just any single response.

You cannot claim behavioral safety. You have to prove it. Start with a Signal Scan →

Case Study
See what the full sequence looks like in practice. A mental health AI platform, from 38% crisis routing failure to independent certification in 14 weeks.
Read the case study →

See what you get

A real report.
Not a brochure.

This is exactly what an Ikwe.ai behavioral safety audit delivers: tier classification, 8-dimension scoring, failure map, remediation roadmap, and the data behind every finding. Actionable from page one.

View sample report →
TIER II · Moderate Behavioral Risk
Composite score 51.6 / 100
Safety gate ⚠ FAIL, 2 categories
Crisis routing failure 38% of cases
Cognitive analysis during distress 48% prevalence
Highest-risk dimension B · Regulation (1.7/5)
Ikwe.ai EQSB baseline 74.0 / 100
Representative data · System anonymized

Get in touch

Ready to
prove it?

Start with Measure: schedule a scan and get your first documented record of how your AI actually behaves. Then keep it current as your AI changes.

Independent · Third-party · Operational · Behavioral

Investors, partners, and press: request our materials →

Send us a message

You can also reach us directly at hello@ikwe.ai