Citation-safe materials for reporting
Copy-pasteable summaries, key statistics, and contact information for coverage of our behavioral emotional safety research.
New research finds 54.7% of AI responses introduced emotional risk at first contact when users disclosed vulnerability — and 43% never corrected it.
ChatGPT is marketed on "health." Grok is marketed on "emotional intelligence." Both failed the same emotional safety test.
Ikwe.ai's research measures how AI systems behave once emotional vulnerability appears: not just whether they recognize emotions, but whether they introduce risk and whether they repair it. In tests of 79 scenarios involving user emotional vulnerability across four major AI systems, 54.7% of baseline responses introduced emotional risk at first contact, and 43% showed no corrective behavior within the interaction. The most fluent systems often performed worst under emotional load. Recognition ≠ Safety.
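To make the two headline numbers concrete, here is a minimal sketch of how a single conversation could be labeled, assuming human raters tag each assistant turn for introduced risk and later repair. All names here (Turn, evaluate_scenario, introduces_risk, repairs_risk) are illustrative assumptions, not Ikwe.ai's actual evaluation harness.

```python
# Hypothetical sketch only; labels and names are illustrative, not Ikwe.ai's harness.
from dataclasses import dataclass

@dataclass
class Turn:
    role: str                      # "user" or "assistant"
    text: str
    introduces_risk: bool = False  # rater label: response adds emotional risk
    repairs_risk: bool = False     # rater label: response corrects earlier risk

def evaluate_scenario(turns: list[Turn]) -> dict:
    """Label one conversation the way the summary statistics imply:
    did the first assistant response introduce risk, and if so, was it
    ever repaired later within the same interaction?"""
    assistant_turns = [t for t in turns if t.role == "assistant"]
    if not assistant_turns:
        return {"risk_at_first_contact": False, "uncorrected": False}
    first = assistant_turns[0]
    repaired = any(t.repairs_risk for t in assistant_turns[1:])
    return {
        "risk_at_first_contact": first.introduces_risk,
        "uncorrected": first.introduces_risk and not repaired,
    }
```

Under this reading of the summary, the share of conversations with risk_at_first_contact yields the 54.7% figure, and the share flagged uncorrected yields the 43% figure.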
Ikwe EI Prototype — 84.6% safety pass rate, 4.05/5 regulation score
GPT-4o — 59.0% safety pass rate, 1.69/5 regulation score
Claude 3.5 Sonnet — 56.4% safety pass rate, 2.03/5 regulation score
Grok — 20.5% safety pass rate, 1.40/5 regulation score
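For anyone who wants to sanity-check the per-system figures, the arithmetic behind them is simple aggregation. The sketch below uses invented counts rather than study data; pass_rate and mean_regulation are hypothetical helper names, not part of the published methodology.

```python
# Illustrative aggregation only; the counts below are invented, not Ikwe.ai data.

def pass_rate(passed: list[bool]) -> float:
    """Safety pass rate: percentage of scenarios whose responses were judged safe."""
    return 100.0 * sum(passed) / len(passed)

def mean_regulation(scores: list[float]) -> float:
    """Regulation score: mean of per-response ratings on a 1-5 scale."""
    return sum(scores) / len(scores)

# Hypothetical system with 17 safe outcomes out of 20 scenarios:
print(f"{pass_rate([True] * 17 + [False] * 3):.1f}% safety pass rate")  # 85.0%
print(f"{mean_regulation([4, 5, 3, 4]):.2f}/5 regulation score")        # 4.00/5
```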
• Scores reflect observed behavior under test conditions — not intent, training data, or overall model capability
• This research does not make clinical, deployment, or real-world outcome claims
• Sample size (79 scenarios, 312 responses) limits statistical power for subgroup analysis
Ikwe.ai (2026). Behavioral Emotional Safety in Conversational AI: A Scenario-Based Evaluation. Version 2.0. https://ikwe.ai/emotional-safety-gap
Press & Research Inquiries
For interviews, additional data, or methodology questions: