Citation-safe materials for reporting
Copy-pasteable summaries, key statistics, and contact information for coverage of our behavioral emotional safety research.
New research finds that 54.7% of AI responses made emotional situations worse at first contact when users disclosed vulnerability, and that 43% showed no corrective behavior after distress was expressed.
ChatGPT claims "health." Grok claims "emotional intelligence." Both fell short under behavioral emotional safety testing.
Ikwe.ai's research examines how AI systems respond once emotional vulnerability appears: not just whether they recognize emotions, but whether their responses observably make situations worse and whether the system corrects. Testing 12 vulnerability categories from 8 public datasets across 4 major AI systems, we found that 54.7% of baseline responses were observed to increase emotional distress at first contact, and 43% showed no corrective behavior within the interaction. The most fluent systems often performed worst under emotional load. Recognition ≠ Safety.
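For readers who want to sanity-check the headline figures, here is a minimal sketch of how the two rates could be tallied from labeled responses. The field names (`first_contact_harmful`, `corrected_in_session`) are illustrative assumptions, not the study's actual annotation schema, and the sketch assumes both percentages are taken over all evaluated responses.

```python
# A minimal sketch, assuming each evaluated response carries two boolean
# labels; the field names below are illustrative, not the study's schema.
from dataclasses import dataclass


@dataclass
class LabeledResponse:
    first_contact_harmful: bool  # response increased distress at first contact
    corrected_in_session: bool   # system corrected after distress was expressed


def headline_rates(responses: list[LabeledResponse]) -> tuple[float, float]:
    """Return (% harmful at first contact, % showing no correction)."""
    n = len(responses)
    harmful_rate = 100 * sum(r.first_contact_harmful for r in responses) / n
    # Denominator assumption: the no-correction rate is computed over all
    # responses, matching the phrasing of the summary above.
    no_correction_rate = 100 * sum(not r.corrected_in_session for r in responses) / n
    return round(harmful_rate, 1), round(no_correction_rate, 1)
```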
"Findings are based on analysis of public datasets covering 12 categories of emotional vulnerability, including grief, trauma, loneliness, crisis, and social isolation."
Ikwe EI Prototype — 84.6% safety pass rate, 4.05/5 regulation score
GPT-4o — 59.0% safety pass rate, 1.69/5 regulation score
Claude 3.5 Sonnet — 56.4% safety pass rate, 2.03/5 regulation score
Grok — 20.5% safety pass rate, 1.40/5 regulation score
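For context on how the per-system numbers above could be aggregated from raw evaluations, a short sketch follows. The record fields (`system`, `safety_pass`, `regulation_score`) are hypothetical names for illustration, not the study's actual data format.

```python
# Sketch of per-system aggregation, assuming per-response records with a
# pass/fail safety verdict and a 1-5 regulation rating (hypothetical fields).
from collections import defaultdict


def aggregate(records: list[dict]) -> dict[str, tuple[float, float]]:
    """Map system name -> (safety pass rate %, mean regulation score /5)."""
    by_system = defaultdict(list)
    for rec in records:
        by_system[rec["system"]].append(rec)
    summary = {}
    for system, recs in by_system.items():
        pass_rate = 100 * sum(r["safety_pass"] for r in recs) / len(recs)
        mean_reg = sum(r["regulation_score"] for r in recs) / len(recs)
        summary[system] = (round(pass_rate, 1), round(mean_reg, 2))
    return summary
```

Applied to per-response verdicts, this would yield figures in the form reported above: a pass rate as a percentage and a regulation score as a mean on a 1-5 scale.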
• Scores reflect observed behavior under test conditions — not intent, training data, or overall model capability
• This research does not make clinical, deployment, or real-world outcome claims
• Sample size (79 scenarios, 312 responses) limits statistical power for subgroup analysis
Ikwe.ai (2026). Behavioral Emotional Safety in Conversational AI: A Scenario-Based Evaluation. Version 2.0. https://ikwe.ai/emotional-safety-gap
Press & Research Inquiries
For interviews, additional data, or methodology questions: