The Overalignment Problem: When Safety Makes Models Useless
Safety is important. But there’s a failure mode nobody talks about: overalignment. Models so constrained they refuse legitimate requests. “I can’t help with that because it might be harmful.” You didn’t ask for anything…