One strategy for mitigating risk from schemers (that is, egregiously misaligned models that intentionally try to subvert your safety measures) is behavioral red-teaming (BRT).
Behavioral red-teaming is unlikely to produce…
One strategy for mitigating risk from schemers (that is, egregiously misaligned models that intentionally try to subvert your safety measures) is behavioral red-teaming (BRT).