Discussion about this post

hersheys

Very interesting read! There are many questions that I remain curious about. When should we expect the first case of scheming in the wild? Do we have the capabilities to catch such cases, or will it slip by our defenses? How closely can a model organism actually replicate a scheming model, and what gaps do we need to fill in order for our safeguards to be useful against actual schemers? Have we witnessed cases of escape-breaking eliciting higher capabilities in password-locked or scheming models?
