Discussion about this post

Yuvanesh Anand

Thanks for sharing proposals like these! They’re super useful for someone new to the field who wants to break in and have some impact. I personally found subverting probes purely through prompting a super interesting proposal and took a stab at it myself:

https://open.substack.com/pub/yuvaaaaa/p/models-can-be-prompted-to-fool-probes?r=1821xm&utm_medium=ios

Would love some feedback or to chat on results!

Aaron

Thanks for sharing these proposals! They've been very useful for project idea generation for my students. But I'm a little unclear if this post should be interpreted as:

1. Redwood is already working on executing some of these ideas. External researchers should come up with novel extensions for their own projects.

2. Redwood is already working on executing some/all of these ideas. External researchers should contact the proposal author if they're interested in executing one of them.

3. These are all open ideas and we'd love for external researchers to directly jump on any of them.
