Is the color scheme on the FPR/safety chart reversed? It seems the best AUC is colored in orange, while the one with the worst improvement in safety is green.
I think it's correct. It shows that smarter models perform better as monitors (so orange, which is Sonnet 4.5, is the best model in this chart and also has the best safety curve).
Indeed, thanks for the reply! I somehow thought orange conveyed "worst" (so I would have flipped the color order), curious if there was a reason for not using green.
Oh I see, sorry for misunderstanding your question. I think I picked orange for Claude because of the Anthropic color theme; I wasn't really thinking about whether this conveyed "better" or "worse" beyond the position of the line. Basically, no good reason.