4 Comments
Nick Bulka

Are we at all concerned about our private chat / coding data being trained on by Anthropic models?

They show that dark pattern of: "How would you rate this chat? 1: Bad, 2: Ok, 3: Good"

which, under their terms of service, gives them access to the whole chat, especially if you press the 1 key out of habit, expecting to accept the model's next step.

Trust violation.

I could see how Mythos boost is just the culmination of a feature flag: extract high-signal intelligence from user feedback sessions. We need clarity here, Anthropic.

JSD

> was isolated to three specific sub-domains of our environment mix: GUI computer use, office-related tasks, and a small set of STEM environments.

Given this, I wonder if CoT controllability is better on SWE-related tasks than on STEM, office-related, or GUI computer use tasks.

JSD

Ah, maybe not, if this is mostly a propensity effect.

JSD

> It also seems plausible that the warm-start data contains examples where AIs are clearly doing some reasoning but don’t think about it and this transfers broadly.

By "this transfers broadly," do you mean it transfers to the model not distinguishing thinking from output in other situations? Or is that not exactly what you meant?