3 Comments
Alvin Ånestrand:

If AIs are allowed to output unlimited filler tokens on their own initiative, do you think this may enable many steps of serial thinking at some level of capability, even without neuralese?

sim gishel:

Do you have a hypothesis for the mechanism that increases performance on repeats on n-hop tasks?

For tasks that can be done in parallel, it makes sense. But that doesn't make sense here. The only hypotheses I can think of are:

1) The repetitions increase the salience of the task in the model

2) The model might be doing something fuzzy internally that lets it pseudo-parallelize: running the task probabilistically many times, where each internal thread has a low chance of producing the right answer, but a higher chance than of producing any particular wrong answer. With some kind of internal majority vote, the chance of landing on the right answer then increases as the model gets more space to spin up more threads.
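Hypothesis 2 can be sanity-checked with arithmetic. A toy simulation (entirely hypothetical: the "threads", the per-thread accuracy `p_correct`, and the number of wrong answers `k_wrong` are made-up parameters, not anything measured in a real model) shows that if the correct answer is more likely than any single wrong answer, a majority vote over more threads does converge on it:

```python
import random
from collections import Counter

def vote_accuracy(n_threads, p_correct=0.3, k_wrong=9, trials=2000, seed=0):
    """Fraction of trials where a majority vote over n_threads noisy
    'threads' picks the correct answer. Each thread is correct with
    probability p_correct, else picks one of k_wrong wrong answers
    uniformly (so each wrong answer is less likely than the right one)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        votes = Counter()
        for _ in range(n_threads):
            if rng.random() < p_correct:
                votes["correct"] += 1
            else:
                votes[f"wrong_{rng.randrange(k_wrong)}"] += 1
        if votes.most_common(1)[0][0] == "correct":
            wins += 1
    return wins / trials

for n in (1, 5, 25, 125):
    print(n, vote_accuracy(n))
```

With these made-up numbers, a single thread is right only ~30% of the time, but the vote over 125 threads is right almost always, which is the shape of the effect the hypothesis predicts as more space is made available.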

Alvin Ånestrand:

If I understand things correctly, some information from the residual stream is stored during each forward pass and then reused in the following forward passes. It is not full neuralese, just some values that do not need to be re-computed each pass.

But this apparently allows for rudimentary serial thinking.

(Edit: I think the mechanism is mostly the same for both filler tokens and problem repeats.)
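A minimal sketch of the mechanism being described, assuming it is the standard transformer key-value cache (that assumption is mine, not stated above): at each forward pass the key and value vectors for the new token are appended to a cache, so earlier tokens' keys and values are reused rather than recomputed, and each new pass can attend over everything computed so far.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_step(q, new_k, new_v, cache):
    """One decoding step: append this pass's key/value to the cache,
    then attend over all cached keys/values from earlier passes."""
    cache["k"].append(new_k)
    cache["v"].append(new_v)
    K = np.stack(cache["k"])               # (t, d): keys from all passes so far
    V = np.stack(cache["v"])               # (t, d): values from all passes so far
    w = softmax(K @ q / np.sqrt(len(q)))   # attention weights over the history
    return w @ V                           # weighted mix of cached values

rng = np.random.default_rng(0)
d = 8
cache = {"k": [], "v": []}
for t in range(4):                         # four forward passes
    q, k, v = rng.standard_normal((3, d))  # toy query/key/value for this pass
    out = attend_step(q, k, v, cache)

print(len(cache["k"]))                     # 4 cached keys, none recomputed
```

This is why it is "not full neuralese": only these fixed key/value vectors carry information forward between passes, not the whole residual stream.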