Discussion about this post

sim gishel:

Do you have a hypothesis for the mechanism that increases performance on repeats on n-hop tasks?

For tasks that can be done in parallel, the benefit makes sense, but that explanation doesn't apply here. The only hypotheses I can think of are:

1) The repetitions increase the salience of the task for the model.

2) The model might be doing something fuzzy internally that lets it pseudo-parallelize: running the task probabilistically many times, where each internal "thread" has only a low chance of producing the right answer, but a higher chance than of producing any particular wrong answer. With some kind of internal majority vote over those threads, the chance of landing on the right answer would increase as the model is given more space to spin up more of them.
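The majority-vote intuition in (2) can be checked numerically. The sketch below is a toy simulation of that hypothesis (not anything measured from a model): each "thread" answers correctly with probability `p_correct` and otherwise picks uniformly among `n_wrong` distractors, so the correct answer only needs a plurality edge per thread for voting to help.

```python
import random
from collections import Counter

def majority_vote_accuracy(p_correct, n_wrong, n_threads, trials=2000, seed=0):
    """Estimate accuracy of majority-voting over n_threads noisy guesses.

    Each thread is right with probability p_correct; otherwise it picks
    one of n_wrong distractors uniformly. p_correct only needs to exceed
    the per-distractor probability (1 - p_correct) / n_wrong.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        votes = Counter()
        for _ in range(n_threads):
            if rng.random() < p_correct:
                votes["right"] += 1
            else:
                votes[f"wrong{rng.randrange(n_wrong)}"] += 1
        if votes.most_common(1)[0][0] == "right":
            wins += 1
    return wins / trials

# Accuracy climbs with the number of threads, even though each
# individual thread is right only 20% of the time.
for n in (1, 5, 25, 125):
    print(n, majority_vote_accuracy(p_correct=0.2, n_wrong=9, n_threads=n))
```

This is essentially a Condorcet-jury-style effect: with 125 threads at 20% accuracy, the correct answer expects ~25 votes versus ~11 for each of the nine distractors, so the plurality winner is almost always right even though any single thread usually isn't.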
