
The example seems like a weird edge case. I don't even know if current models are capable of this in a short input.


Ignore the specific example of counting characters, I was just quickly coming up with a situation where the instruction is at the end of the input. Here is a better example:

Input the full text of a novel, then ask for a minor detail (e.g., the color of a car that is briefly mentioned in the middle of the book). Again, a human can do this by flipping back to the relevant section, but LLMs have no mechanism for this when using a sliding-window attention scheme.

If the full input can fit in the context window then any LLM today would be able to extract the color of the car.


I agree; even just tokenization screws you here, I'm 95% sure. I.e., the raw input isn't letters but a sequence of ~100K possible integer tokens, each representing some run of letters.

That being said, probably a naive take, since we're seeing them do so much. And I bet we could get it to count correctly with at least some short inputs, and given unlimited runs it's probably trivial (i.e., for N characters, split into N inputs, and for each one ask "say true if it is an M, false otherwise").
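The per-character scheme described above can be sketched like this. `ask_llm` here is a hypothetical stub standing in for a real per-character LLM query; a real run would issue one API call per character:

```python
def ask_llm(prompt: str) -> str:
    # Hypothetical stub for an LLM call. A real implementation would send
    # the prompt to a model API; here we just inspect the trailing character.
    ch = prompt[-1]
    return "true" if ch.lower() == "m" else "false"

def count_via_n_calls(text: str, target: str = "m") -> int:
    # For N characters, make N independent true/false queries and sum them,
    # sidestepping both tokenization and long-context issues.
    template = 'Say "true" if this character is an "{}", else "false": {}'
    answers = [ask_llm(template.format(target, ch)) for ch in text]
    return sum(a == "true" for a in answers)

print(count_via_n_calls("mammal"))  # 3
```

Each query puts exactly one character in front of the model, so no single call depends on counting across a long context.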


I understand that, which is why I said "Ignore LLM issues with character counting for this example". It was a quick example, please see my other comment with a better example.


I see. Active listening + relating it to my knowledge on my end, lmk if I compressed too much:

You're curious whether there's noticeably worse performance if the question is at the end of the content rather than before it?

No, there's a good paper on this somewhere, tested with Claude 100K. Tl;dr: it's sort of bow-shaped; the beginning and end had equally high retrieval rates, but the middle would suffer.


No, what I am specifically asking about is these sliding-window attention techniques. As far as I understand it, Claude 100K actually uses a full 100k context window, not a sliding window.
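The distinction can be made concrete with a single-layer attention mask. This is a minimal sketch (sequence length and window size are illustrative) showing that under a sliding-window scheme, the final token, where the question would sit, cannot directly attend to a mid-sequence detail:

```python
import numpy as np

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # True where query token i may attend to key token j:
    # causal (j <= i) and within the last `window` positions (i - j < window).
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (i - j < window)

# Toy scale: a 1000-token "book" with a 128-token sliding window.
mask = sliding_window_mask(1000, 128)
print(bool(mask[999, 500]))  # False: question token can't see the mid-book detail
print(bool(mask[999, 900]))  # True: recent tokens are still visible

# A full-context model is just window = n, so everything is visible:
full = sliding_window_mask(1000, 1000)
print(bool(full[999, 500]))  # True
```

Caveat: with many stacked layers, information can propagate farther than one window (each layer extends the effective receptive field), but only indirectly, not by the question token looking back at the detail itself.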



