Ignore the specific example of counting characters; I was just quickly coming up with a situation where the instruction is at the end of the input. Here is a better example:

Input the full text of a novel, then ask for a minor detail (e.g., the color of a car that is briefly mentioned in the middle of the book). Again, a human can do this by flipping back to the relevant section, but an LLM using a sliding window attention scheme has no mechanism for this.

If the full input fits in the context window, then any LLM today would be able to extract the color of the car.
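
To make the difference concrete, here is a rough sketch of the attention mask in both cases. It is a toy model, not any particular LLM's implementation: the function name, the sequence length, the window size, and the token positions are all made up for illustration, and it only looks at which positions a single attention layer can read directly.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    Causal sliding window: q sees itself and the (window - 1) tokens before it.
    """
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


# Hypothetical toy numbers, standing in for a novel-length input.
seq_len = 12                 # total tokens in the "novel" plus the question
window = 4                   # sliding window size
detail_pos = 5               # where the car's color is mentioned
question_pos = seq_len - 1   # the question arrives at the very end

mask = sliding_window_mask(seq_len, window)

# With a sliding window, the question token cannot read the detail directly.
can_see_detail = mask[question_pos][detail_pos]
visible = [k for k in range(seq_len) if mask[question_pos][k]]

# If the window covers the whole input, the detail is directly readable.
full_mask = sliding_window_mask(seq_len, seq_len)
can_see_detail_full = full_mask[question_pos][detail_pos]

print("sliding window sees detail:", can_see_detail)
print("positions visible to the question:", visible)
print("full context sees detail:", can_see_detail_full)
```

With the window at 4, the question at position 11 can only read positions 8 through 11, so the detail at position 5 is out of reach. Set the window to the full length and the same lookup succeeds.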