Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this needle in a needlestack to help identify when models might be missing something. At the very least, it could give model developers a tool for analyzing their proposed models.
Funnily enough, I ran a 980k-token log dump through Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error returned by a third-party API provider, reasoning that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from that API provider and leaving the affected customer without press data. We hadn't even noticed or investigated that; Gemini just mentioned it without being prompted.
The article shows how much better GPT-4o is at paying attention across its input window compared to GPT-4 Turbo and Claude-3 Sonnet.
We've needed an upgrade to needle in a haystack for a while and this "Needle In A Needlestack" is a good next step! NIAN creates a prompt that includes thousands of limericks and the prompt asks a question about one limerick at a specific location.
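A minimal sketch of what a NIAN-style prompt builder might look like. This is my own guess at the structure, not the benchmark's actual code, and all names and the placeholder limericks are hypothetical:

```python
# Sketch of a "Needle In A Needlestack"-style prompt: bury one target
# limerick (the needle) at a chosen position among many distractors,
# then ask a question only that limerick can answer.

def build_nian_prompt(distractors, needle, position, question):
    """Insert the needle limerick at `position` in the distractor list
    and append the question at the end of the prompt."""
    stack = list(distractors)
    stack.insert(position, needle)
    body = "\n\n".join(stack)
    return (
        f"{body}\n\n"
        f"Answer based only on the limericks above.\n"
        f"Question: {question}"
    )

# Tiny placeholder data to show the shape of the prompt:
needle = "A coder from Leeds wrote in C,\n(rest of limerick...)"
distractors = [f"Distractor limerick #{i}\n(...)" for i in range(5)]
prompt = build_nian_prompt(distractors, needle, 3,
                           "Where was the coder from?")
```

Varying `position` across the context window is what lets you plot attention as a function of needle depth.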
I agree; I paid for Claude for a while. Even though they swear the context is huge (and using a huge context burns through tokens like crack), it was near useless with source code that sat in context just a few pages back. That was so frustrating, because everything else was as good as any other model and I liked the 'vibe'.
I used 4o last night and it was still perfectly aware of a C++ class I had pasted 20 questions earlier. I don't care about smart, I care about useful, and this really contributes to the utility.
:) Yeah, I use my own internal markdown to generate really nice html (with fast latex-derived images for equations) and then full-on latex. (tool is https://github.com/parrt/bookish)
I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.
Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.
Thank you for doing this with Jeremy and sharing it with the world!
If you're interested: in my thesis I induced l1-regularized decision trees through a boosting-style approach. Adding an l1 term and maximizing the gradient led to sparse trees.
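This isn't the thesis method itself, just a sketch of the general mechanism by which an l1 penalty produces sparsity in boosted trees: with squared loss, the penalized leaf value is the soft-thresholded residual, so leaves with small gradient contributions collapse to exactly zero (all names here are my own illustration):

```python
import numpy as np

def soft_threshold(g, lam):
    """L1 soft-thresholding: values with magnitude below lam
    are shrunk to exactly zero, larger ones are shrunk by lam."""
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

# Hypothetical per-leaf residual means from one boosting round.
# With l1 penalty lam = 0.1, the two near-zero leaves drop out
# entirely, which is what makes the fitted trees sparse.
residuals = np.array([0.05, -0.02, 1.3, -1.1])
leaf_values = soft_threshold(residuals, 0.1)
```

The same shrink-to-zero effect is why l1-penalized leaf weights (e.g. an alpha-style term) prune away uninformative splits rather than merely damping them.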