parrt's comments | Hacker News

Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this Needle In A Needlestack test to help identify when models might be missing something. At the very least, it could give model developers a tool for analyzing their proposed models.


Funnily enough, I ran a 980k-token log dump against Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error returned by a third-party API provider, reasoning that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from that API provider and leaving the affected customer without press data. We hadn't even noticed or investigated that; Gemini just mentioned it without being prompted for it.


That definitely makes it seem like it's noticing a great deal of its context window. Impressive.


The article shows how much better GPT-4o is at paying attention across its input window compared to GPT-4 Turbo and Claude-3 Sonnet.

We've needed an upgrade to needle-in-a-haystack for a while, and this "Needle In A Needlestack" is a good next step! NIAN builds a prompt containing thousands of limericks and asks a question about one limerick at a specific position.
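The core trick is simple. Here's a toy sketch of assembling such a prompt (the filler, needle, and question are placeholders I made up, not NIAN's actual test data):

    # Toy Needle-In-A-Needlestack prompt: bury one "needle" limerick at a
    # known depth in a stack of filler limericks, then ask about the needle.
    def build_prompt(filler, needle, position, question):
        stack = filler[:position] + [needle] + filler[position:]
        return "\n\n".join(stack) + (
            "\n\nAnswer based only on the limericks above.\n"
            f"Question: {question}")

    filler = ["There once was a filler rhyme here..."] * 2999
    prompt = build_prompt(
        filler,
        needle="A needle once hid in plain sight, / tucked deep in a stack out of light...",
        position=1500,
        question="In the limerick about the needle, where did it hide?")

Sweeping the position across the stack is what turns this into a map of where in the context window the model stops paying attention.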


I agree; I paid for Claude for a while. Even though they swear the context is huge (and a huge context burns tokens like crack), it was near useless at recalling source code that was just a few pages back in the context. It was frustrating, because everything else was as good as any other model and I liked the 'vibe'.

I used 4o last night and it was still perfectly aware of a C++ class I pasted 20 questions ago. I don't care about smart, I care about useful, and this really contributes to the utility.


BTW, how did you manage the throughput and navigate the various throttling strategies across all the models you mentioned?
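(For anyone curious, the usual answer is client-side throttling with exponential backoff on 429s. A minimal sketch of that pattern, my assumption rather than whatever NIAN's harness actually does:)

    import random, time

    class RateLimitError(Exception):
        """Stand-in for a provider SDK's 429 exception (hypothetical)."""

    def with_backoff(call, max_tries=6):
        # Retry a model call with exponential backoff plus jitter whenever
        # the provider throttles us. Sketch only; real harnesses differ.
        for attempt in range(max_tries):
            try:
                return call()
            except RateLimitError:
                time.sleep(min(60, 2 ** attempt) + random.random())
        raise RuntimeError("still throttled after max_tries attempts")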


Looks really cool and useful. Seems like GPT-4o is a lot better than 4.


:) Yeah, I use my own internal markdown to generate really nice HTML (with fast LaTeX-derived images for equations) and then full-on LaTeX. (The tool is https://github.com/parrt/bookish.)

I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.
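(For the curious, the equation-image trick is roughly the classic latex -> dvipng route. A minimal sketch, assuming those tools are installed, and not bookish's actual code:)

    import pathlib, subprocess, tempfile

    def eq_to_png(eq: str, out_path: str, dpi: int = 150) -> None:
        # Render one math expression to a tightly cropped PNG for embedding
        # in HTML, via latex + dvipng. Illustration only.
        doc = (r"\documentclass{article}\pagestyle{empty}"
               r"\begin{document}$" + eq + r"$\end{document}")
        out = str(pathlib.Path(out_path).resolve())
        with tempfile.TemporaryDirectory() as d:
            pathlib.Path(d, "eq.tex").write_text(doc)
            subprocess.run(["latex", "-interaction=nonstopmode", "eq.tex"],
                           cwd=d, check=True)
            subprocess.run(["dvipng", "-T", "tight", "-D", str(dpi),
                            "-o", out, "eq.dvi"], cwd=d, check=True)

    eq_to_png(r"\int_0^1 x^2\,dx = \frac{1}{3}", "eq.png")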


Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.

Thank you for doing this with Jeremy and sharing it with the world!


Sure thing! Very enjoyable to have people use our work.


Glad to be of assistance! Yeah, it really annoyed me that this critical information was not listed in any one particular spot.


Thanks! It took me a year to discover the key nut there. I found that L1 vs. L2 regularization is not well described, so I went nuts trying to nail it down.
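For anyone hitting the same wall, the practical difference shows up immediately in a toy sklearn comparison (my example, not one from the book):

    # L1 (Lasso) drives many coefficients to exactly zero, producing a
    # sparse model; L2 (Ridge) only shrinks them toward zero.
    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    # Only the first two features actually matter.
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    lasso = Lasso(alpha=0.1).fit(X, y)
    ridge = Ridge(alpha=0.1).fit(X, y)
    print("L1 coefs:", np.round(lasso.coef_, 2))  # most are exactly 0.0
    print("L2 coefs:", np.round(ridge.coef_, 2))  # small but nonzero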


If you're interested, in my thesis I induced L1-regularized decision trees through a boosting-style approach. Adding an L1 term and maximizing the gradient led to sparse trees.
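Roughly, the objective has this shape (my sketch of the general form, not the exact formulation from the thesis):

    F(x) = \sum_m w_m \, h_m(x)
    J(w) = \sum_i \ell\big(y_i, F(x_i)\big) + \lambda \sum_m |w_m|

The constant gradient of the |w_m| terms pushes weak trees' weights to exactly zero, which is what keeps the ensemble sparse.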


Also note we recently added 1D and 2D classifier decision boundary plots. See https://github.com/parrt/dtreeviz/blob/master/notebooks/clas...
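If you haven't seen a decision boundary plot before, here's a generic sklearn/matplotlib sketch of the idea (not dtreeviz's API; see the linked notebook for that):

    # Shade each region of feature space by the class a tree predicts there.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_moons
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(noise=0.25, random_state=0)
    clf = DecisionTreeClassifier(max_depth=4).fit(X, y)

    xx, yy = np.meshgrid(
        np.linspace(X[:, 0].min() - .5, X[:, 0].max() + .5, 200),
        np.linspace(X[:, 1].min() - .5, X[:, 1].max() + .5, 200))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    plt.show()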


Thanks. It's morphed over time as we've added functionality, so it's less clean than before.


That's a good idea, thanks!

