Wow. Cool. I have access to that model and have also seen some impressive context extraction. It also gave a really good summary of a large code base that I dumped in. I saw somebody analyze a huge log file, but we really need something like this needle in a needlestack to help identify when models might be missing something. At the very least, it could give model developers a tool for analyzing their proposed models.
Funnily enough, I ran a 980k-token log dump through Gemini Pro 1.5 yesterday to investigate an error scenario. It found a single incident of a 429 error returned by a third-party API provider, reasoning that "based on the file provided and the information that this log file is aggregated of all instances of the service in question, it seems unlikely that a rate limit would be triggered, and additional investigation may be appropriate". It turned out the service had implemented a block against AWS IPs, breaking a system that loads press data from that API provider and leaving the affected customer without press data. We hadn't even noticed or investigated that; Gemini just mentioned it without being prompted.
The article shows how much better GPT-4o is at paying attention across its input window compared to GPT-4 Turbo and Claude-3 Sonnet.
We've needed an upgrade to needle in a haystack for a while and this "Needle In A Needlestack" is a good next step! NIAN creates a prompt that includes thousands of limericks and the prompt asks a question about one limerick at a specific location.
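A minimal sketch of what a NIAN-style prompt builder might look like. This is my own guess at the structure, not the benchmark's actual code, and all names and the placeholder limericks are hypothetical:

```python
# Sketch of a "Needle In A Needlestack"-style prompt: bury one target
# limerick (the needle) at a chosen position among many distractors,
# then ask a question only that limerick can answer.

def build_nian_prompt(distractors, needle, position, question):
    """Insert the needle limerick at `position` in the distractor list
    and append the question at the end of the prompt."""
    stack = list(distractors)
    stack.insert(position, needle)
    body = "\n\n".join(stack)
    return (
        f"{body}\n\n"
        f"Answer based only on the limericks above.\n"
        f"Question: {question}"
    )

# Tiny placeholder data to show the shape of the prompt:
needle = "A coder from Leeds wrote in C,\n(rest of limerick...)"
distractors = [f"Distractor limerick #{i}\n(...)" for i in range(5)]
prompt = build_nian_prompt(distractors, needle, 3,
                           "Where was the coder from?")
```

Varying `position` across the context window is what lets you plot attention as a function of needle depth.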
I agree; I paid for Claude for a while. Even though they swear the context is huge (and using a huge context burns through tokens like crack), it was near useless with source code that sat in context just a few pages back. That was so frustrating, because everything else was as good as any other model and I liked the 'vibe'.
I used 4o last night and it was still perfectly aware of a C++ class I had pasted 20 questions earlier. I don't care about smart, I care about useful, and this really contributes to the utility.
:) Yeah, I use my own internal markdown to generate really nice html (with fast latex-derived images for equations) and then full-on latex. (tool is https://github.com/parrt/bookish)
I prefer reading on the web unless I'm offline. The LaTeX is super handy for printing a nice document.
Even though it's shockingly common, I never cease to be surprised and delighted when authors who are on HN take the time to reply to comments about their work.
Thank you for doing this with Jeremy and sharing it with the world!
If you're interested: in my thesis I induced l1-regularized decision trees through a boosting-style approach. Adding an l1 term and maximizing the gradient led to sparse trees.
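This isn't the thesis method itself, just a sketch of the general mechanism by which an l1 penalty produces sparsity in boosted trees: with squared loss, the penalized leaf value is the soft-thresholded residual, so leaves with small gradient contributions collapse to exactly zero (all names here are my own illustration):

```python
import numpy as np

def soft_threshold(g, lam):
    """L1 soft-thresholding: values with magnitude below lam
    are shrunk to exactly zero, larger ones are shrunk by lam."""
    return np.sign(g) * np.maximum(np.abs(g) - lam, 0.0)

# Hypothetical per-leaf residual means from one boosting round.
# With l1 penalty lam = 0.1, the two near-zero leaves drop out
# entirely, which is what makes the fitted trees sparse.
residuals = np.array([0.05, -0.02, 1.3, -1.1])
leaf_values = soft_threshold(residuals, 0.1)
```

The same shrink-to-zero effect is why l1-penalized leaf weights (e.g. an alpha-style term) prune away uninformative splits rather than merely damping them.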