Hacker News | deepsquirrelnet's comments

I love using encoder models, and they are generally a better technology for this kind of application. But the price of GPU instances is too damn high.

I won’t lie: I’ve been unreasonably annoyed that I have to use far more compute than I need, for no other reason than that an LLM API exists and it’s good enough for a relatively small-throughput application.


One of the issues with using LLMs in content generation is that instruction tuning causes mode collapse. For example, if you ask an LLM to generate a random number between 1 and 10, it might pick something like 7 80% of the time. Base models do not exhibit the same behavior.
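The random-number example is easy to demonstrate with a toy simulation (no real models here; the 80%-on-7 figure just mirrors the claim above). A collapsed sampler has much lower output entropy than a near-uniform base model:

```python
import math
import random
from collections import Counter

def sample_base(n, rng):
    # Base model stand-in: roughly uniform over 1..10
    return [rng.randint(1, 10) for _ in range(n)]

def sample_instruct(n, rng):
    # Instruction-tuned stand-in: collapses onto 7 about 80% of the time
    return [7 if rng.random() < 0.8 else rng.randint(1, 10) for _ in range(n)]

def entropy(samples):
    # Shannon entropy (bits) of the empirical output distribution
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
base_h = entropy(sample_base(10_000, rng))
inst_h = entropy(sample_instruct(10_000, rng))
print(f"base entropy:     {base_h:.2f} bits")  # near log2(10) ~= 3.32
print(f"instruct entropy: {inst_h:.2f} bits")  # far lower -- mode collapse
```

Measuring entropy over repeated samples like this is a cheap way to check a real model for collapse, too.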

“Creative output” takes on an entirely different meaning once you start thinking about how these models actually work.


Creativity is a really ill-defined term, but generally it has a lot more to do with abstract thinking and understanding subtlety and nuance than with mode collapse. Mode collapse affects variation, which is probably a part of creativity under some definitions, but they aren't the same at all.


SPLADE-easy: https://github.com/dleemiller/splade-easy

I wanted a simple retrieval index to use splade sparse vectors. This just encodes and serializes documents into flatbuffers and appends them into shards. Retrieval is just parallel flat scan, optionally with reranking.

The idea is just a simple, portable index for smaller data sizes. I’m targeting high-quality hybrid retrieval for local search, RAG, or deep-research scenarios.

SPLADE is a really nice “in-between” for semantic and lexical search. There’s bigger and better indexes out there like Faiss or Anserini, but I just kinda wanted something basic.

I was testing it on 120k docs in a simple CLI the other day and it’s still as good as any web search experience (in terms of latency) — so I think it’ll be useful.
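For the curious, here's roughly what the flat scan amounts to — this is a generic sketch, not the SPLADE-easy API. Each document is a sparse `{token_id: weight}` map, scoring is a sparse dot product, and retrieval just scores everything and takes the top-k:

```python
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # token_id -> SPLADE weight

def sparse_dot(q: SparseVec, d: SparseVec) -> float:
    # Iterate the smaller vector and probe the larger one
    if len(d) < len(q):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)

def flat_scan(query: SparseVec, docs: List[SparseVec], k: int = 10) -> List[Tuple[int, float]]:
    # "Flat" scan: score every document (no ANN structure), keep top-k.
    # Easy to shard and run in parallel, which is the whole trick.
    scored = [(i, sparse_dot(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda x: -x[1])
    return scored[:k]

docs = [{1: 0.9, 4: 0.3}, {2: 1.2, 4: 0.8}, {1: 0.2, 2: 0.5}]
top = flat_scan({1: 1.0, 4: 0.5}, docs, k=2)
print(top)  # [(0, 1.05), (1, 0.4)]
```

At 120k docs this brute-force approach is still well within interactive latency on a single machine, which is why it works fine for the smaller data sizes being targeted.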

We’re still trying to clean up the API and do a thorough once over, so I’m not sure I’d recommend trying it yet. Hopefully soon.


I think that happened when GPT-5 was released and pierced OpenAI’s veil. While not a bad model, we found out exactly what Mr. Altman’s words are worth.


I haven’t used RCNN, but trained a custom YOLOv5 model maybe 3-4 years ago and was very happy with the results.

I think people have continued to work on it. There’s no single lab or developer behind it; the metrics for comparison usually focus on the speed/mAP plane.

One nice thing is that even with modest hardware, it’s low enough latency to process video in real time.
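"Real time" here is just frame-budget arithmetic: inference has to fit inside one frame interval. The numbers below are illustrative, not benchmarks:

```python
def fits_realtime(inference_ms: float, fps: float = 30.0) -> bool:
    # A detector keeps up with live video if one inference
    # fits inside one frame interval (1000/fps milliseconds).
    budget_ms = 1000.0 / fps
    return inference_ms <= budget_ms

# Hypothetical small-model latency of ~20 ms/frame on modest hardware
print(fits_realtime(20.0, fps=30.0))  # 33.3 ms budget -> True
print(fits_realtime(20.0, fps=60.0))  # 16.7 ms budget -> False
```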


FWIW this happens in consulting too, not just product companies. Just swap “product” for “delivery”.


I think a less order-biased, more straightforward way would be to just vectorize everything, perform clustering, and then label the clusters with the LLM.
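A minimal sketch of that pipeline, with the embedding step assumed done and the per-cluster LLM call stubbed out as a hypothetical `name_cluster` callable (a real setup would use faiss or scikit-learn rather than this toy k-means):

```python
import random
from typing import Callable, Dict, List, Sequence

Vec = List[float]

def kmeans(vecs: Sequence[Vec], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    # Toy k-means: random init, then alternate assignment/centroid updates
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vecs), k)]
    assign = [0] * len(vecs)
    for _ in range(iters):
        for i, v in enumerate(vecs):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_clusters(
    texts: List[str],
    assign: List[int],
    name_cluster: Callable[[List[str]], str],
) -> Dict[int, str]:
    # name_cluster is a stand-in for one LLM call per CLUSTER (not per document),
    # which is where the cost saving over per-item labeling comes from
    clusters: Dict[int, List[str]] = {}
    for t, c in zip(texts, assign):
        clusters.setdefault(c, []).append(t)
    return {c: name_cluster(members) for c, members in clusters.items()}

vecs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
assign = kmeans(vecs, k=2)
labels = label_clusters(["a", "b", "c", "d"], assign, name_cluster=lambda ms: "+".join(ms))
```

The LLM only sees one sample of texts per cluster, so label assignment order can't bias it.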


OP here. Yes, that works too and gets you to the same result. It removes the risk of bias, but the trade-off is higher marginal cost and latency.

The idea is also that this would be a classification system used in production, whereby you classify data as it comes in, so the "rolling labels" problem still exists there.

In my experience though, you can dramatically reduce unwanted bias by tuning your cosine similarity filter.
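As a sketch of what that filter might look like (hypothetical names, not OP's actual system): only mint a new LLM-generated label when an item's best cosine similarity to existing class centroids falls below a tuned threshold.

```python
import math
from typing import Dict, List

Vec = List[float]

def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(vec: Vec, centroids: Dict[str, Vec], threshold: float = 0.8) -> str:
    # Reuse the closest existing label if it's similar enough; otherwise
    # flag the item for a new label. `threshold` is the knob you tune
    # to trade label sprawl against forcing items into bad buckets.
    if centroids:
        best = max(centroids, key=lambda k: cosine(vec, centroids[k]))
        if cosine(vec, centroids[best]) >= threshold:
            return best
    return "__new_label__"

centroids = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
print(route([0.9, 0.1], centroids))  # "sports"
print(route([0.5, 0.5], centroids))  # "__new_label__"
```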


> For the searches we use hybrid dense + sparse bm25, since dense doesn't work well for technical words.

One thing I’m always curious about is if you could simplify this and get good/better results using SPLADE. The v3 models look really good and seem to provide a good balance of semantic and lexical retrieval.
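Either way, hybrid setups like the one quoted usually need a fusion step. A common choice is reciprocal rank fusion (RRF) over the two ranked lists — a generic sketch, not the parent poster's actual stack:

```python
from typing import Dict, List

def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Reciprocal Rank Fusion: each retriever contributes 1/(k + rank)
    # per document; k=60 is the damping constant from the original RRF paper.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # ranked output of a dense retriever
bm25 = ["d1", "d4", "d3"]    # ranked output of BM25
fused = rrf([dense, bm25])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

The appeal of SPLADE is that a single learned sparse index can cover much of what this dense+BM25 fusion is doing.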


I go back and forth on this. A year ago, I was optimistic and I have had 1 case where RL fine tuning a model made sense. But while there are pockets of that, there is a clash with existing industry skills. I work with a lot of machine learning engineers and data scientists and here’s what I observe.

- many, if not most, MLEs who got started after LLMs do not generally know anything about machine learning. For lack of clearer industry titles, they are really AI developers or AI devops

- machine learning as a trade is moving toward the same fate as data engineering and analytics. Big companies only want people using platform tools. Some AI products, even in cloud platforms like Azure, don’t even give you the evaluation metrics that would be required to properly build ML solutions. Few people seem to have an issue with it.

- fine tuning, especially RL, is packed with nuance and details… lots to monitor, a lot of training signals that need interpretation and data refinement. It’s a much bigger gap than training simpler ML models, which people are also not doing/learning very often.

- The limited number of good use cases means people are not learning those skills from more senior engineers.

- companies have gotten stingy with sme-time and labeling

What confidence do companies have in supporting these solutions in the future? How long will you be around and who will take up the mantle after you leave?

AutoML never really panned out, so I’m less confident that platforming RL will go any better. The unfortunate reality is that companies are almost always willing to pay more for inferior products because it scales. Industry “skills” are mostly experience with proprietary platform products. Sure, they might list “pytorch” as a required skill, but 99% of the time there’s hardly anyone at the company who has spent any meaningful time with it. Worse, you can’t use it, because it would be too hard to support.


Labels are so essential - even if you're not training anything, being able to quickly and objectively test your system is hugely beneficial - but it's a constant struggle to get them. In the unlikely event you can get budget and priority for an SME to do the work, communicating your requirements to them (the need to apply very consistent rules and make few errors) is difficult and the resulting labels tend to be messy.

More than once I've just done labeling "on my own time" - I don't know the subject as well but I have some idea what makes the neurons happy, and it saves a lot of waiting around.

I've found tuning large models to be consistently difficult to justify. The last few years it seems like you're better off waiting six months for a better foundation model. However, we have a lot of cases where big models are just too expensive and there it can definitely be worthwhile to purpose-train something small.


My personal opinion is that true engineering, which revolves around turning complex theory into working practice, has seen a decline in grace. Why spend a lot of time trying to master the art of engineering if you can ride the wave of engineering services and get away with it?

In true hacker spirit, I don't think trying to train a model on a wonky GPU is something that needs an ROI for the individual engineer. It's something they do because they yearn to acquire knowledge.


Eventually someone will make a killing on doing actual outcome measurements instead of just trusting the LLMs, Michael Lewis will write a popular book about it, and the cycle will begin anew...


I'm also seeing teams who expected big gains from fine tuning get incremental or moderate gains. Then they put it in production and regret the decision as SOTA marches on.

I have avoided fine tuning because the models are currently improving at a rate that exceeds big corporate product development velocity.


Absolutely the first thing you should try is a prompt optimizer. The GEPA optimizer (implemented in DSPy) often outperforms GRPO training[1]. But I think people are usually building with frameworks that aren't machine learning frameworks.

[1] https://arxiv.org/abs/2507.19457
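The kernel of what a prompt optimizer automates can be sketched in a few lines — note this is a generic illustration, not DSPy's actual API, and `run` is a hypothetical stand-in for calling an LLM with a candidate instruction:

```python
from typing import Callable, List, Tuple

def optimize_prompt(
    candidates: List[str],
    devset: List[Tuple[str, str]],
    run: Callable[[str, str], str],
) -> str:
    # Score each candidate instruction by accuracy on a labeled dev set
    # and keep the best. Real optimizers like GEPA go further, mutating
    # prompts via LLM reflection on failures instead of a fixed list.
    def score(prompt: str) -> float:
        hits = sum(run(prompt, x) == y for x, y in devset)
        return hits / len(devset)
    return max(candidates, key=score)

def fake_lm(prompt: str, x: str) -> str:
    # Toy stand-in for a real LLM call (purely for illustration)
    return x.upper() if "uppercase" in prompt else x

best = optimize_prompt(
    ["Echo the input.", "Echo the input in uppercase."],
    [("a", "A"), ("b", "B")],
    fake_lm,
)
print(best)
```

The key point: it only needs a labeled dev set and a metric, no gradients or RL infrastructure, which is why it's worth trying before GRPO.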


> “What would America’s Founding Fathers think if they were alive today?”

> For Cross, it is pointless to speculate about the present-day views of men who could not have imagined cotton candy, let alone the machine that makes it.

Some things, like “taxation without representation” seem to be timeless. You can call it irony or perhaps in some cases, a spade is still just a spade.


> men who could not have imagined cotton candy

It's a funny example, since it looks like cotton candy might have been around in their time [0]. Machine-spun cotton candy came about much later, but I'm not overly suspicious of the claims in [0], as meringue [1] certainly existed in their lifetimes and the process isn't dissimilar. I'm certain these men could understand "like meringue, but with sugar!" and "a machine that spins fast!" These would not have been great leaps for people of that time. It seems to make them out to be idiots rather than merely not prophetic (presumably the intended meaning).

[0] https://web.archive.org/web/20150701005917/http://www.cotton...

[1] https://en.wikipedia.org/wiki/Meringue


Pretty sure they could have imagined cotton candy, anyway. There's nothing special about modern people that makes them more capable of comprehending new technology, it's just a matter of exposure.

