Hacker News | deepsquirrelnet's comments

I love using encoder models, and they are generally a better technology for this kind of application. But the price of GPU instances is too damn high.

I won’t lie: I’ve been unreasonably annoyed that I have to use far more compute than I need, for no other reason than that an LLM API exists and it’s good enough for a relatively small-throughput application.


One of the issues with using LLMs in content generation is that instruction tuning causes mode collapse. For example, if you ask an LLM to generate a random number between 1 and 10, it might pick something like 7 80% of the time. Base models do not exhibit the same behavior.
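The random-number example is easy to demonstrate with a toy simulation (no real models here; the 80%-on-7 figure just mirrors the claim above). A collapsed sampler has much lower output entropy than a near-uniform base model:

```python
import math
import random
from collections import Counter

def sample_base(n, rng):
    # Base model stand-in: roughly uniform over 1..10
    return [rng.randint(1, 10) for _ in range(n)]

def sample_instruct(n, rng):
    # Instruction-tuned stand-in: collapses onto 7 about 80% of the time
    return [7 if rng.random() < 0.8 else rng.randint(1, 10) for _ in range(n)]

def entropy(samples):
    # Shannon entropy (bits) of the empirical output distribution
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

rng = random.Random(0)
base_h = entropy(sample_base(10_000, rng))
inst_h = entropy(sample_instruct(10_000, rng))
print(f"base entropy:     {base_h:.2f} bits")  # near log2(10) ~= 3.32
print(f"instruct entropy: {inst_h:.2f} bits")  # far lower -- mode collapse
```

Measuring entropy over repeated samples like this is a cheap way to check a real model for collapse, too.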

“Creative output” takes on an entirely different meaning once you start thinking about how these models actually work.


Creativity is a really ill-defined term, but generally it has a lot more to do with abstract thinking and understanding subtlety and nuance than with mode collapse. Mode collapse affects variation, which is probably a part of creativity under some definitions, but they aren't the same at all.


SPLADE-easy: https://github.com/dleemiller/splade-easy

I wanted a simple retrieval index to use splade sparse vectors. This just encodes and serializes documents into flatbuffers and appends them into shards. Retrieval is just parallel flat scan, optionally with reranking.

The idea is just a simple, portable index for smaller data sizes. I’m targeting high-quality hybrid retrieval for local search, RAG, or deep-research scenarios.

SPLADE is a really nice “in-between” for semantic and lexical search. There’s bigger and better indexes out there like Faiss or Anserini, but I just kinda wanted something basic.

I was testing it on 120k docs in a simple CLI the other day and it’s still as good as any web search experience (in terms of latency) — so I think it’ll be useful.
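For the curious, here's roughly what the flat scan amounts to — this is a generic sketch, not the SPLADE-easy API. Each document is a sparse `{token_id: weight}` map, scoring is a sparse dot product, and retrieval just scores everything and takes the top-k:

```python
from typing import Dict, List, Tuple

SparseVec = Dict[int, float]  # token_id -> SPLADE weight

def sparse_dot(q: SparseVec, d: SparseVec) -> float:
    # Iterate the smaller vector and probe the larger one
    if len(d) < len(q):
        q, d = d, q
    return sum(w * d[t] for t, w in q.items() if t in d)

def flat_scan(query: SparseVec, docs: List[SparseVec], k: int = 10) -> List[Tuple[int, float]]:
    # "Flat" scan: score every document (no ANN structure), keep top-k.
    # Easy to shard and run in parallel, which is the whole trick.
    scored = [(i, sparse_dot(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda x: -x[1])
    return scored[:k]

docs = [{1: 0.9, 4: 0.3}, {2: 1.2, 4: 0.8}, {1: 0.2, 2: 0.5}]
top = flat_scan({1: 1.0, 4: 0.5}, docs, k=2)
print(top)  # [(0, 1.05), (1, 0.4)]
```

At 120k docs this brute-force approach is still well within interactive latency on a single machine, which is why it works fine for the smaller data sizes being targeted.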

We’re still trying to clean up the API and do a thorough once over, so I’m not sure I’d recommend trying it yet. Hopefully soon.


I think that happened when GPT-5 was released and pierced OpenAI’s veil. While not a bad model, we found out exactly what Mr. Altman’s words are worth.


I haven’t used RCNN, but trained a custom YOLOv5 model maybe 3-4 years ago and was very happy with the results.

I think people have continued to work on it. There’s no single lab or developer behind it; the metrics for comparison usually focus on the speed/mAP plane.

One nice thing is that even with modest hardware, it’s low enough latency to process video in real time.
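"Real time" here is just frame-budget arithmetic: inference has to fit inside one frame interval. The numbers below are illustrative, not benchmarks:

```python
def fits_realtime(inference_ms: float, fps: float = 30.0) -> bool:
    # A detector keeps up with live video if one inference
    # fits inside one frame interval (1000/fps milliseconds).
    budget_ms = 1000.0 / fps
    return inference_ms <= budget_ms

# Hypothetical small-model latency of ~20 ms/frame on modest hardware
print(fits_realtime(20.0, fps=30.0))  # 33.3 ms budget -> True
print(fits_realtime(20.0, fps=60.0))  # 16.7 ms budget -> False
```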


FWIW this happens in consulting too, not just product companies. Just swap “product” for “delivery”.


I think a less order-biased, more straightforward way would be to just vectorize everything, perform clustering, and then label the clusters with the LLM.
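A minimal sketch of that pipeline, with the embedding step assumed done and the per-cluster LLM call stubbed out as a hypothetical `name_cluster` callable (a real setup would use faiss or scikit-learn rather than this toy k-means):

```python
import random
from typing import Callable, Dict, List, Sequence

Vec = List[float]

def kmeans(vecs: Sequence[Vec], k: int, iters: int = 20, seed: int = 0) -> List[int]:
    # Toy k-means: random init, then alternate assignment/centroid updates
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(list(vecs), k)]
    assign = [0] * len(vecs)
    for _ in range(iters):
        for i, v in enumerate(vecs):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def label_clusters(
    texts: List[str],
    assign: List[int],
    name_cluster: Callable[[List[str]], str],
) -> Dict[int, str]:
    # name_cluster is a stand-in for one LLM call per CLUSTER (not per document),
    # which is where the cost saving over per-item labeling comes from
    clusters: Dict[int, List[str]] = {}
    for t, c in zip(texts, assign):
        clusters.setdefault(c, []).append(t)
    return {c: name_cluster(members) for c, members in clusters.items()}

vecs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
assign = kmeans(vecs, k=2)
labels = label_clusters(["a", "b", "c", "d"], assign, name_cluster=lambda ms: "+".join(ms))
```

The LLM only sees one sample of texts per cluster, so label assignment order can't bias it.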


OP here. Yes, that works too and gets you to the same result. It removes the risk of bias, but the trade-off is higher marginal cost and latency.

The idea is also that this would be a classification system used in production, whereby you classify data as it comes in, so the "rolling labels" problem still exists there.

In my experience though, you can dramatically reduce unwanted bias by tuning your cosine similarity filter.
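As a sketch of what that filter might look like (hypothetical names, not OP's actual system): only mint a new LLM-generated label when an item's best cosine similarity to existing class centroids falls below a tuned threshold.

```python
import math
from typing import Dict, List

Vec = List[float]

def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def route(vec: Vec, centroids: Dict[str, Vec], threshold: float = 0.8) -> str:
    # Reuse the closest existing label if it's similar enough; otherwise
    # flag the item for a new label. `threshold` is the knob you tune
    # to trade label sprawl against forcing items into bad buckets.
    if centroids:
        best = max(centroids, key=lambda k: cosine(vec, centroids[k]))
        if cosine(vec, centroids[best]) >= threshold:
            return best
    return "__new_label__"

centroids = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
print(route([0.9, 0.1], centroids))  # "sports"
print(route([0.5, 0.5], centroids))  # "__new_label__"
```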


> For the searches we use hybrid dense + sparse bm25, since dense doesn't work well for technical words.

One thing I’m always curious about is if you could simplify this and get good/better results using SPLADE. The v3 models look really good and seem to provide a good balance of semantic and lexical retrieval.
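Either way, hybrid setups like the one quoted usually need a fusion step. A common choice is reciprocal rank fusion (RRF) over the two ranked lists — a generic sketch, not the parent poster's actual stack:

```python
from typing import Dict, List

def rrf(rankings: List[List[str]], k: int = 60) -> List[str]:
    # Reciprocal Rank Fusion: each retriever contributes 1/(k + rank)
    # per document; k=60 is the damping constant from the original RRF paper.
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # ranked output of a dense retriever
bm25 = ["d1", "d4", "d3"]    # ranked output of BM25
fused = rrf([dense, bm25])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

The appeal of SPLADE is that a single learned sparse index can cover much of what this dense+BM25 fusion is doing.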


I go back and forth on this. A year ago, I was optimistic and I have had 1 case where RL fine tuning a model made sense. But while there are pockets of that, there is a clash with existing industry skills. I work with a lot of machine learning engineers and data scientists and here’s what I observe.

- many, if not most, MLEs who got started after LLMs do not generally know anything about machine learning. For lack of clearer industry titles, they are really AI developers or AI devops

- machine learning as a trade is moving toward the same fate as data engineering and analytics. Big companies only want people using platform tools. Some AI products, even in cloud platforms like Azure, don’t even give you the evaluation metrics that would be required to properly build ML solutions. Few people seem to have an issue with it.

- fine tuning, especially RL, is packed with nuance and details… lots to monitor, a lot of training signals that need interpretation and data refinement. It’s a much bigger gap than training simpler ML models, which people are also not doing/learning very often.

- The limited number of good use cases means people are not learning those skills from more senior engineers.

- companies have gotten stingy with sme-time and labeling

What confidence do companies have in supporting these solutions in the future? How long will you be around and who will take up the mantle after you leave?

AutoML never really panned out, so I’m less confident that platforming RL will go any better. The unfortunate reality is that companies are almost always willing to pay more for inferior products because it scales. Industry “skills” are mostly experience with proprietary platform products. Sure, they might list “pytorch” as a required skill, but 99% of the time there’s hardly anyone at the company who has spent any meaningful time with it. Worse, you can’t use it, because it would be too hard to support.


Labels are so essential - even if you're not training anything, being able to quickly and objectively test your system is hugely beneficial - but it's a constant struggle to get them. In the unlikely event you can get budget and priority for an SME to do the work, communicating your requirements to them (the need to apply very consistent rules and make few errors) is difficult and the resulting labels tend to be messy.

More than once I've just done labeling "on my own time" - I don't know the subject as well but I have some idea what makes the neurons happy, and it saves a lot of waiting around.

I've found tuning large models to be consistently difficult to justify. The last few years it seems like you're better off waiting six months for a better foundation model. However, we have a lot of cases where big models are just too expensive and there it can definitely be worthwhile to purpose-train something small.


My personal opinion is that true engineering, which revolves around turning complex theory into working practice, has seen a decline in grace. Why spend a lot of time trying to master the art of engineering if you can ride the wave of engineering services and get away with it?

In true hacker spirit, I don't think trying to train a model on a wonky GPU is something that needs an ROI for the individual engineer. It's something they do because they yearn to acquire knowledge.


Eventually someone will make a killing on doing actual outcome measurements instead of just trusting the LLMs, Michael Lewis will write a popular book about it, and the cycle will begin anew...


I'm also seeing teams who expected big gains from fine tuning get incremental or moderate gains. Then they put it in production and regret the decision as SOTA marches on.

I have avoided fine tuning because the models are currently improving at a rate that exceeds big corporate product development velocity.


Absolutely the first thing you should try is a prompt optimizer. The GEPA optimizer (implemented in DSPy) often outperforms GRPO training[1]. But I think people are usually building with frameworks that aren't machine learning frameworks.

[1] https://arxiv.org/abs/2507.19457
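The kernel of what a prompt optimizer automates can be sketched in a few lines — note this is a generic illustration, not DSPy's actual API, and `run` is a hypothetical stand-in for calling an LLM with a candidate instruction:

```python
from typing import Callable, List, Tuple

def optimize_prompt(
    candidates: List[str],
    devset: List[Tuple[str, str]],
    run: Callable[[str, str], str],
) -> str:
    # Score each candidate instruction by accuracy on a labeled dev set
    # and keep the best. Real optimizers like GEPA go further, mutating
    # prompts via LLM reflection on failures instead of a fixed list.
    def score(prompt: str) -> float:
        hits = sum(run(prompt, x) == y for x, y in devset)
        return hits / len(devset)
    return max(candidates, key=score)

def fake_lm(prompt: str, x: str) -> str:
    # Toy stand-in for a real LLM call (purely for illustration)
    return x.upper() if "uppercase" in prompt else x

best = optimize_prompt(
    ["Echo the input.", "Echo the input in uppercase."],
    [("a", "A"), ("b", "B")],
    fake_lm,
)
print(best)
```

The key point: it only needs a labeled dev set and a metric, no gradients or RL infrastructure, which is why it's worth trying before GRPO.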


> “What would America’s Founding Fathers think if they were alive today?”

> For Cross, it is pointless to speculate about the present-day views of men who could not have imagined cotton candy, let alone the machine that makes it.

Some things, like “taxation without representation” seem to be timeless. You can call it irony or perhaps in some cases, a spade is still just a spade.


> men who could not have imagined cotton candy

It's a funny example, since it looks like cotton candy might have been around in their time [0]. Machine-spun cotton candy came about much later, but I'm not overly suspicious of the claims in [0], as meringue [1] certainly existed in their lifetimes and the process isn't dissimilar. I'm certain these men could understand "like meringue, but with sugar!" and "a machine that spins fast!" These would not have been great leaps for people of that time. It seems to make them out to be idiots rather than merely not prophetic (presumably the intended meaning).

[0] https://web.archive.org/web/20150701005917/http://www.cotton...

[1] https://en.wikipedia.org/wiki/Meringue


Pretty sure they could have imagined cotton candy, anyway. There's nothing special about modern people that makes them more capable of comprehending new technology, it's just a matter of exposure.

