
>We've just never been able to use lossy compression on text.

...and we still can't. If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that? In what situation would you compress text with an LLM rather than with a standard compression algorithm (other than to build an LLM)?

Other lossy compression targets known superfluous information: MP3 removes sounds we can't really hear, and JPEG discards the fine detail and color variation the eye barely notices (by quantizing high-frequency DCT coefficients and subsampling chroma).
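
For a toy but concrete version of that idea, here is a minimal sketch in Python (assuming numpy and scipy, with a made-up quantization table): the DCT moves an 8x8 image block into the frequency domain, and rounding away the high-frequency coefficients is the deliberate, well-understood loss.

    # JPEG-style lossy step on one 8x8 block; the quantization table is invented for illustration.
    import numpy as np
    from scipy.fft import dctn, idctn

    block = 50.0 + 10.0 * np.add.outer(np.arange(8), np.arange(8))   # a smooth gradient block

    coeffs = dctn(block - 128, norm="ortho")                   # frequency domain
    step = 1 + np.add.outer(np.arange(8), np.arange(8)) * 6    # coarser steps at higher frequencies
    quantized = np.round(coeffs / step)                        # this rounding is where information is lost
    restored = idctn(quantized * step, norm="ortho") + 128

    print(np.abs(block - restored).max())                      # small error, concentrated in fine detail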

LLMs kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. plausible enough to be taken as correct), with no algorithmic way to discern which is which.

So while yes, they do compress data and you can measure it, the output of this "compression algorithm" puts it in the same family as a "randomly delete words and swap long words for shorter synonyms" compression algorithm, which I don't think anyone would consider using to compress their documents.
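
For what it's worth, "you can measure it" is literal: a language model's total cross-entropy over a text is the size an entropy coder driven by that model could reach, so you can put a number in bits on how well it compresses the text (paired with a coder this way it is actually lossless; the lossiness being debated comes from having the model regenerate text instead). A rough sketch, assuming the Hugging Face transformers library and GPT-2 as a stand-in model:

    # Total negative log-likelihood in bits ~ the size an entropy coder guided by
    # the model could reach for this text. GPT-2 here is just an illustrative model.
    import math
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tok = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    text = "The quick brown fox jumps over the lazy dog."
    ids = tok(text, return_tensors="pt").input_ids

    with torch.no_grad():
        loss = model(ids, labels=ids).loss        # mean cross-entropy (nats) per predicted token

    n_predicted = ids.size(1) - 1                 # the first token is not predicted
    total_bits = loss.item() * n_predicted / math.log(2)
    print(round(total_bits), "bits under the model vs", len(text.encode()) * 8, "bits raw")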



> If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that?

If the LLM-based compression method were well understood and demonstrated to be reliable, I wouldn't oppose it on principle. If my lawyer didn't know what they were doing and threw together some ChatGPT document transfer system, of course I wouldn't trust it, but I also wouldn't trust my lawyer if they developed their own DCT-based lossy image compression algorithm.


> LLMs kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. plausible enough to be taken as correct), with no algorithmic way to discern which is which.

Exactly like information from humans, then?


People summarize (compress) documents with LLMs all day. With legalese the application would be to summarize it in layman's terms, while retaining the original for legal purposes.
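
A minimal sketch of that workflow (summarize() is a hypothetical stand-in for whatever LLM you call; the design point is that only the verbatim, integrity-checked original is treated as authoritative):

    # Keep the original as the canonical record; the lay summary is a convenience layer.
    import hashlib
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class LegalDoc:
        original: str          # authoritative text, kept verbatim
        original_sha256: str   # integrity check for the original
        lay_summary: str       # layman's-terms gloss, never relied on legally

    def summarize(text: str) -> str:
        raise NotImplementedError("call your LLM of choice here")  # hypothetical stand-in

    def archive(original: str) -> LegalDoc:
        return LegalDoc(
            original=original,
            original_sha256=hashlib.sha256(original.encode()).hexdigest(),
            lay_summary=summarize(original),
        )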


Yes, and we all know (ask teachers) how reliable those summaries are. They are randomly lossy, which makes them unsuitable for any serious work.

I'm not arguing that LLMs don't compress data; I'm arguing that they are compression tools in the technical sense but not in the colloquial sense, and the overlap with colloquial compression tools is almost zero.


LLMs are already being used for plenty of serious work across the globe, so perhaps you'll need to readjust your line of thinking. There's nothing inherently better or more trustworthy about having a person compile some knowledge rather than, say, a computer algorithm in this case. I'd place my bets on the latter producing the better output.


But lossy compression algorithms for e.g. movies and music are also non-deterministic.

I'm not making an argument about whether the compression is good or useful; I don't find 144p bitrate-starved videos particularly useful either. But it doesn't seem so unlike other types of compression to me.


> They are randomly lossy, which makes them unsuitable for any serious work.

Ask ten people and they'll give ten different summaries. Are humans unsuitable too?


Yes, which is why we write things down, and when those archives become too big we use lossless compression on them, because we cannot tolerate a compression tool that drops the street address of a customer or, even worse, hallucinates a slightly different one.
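
That byte-exact guarantee is exactly what lossless compression buys you, and it's trivial to check; a minimal sketch with Python's standard zlib (the record is made up for illustration):

    import zlib

    # Made-up example record; the point is the round trip, not the data.
    record = b"Customer: J. Smith, 221B Baker Street, London\n" * 1000

    packed = zlib.compress(record, 9)
    assert zlib.decompress(packed) == record   # every byte, address included, survives
    print(len(record), "->", len(packed), "bytes")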



