
Yeah it’s called a regex. With a lot of human assistance it can do less but fits in smaller spaces and doesn’t break down.

It’s also deterministic, unlike llms…

Is this true for anything beyond the simplest LLM architectures? It seems like as soon as you introduce something like CoT this is no longer the case, at least in terms of mechanism, if not outcome.

Those products aren’t typically described as having been “hyped” though — just successful or viral. Hyped has a sort of derogatory/schadenfreude subtext.

This was published right before people started experimentally validating the Landauer limit. I am not sure why it hasn’t been taken down at some point as the evidence has accumulated:

2012 — Bérut et al. (Nature) — They used a single colloidal silica bead (2 μm) trapped in a double-well potential created by a focused laser. By modulating the potential to erase the bit, they showed that mean dissipated heat saturates at the Landauer bound (k_B T ln 2) in the limit of long erasure cycles.

https://www.physics.rutgers.edu/~morozov/677_f2017/Physics_6...

2014 — Jun et al. (PRL) — A higher-precision follow-up using 200 nm fluorescent particles in an electrokinetic feedback trap. Same basic physics, tighter error bars.

https://pmc.ncbi.nlm.nih.gov/articles/PMC4795654/

2016 — Hong et al. (Science Advances) — First test on actual digital memory hardware. Used arrays of sub-100 nm single-domain Permalloy nanomagnets and measured energy dissipation during adiabatic bit erasure using magneto-optic Kerr effect magnetometry. The measured dissipation was consistent with the Landauer limit within 2 standard deviations, using nanomagnets, the actual basis of magnetic storage.

https://www.science.org/doi/10.1126/sciadv.1501492

2018 — Gaudenzi et al. (Nature Physics) — Opens with:

The erasure of a bit of information is an irreversible operation whose minimal entropy production of k_B ln 2 is set by the Landauer limit. This limit has been verified in a variety of classical systems, including particles in traps and nanomagnets. Here, we extend it to the quantum realm by using a crystal of molecular nanomagnets as a quantum spin memory and showing that its erasure is still governed by the Landauer principle.

https://www.nature.com/articles/s41567-018-0070-7

The Landauer limit is not conjecture.


I haven't finished reading this yet, but I don't think the author is saying that the Landauer limit for erasure is wrong. They're saying that there are other limits in computing beyond erasure. I think this makes sense; although reversible computing should be possible at zero temperature and infinite precision, realistic computers need some way to remove entropy that accumulates during the computation.

So I don't think their claim is in tension with any of the papers that you cite.


I'm not sure, but isn't 2 standard deviations a bit low? Especially so for something that can be done in a lab. It seems that 2 SD is the minimum threshold for getting published. Can we be sure that these are properly reviewed?

Could it be that you’ve confused this with the number of standard deviations one needs to falsify something? For instance, if two things are different we want them to be as many SD apart as we can. Here, on the other hand, the data agree _within_ 2 SD.

That was the limit of just one experimental approach that was peer reviewed and published in a major journal. As you can see there are many experiments validating the limit and none invalidating it.

The reality is that the Landauer limit is vanishingly small. I would encourage you to review the experiment methodology and see if you can come up with better, fundable methods.
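
To give a sense of how vanishingly small it is, here is a quick back-of-envelope of my own, assuming room temperature around 300 K:

```python
import math

# Back-of-envelope: the Landauer bound k_B * T * ln(2) at an assumed ~300 K.
k_B = 1.380649e-23      # Boltzmann constant, J/K
T   = 300.0             # assumed room temperature, K

E_bit = k_B * T * math.log(2)
print(E_bit)            # ~2.87e-21 J to erase a single bit

# Erasing a full gigabyte (8e9 bits) at the limit:
print(E_bit * 8e9)      # ~2.3e-11 J, i.e. tens of picojoules
```

Practical CMOS dissipates many orders of magnitude more than this per switching event, which is why the experiments above need single-particle or single-domain setups to resolve the bound at all.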


Is the focus on the erasure of a bit, rather than writing a bit, just conventional or is there a significant difference between the processes?

Erasure is logically irreversible, writing a bit is not. When you erase a bit you compress the logical phase space of the closed system, which means the missing information has to go somewhere — in this case a couple of very low energy phonons into the larger environment.
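
A toy way to see the compression (my own illustration, not from the papers above): erasure is a many-to-one map on the bit’s states, while an operation like NOT is one-to-one and therefore undoable.

```python
# Reset-to-zero ("erase"): both inputs land on the same output, so the
# prior state can't be recovered from the result -- the logical phase
# space has been compressed from {0, 1} to {0}.
erase = {0: 0, 1: 0}

# NOT: a bijection on {0, 1}; every output has exactly one preimage,
# so it is logically reversible and carries no Landauer cost in principle.
not_gate = {0: 1, 1: 0}

def is_reversible(op):
    # Reversible iff no two inputs map to the same output.
    return len(set(op.values())) == len(op)

print(is_reversible(erase))     # False
print(is_reversible(not_gate))  # True
```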

Ah, I thought writing a bit was irreversible, because after writing say 1, the previous state could have been a 0 or a 1. But in fact writing a bit should be thought of as the whole process "0 to 1" or "1 to 1", including the initial bit, so that the process is logically reversible. Is that right? Then what I had in mind as an irreversible process of writing would be equivalent to first erasing the bit and then writing the new one.

For people who want to ask a model for an app, or a website, or something at a level of “hey you make apps right, I have had this idea for years…” the experience is akin to a slot machine — sometimes they get what they imagined their description would create and it works, and sometimes they get a hollow chocolate approximation.

Only if the model is actually a human or equivalent, otherwise we don’t know what it is.

That’s because it is literally just a feedback loop?

The readme opens with this:

> I have an RTX 5070 with 12 GB VRAM and I wanted to run glm-4.7-flash:q8_0, which is a 31.8 GB model. The standard options are:

> Offload layers to CPU — works, but drops token/s by 5–10× because CPU RAM has no CUDA coherence. You end up waiting.

> Use a smaller quantization — you lose quality. At q4_0 the model is noticeably worse on reasoning tasks.

> Buy a bigger GPU — not realistic for consumer hardware. A 48 GB card costs more than a complete workstation.

> None of those felt right, so I built an alternative: route the overflow memory to DDR4 via DMA-BUF, which gives the GPU direct access to system RAM over PCIe 4.0 without a CPU copy involved.

And then limps home with this caveat on the closest thing to a benchmark:

> The PCIe 4.0 link (~32 GB/s) is the bottleneck when the model overflows VRAM. The best strategy is to shrink the model until it fits — either with EXL3 quantization or ModelOpt PTQ — and use GreenBoost's DDR4 pool for KV cache only.

I think it refers to DDR4 specifically because that is how the user explained it to their coding agent. LLMs are great at perpetuating unnecessary specificity.


Given that 32 GB/s is significantly worse than CPU-to-RAM speeds these days, does the additional compute really make it any faster in practice? The KV cache is always on the GPU anyway unless you're doing something really weird, so it won't affect ingestion, and generation is typically bandwidth bound. With something like x16 PCIe 6.0 it would actually make sense, but nothing less than that; maybe also for smaller dense models that are more compute bound, with x8 PCIe 6.0 or x16 PCIe 5.0, but that's already below DDR5 speeds.
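
Rough numbers for why the link is the problem (my own back-of-envelope, assuming a dense model where all of the overflowed weights have to cross the bus once per generated token; the bandwidth figures are nominal):

```python
# Hypothetical back-of-envelope for the setup described in the readme.
model_bytes = 31.8e9   # q8_0 model size from the readme
vram_bytes  = 12e9     # RTX 5070 VRAM from the readme
pcie4_bw    = 32e9     # ~x16 PCIe 4.0, bytes/s, as quoted
ddr4_bw     = 50e9     # typical dual-channel DDR4, bytes/s (assumed)

overflow = model_bytes - vram_bytes   # ~19.8 GB lives in system RAM

# If the overflow has to cross the link once per token, the link alone caps you at:
print(pcie4_bw / overflow)   # ~1.6 tokens/s ceiling before any compute
# Running those layers on the CPU instead is bounded by DDR4 bandwidth:
print(ddr4_bw / overflow)    # ~2.5 tokens/s -- the PCIe path is the slower one
```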

Additional compute is generally a win for prefill, while memory bandwidth is king for decode. The KV cache, however, is the main blocker for long context, so it should be offloaded to system RAM and even to NVMe swap as context grows. Yes, that's slow on an absolute basis, but it's faster (and more power efficient, which makes everything else faster) than not having the cache at all, so it's still a huge win.
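
To put a number on how big that cache gets (illustrative only; the dimensions below are made-up stand-ins, not the actual model config):

```python
# Hypothetical transformer dimensions -- not the actual glm-4.7-flash config.
n_layers    = 48
n_kv_heads  = 8
head_dim    = 128
dtype_bytes = 2        # fp16 K and V

# Per token: one K and one V vector per layer.
kv_per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
print(kv_per_token)                    # 196,608 bytes, ~192 KB per token

# At 128k tokens of context that is ~26 GB -- bigger than the 12 GB card
# by itself, which is why spilling the cache to system RAM (or NVMe) is
# attractive even when it is slow.
print(kv_per_token * 131072 / 1e9)     # ~25.8 GB
```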

Well if you do that then you reverse the strengths of your system. It might be best to work with the context length you can offload, like a normal person.

> "wanted to run glm-4.7-flash:q8_0" > q8_0

A well-made (as in, Unsloth) smaller quant will help a good amount here, without a notable reduction in performance or increase in perplexity.


No, they stop-hunt their way to depressed prices, where they then buy anticipating the recovery, while you closed out your “safe” retirement positions at -15%.


You don’t put stop losses on retirement positions. That’s an incredibly dumb thing to do for long term investors.

It’s literally a “sell low” policy.


You use a trailing stop loss. You get closed out 15% down from the top, not 15% down from purchase. The alternative in a 24 hour market is worse — the news of a real event hits and by the time you wake up and respond you’re down 50% or more and the stock isn’t coming back.
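
A toy illustration of the difference (made-up prices):

```python
# Trailing stop 15% below the running peak, vs a fixed stop 15% below entry.
prices = [100, 110, 125, 140, 128, 121, 118]   # hypothetical closes

entry = prices[0]
fixed_stop = entry * 0.85          # 85.0 -- never triggers on this path

peak = entry
for p in prices:
    peak = max(peak, p)
    trailing_stop = peak * 0.85    # ratchets up with the peak, never down
    if p <= trailing_stop:
        print(f"stopped out at {p}, peak was {peak}, stop was {trailing_stop:.0f}")
        break
```

Here the trailing stop exits at 118 after a peak of 140, locking in an 18% gain from entry, whereas a fixed 15% stop from purchase would still be sitting at 85.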

This policy change is to hunt profit from a safety mechanism used by retail traders.

It is something that should yield a lot of profit for 24 hour trading systems during a downturn.


You don’t put stop losses on retirement positions, period.

Doing that literally any time in the last 30 years would have been a dumb idea if you weren’t retiring in the next year.

You would have gotten stopped out and then what? What magic crystal ball do you use to decide to get back in?

Look at the violent downturn during COVID. You would have been stopped out and then likely would have taken a loss and missed one of the largest bull markets in history.

Stop losses, with or without trailing, are for day traders.


>while you closed out your “safe” retirement positions at -15%

User error


It’s funny and also disadvantages everyone who can’t trade 24/7. Win/win?

That is a reasonable position; however, the assumption that it is the administration gaming them, rather than other motivated parties, is open for discussion.


It is in fact not at all reasonable. They are saying that the BLS stats can't be trusted because they totally misunderstand the survey methodology. That isn't a reason!


I’d counter that if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers.

At some point a lack of decision to take compensating action becomes faking the numbers.


> if we were doing a good job gathering data, these structural biases could be compensated for with more conservative initial numbers

There is no more conservative. The data will bias in the direction of trend. The point of the data is, in part, to measure that trend. Fucking with it to make it politically correct to the statistically illiterate is precisely the sort of degradation of data we’re worried about.

(They’re also useless as a time series if the methodology changes quarter to quarter. That’s the job of analysis. Not the data.)


What you wrote suggests the data will bias predictably, which matches my understanding.

Reporting biased data as the default because the bias compensation is already built into the audience seems like a weak argument for not improving.

They can provide for the continuation of data visibility/granularity by releasing the prior numbers as previously calculated and at the same time changing the calculation of the headline number to be better compensated.

The simpler argument is that changing it at all will result in a negative step change in the reporting that no one wants to take accountability for.


> What you wrote suggests the data will bias predictably

Ex post facto. Before the fact, we don’t know.

Imagine you know the weather will be a strong gust regardless of direction. Averaging the models will produce a central estimate. But you know it will be biased away from the center. You just don’t know, until it happens, in which direction.

> They can provide for the continuation of data visibility/granularity by releasing the prior numbers as previously calculated and at the same time changing the calculation of the headline number to be better compensated

They do. These data are all recalculated with each methodological change. They’re just deprecated indices the media don’t report on because they’re of academic, not broad, concern.

> simpler argument is that changing it at all will result in a negative step change in the reporting

Simpler but wrong. Those data would be useless for the same reason we don’t let CEOs smooth revenues.


I’m confused by this discussion. It seems like you said the biases were structural because we know who reports early and that is why the early numbers are always revised down. Structural implies known in advance.

It also seems like you said they shouldn’t revise the numbers but now you are saying they already do.

What am I misunderstanding?

