
sliken · 2026-05-12T16:00:27 1778601627

C, for better or worse, is like a high level assembly language. You can do anything, which pretty much means you are going to have many security and correctness issues. Double free, use after free, off by one errors, buffer overflows, etc. Thus numerous CVEs.

Go has less flexibility, no pointer arithmetic, a healthy package system, and a smaller domain: mostly consuming or providing network services. My favorite feature is channels. For me they make leveraging the performance of multi-core CPUs straightforward, and dramatically nicer than the C approaches I've tried like pthreads and mutexes.

I wouldn't rate Go as secure as Rust, but it has a pleasingly developer-friendly approach. It seems way more secure than C.

Making a pipeline where each stage is 1 to N threads is pleasingly easy, reliable, and performant.

sliken · 2026-04-20T16:46:48 1776703608

Sadly motherboard vendors, tech journalists, and many others confuse a DIMM with a channel. The catch is that in the DDR4 generation they were the same, 64 bits wide. However a standard DDR5 dimm is not 1x64 bit, it's actually 2x32 bit. Thus 2 DDR5 dimms = 4 channels.

For some workloads the extra channels help, despite having the same bandwidth. This is one of the reasons that it's possible for a DDR5 system to be slightly faster than a DDR4 system, even if the memory runs at the same speed.

fluoridation · 2026-04-20T18:25:02 1776709502

>However a standard DDR5 dimm is not 1x64 bit, it's actually 2x32 bit. Thus 2 DDR5 dimms = 4 channels.

Uh, surely that depends on how the motherboard is wired. Just because each DIMM has half the pins on one channel and the other half on another, doesn't mean 2 DIMM = 4 channels. It could just be that the top pins over all the DIMMs are on one channel and the bottom ones are on another.

sliken · 2026-04-21T04:36:47 1776746207

I think there's a standard wiring for the DIMM and some parts are shared. Each normal DDR5 DIMM has 2 sub channels that are 32 bits each. The new HUDIMM specification will enable only 1 sub channel and so will have only half the bandwidth.

I don't think you can wire up DDR5 dimms willy nilly as if they were 2 separate 32 bit dimms.

fluoridation · 2026-04-21T06:44:08 1776753848

Well, I don't know what to tell you. I'm not a computer engineer, but I assume Gigabyte has at least a few of those, and they're labeling the X870E boards with 4 DIMMS as "dual channel". I feel like if they were actually quad channel they'd jump at the chance to put a bigger number, so I'm compelled to trust the specs.

sliken · 2026-04-21T06:55:04 1776754504

In computer manufacture speak dual channel = 2 x 64 bit = 128 bits wide.

So with 2 dimms or 4 you still get 128 bit wide memory. With DDR4 that means 2 channels x 64 bit each. With DDR5 that means 4 channels x 32 bit each.

Keep in mind that the DDR4/5 memory controller is in the CPU. The motherboard's job is to connect the right pins on the DIMMs to the right pins on the CPU socket. The days of an off-chip memory controller/north bridge are long gone.

So if you look at an AM5 CPU it clearly states:

   * Memory Type: DDR5-only (no DDR4 compatibility).

   * Channels: 2 Channel (Dual-Channel).

   * Memory Width: 2x32-bit sub-channels (128-bit total for 2 sticks).

fluoridation · 2026-04-21T08:18:24 1776759504

Why are you quoting something that contradicts you? It clearly states it's a dual channel memory architecture with 32-bit subchannels. The fact the two words are used means they mean different things.

>In computer manufacture speak dual channel = 2 x 64 bit = 128 bits wide.

Yes, because AMD64 has 64-bit words. You can't satisfy a 64-bit load or store with just 32 bits (unless you take twice as long, of course). That you get 4 32-bit subchannels doesn't mean you can execute 4 simultaneous independent 32-bit memory operations. A 64-bit channel capable of a full operation still needs to be assembled out of multiple 32-bit subchannels. If you install a single stick you don't get any parallelism with your memory operations; i.e. the system runs in single channel mode, the single stick fulfilling only a single request at a time.

sliken · 2026-04-21T09:40:26 1776764426

AM5 is the AMD standard and it's accurate. It seems rather pedantic to differentiate between 2 sub channels per DIMM and 4 32-bit channels for a total of 128 bits.

However the motherboard vendors annoyingly hide that from you by claiming DDR4 is dual channel (2 x 64 bit, which means two outstanding cache misses, one per channel) and just glossing over the difference by saying DDR5 is dual channel (4 x 32 bit, which means 4 outstanding cache misses).

> Yes, because AMD64 has 64-bit words.

It's a bit more complicated than that. First you have 3 levels of cache, and a miss in the last of them triggers a cache line load, which is 64 bytes (not 64 bits). That goes to one of the 4 channels, and there's a long latency for the first 64 bits. Then there are the complications of opening the row, which makes its columns available, which can speed things up if you need more than one column from that row. But the general idea is that you get at most one cache line per channel after waiting for the memory latency.

So DDR4 on a 128 bit system can have 2 cache lines in flight, so 128 bytes per memory latency. On a DDR5 system you can have 4 cache lines in flight per memory latency. Sure you need the bandwidth, and 32 bit channels have half the bandwidth per clock, but the trick is that the memory bus spends most of its time waiting on memory to start a transfer. So waiting 50ns then getting 32 bit @ 8000 MT/sec isn't that different from waiting 50ns and getting 64 bit @ 8000 MT/sec.

Each 32 bit subchannel can handle a unique address, which is turned into a row/column, and a separate transfer when done. So a normal DDR5 system can look up 4 addresses in parallel, wait for the memory latency and return a cache line of 64 bytes.
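
A rough back-of-the-envelope sketch of that argument, in Go. The function names are made up, and the model is deliberately crude: a fixed 50ns latency plus the time to burst one 64-byte cache line, ignoring row hits, refresh, command overhead, and everything else a real controller does.

```go
package main

import "fmt"

// lineTimeNs approximates the time in nanoseconds to fetch one cache line:
// a fixed access latency plus the transfer time of the line over a channel
// of the given width. (Simplified model, assumption for illustration only.)
func lineTimeNs(latencyNs float64, widthBits int, transfersPerSec float64, lineBytes int) float64 {
	transfers := float64(lineBytes*8) / float64(widthBits)
	return latencyNs + transfers/transfersPerSec*1e9
}

// bytesInFlight is how many bytes can be outstanding at once if every
// channel has one cache line request in flight.
func bytesInFlight(channels, lineBytes int) int {
	return channels * lineBytes
}

func main() {
	const latencyNs = 50.0 // assumed latency, same figure as in the comment above
	const rate = 8000e6    // 8000 MT/sec
	const line = 64        // cache line size in bytes

	fmt.Printf("32 bit channel: %.1f ns per line\n", lineTimeNs(latencyNs, 32, rate, line))
	fmt.Printf("64 bit channel: %.1f ns per line\n", lineTimeNs(latencyNs, 64, rate, line))
	fmt.Printf("2 x 64 bit: %d bytes in flight\n", bytesInFlight(2, line))
	fmt.Printf("4 x 32 bit: %d bytes in flight\n", bytesInFlight(4, line))
}
```

Under these assumptions the narrower channel costs about 52ns per line versus 51ns, while doubling the number of lines that can be in flight from 128 bytes to 256 bytes.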

Even better when you have something like strix halo that actually has a 256 bit wide memory system (twice any normal tablet, laptop, or desktop), but also has 16 channels x 16 bit, so it can handle 16 cache misses in flight. I suspect this is mostly to keep its aggressive iGPU fed.

sliken · 2026-04-20T16:42:59 1776703379

> Quad-channel RAM is common on consumer desktops

Yes, but tablets, laptops, and normal (non-HEDT) desktops have 4 channels, 4x32 bit = 128 bit wide. Modern memory with DDR5 allows two 32 bit channels on a 64 bit dimm. The previous gen DDR4 would allow 1 64 bit channel on a 64 bit dimm.

So strix halo (on laptops, tablets, and desktops) allows for a 256 bit wide memory system, providing twice the memory bandwidth of any ryzen or intel i3/i5/i7/i9. The Apple pro (256 bit), max (512 bit), and ultra (1024 bit) lines of apple silicon have greater than 128 bit wide memory systems. On the AMD side it's just the Threadripper (256 bit) and Threadripper pro (512 bit), but those are typically in expensive workstations that are physically large and need substantial cooling.

So the HALO is pretty unique (outside of Apple) for providing twice the memory bandwidth of anything else that fits in the tablet, laptop, or small desktop category.
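
For reference, peak bandwidth scales linearly with bus width at a given transfer rate. A small sketch in Go, where `peakGBps` is a made-up helper and 8000 MT/sec is just an illustrative rate, not a claim about any particular product:

```go
package main

import "fmt"

// peakGBps returns the theoretical peak bandwidth in GB/s (10^9 bytes/s)
// for a memory bus of the given total width and transfer rate.
// Real sustained bandwidth will be lower.
func peakGBps(widthBits int, transfersPerSec float64) float64 {
	return float64(widthBits) / 8 * transfersPerSec / 1e9
}

func main() {
	const rate = 8000e6 // illustrative 8000 MT/sec, an assumption
	for _, width := range []int{128, 256, 512, 1024} {
		fmt.Printf("%4d bit: %.0f GB/s\n", width, peakGBps(width, rate))
	}
}
```

At the same transfer rate, going from 128 bit to 256 bit doubles the theoretical peak, which is the whole point of the wider memory system.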

sliken · 2026-04-17T16:21:12 1776442872

As you dig deeper I think you'll find a method behind the madness.

Sure, modules just play with env variables. But it's easy to inspect (module show), easy to document ("use module load ..."), allows admins to change the default when things improve or bugs are fixed, but also allows users to pin the version. It's very transparent, very discoverable, and very stable. Research needs dictate that you can reproduce research from years past. It's much easier to look at your output file and see the exact version of compiler, MPI stack, libraries, and application than trying to dig into a container build file or similar. Not to mention it's crazy more efficient to look at a few lines of output than to keep the container around.

As for slurm, I find it quite useful. Your main complaint is no default systemd service files? Not like it's hard to set up systemd and dependencies. Slurm's job is scheduling, which involves matching job requests for resources, deciding who to run, and where to run it. It does that well and runs jobs efficiently. Cgroup v2, pinning tasks to the CPU it needs, placing jobs on the CPU closest to the GPU it's using, etc. When combined with PMIX2 it allows impressive launch speeds across large clusters. I guess if your biggest complaint is the systemd service files that's actually high praise. You did mention logging. I find it pretty good: you can increase the verbosity, focus on the server (slurmctld) or client side (slurmd), and turn on just what you are interested in, like say +backfill. I've gotten pretty deep into the weeds and basically everything slurm does can be logged, if you ask for it.

Sounds like you've used some poorly run clusters. I don't doubt it, but I wouldn't assume that's HPC in general. I've built HPC clusters and did not use the university's AD, specifically because it wasn't reliable enough. IMO a cluster should continue to schedule and run jobs, even if the uplink is down. Running a past EoL OS on an HPC cluster is definitely a sign that it's not run well, and seems common when a heroic student ends up managing a cluster and then graduates, leaving the cluster unmanaged. Sadly it's pretty common for IT to run an HPC cluster poorly. It's really a different set of constraints, thus the need for an HPC group.

Plenty of HPC clusters out there are happy to support the tools that help their users get the most research done.

sliken · 2026-04-16T16:16:51 1776356211

> You won't like it, but the answer is Apple.

Or strix halo.

Seems rather over simplified.

It depends on the quant level: for Qwen3.6 it's 10GB to 38.5GB.

Qwen supports a context length of 262,144 natively, but can be extended to 1,010,000 and of course the context length can always be shortened.

Just use one of the calculators and you'll get a much more useful number.

3836293648 · 2026-04-16T21:10:39 1776373839

What Strix Halo system has unified memory? A quick google says it's just a static vram allocation in ram, not that CPU and GPU can actively share memory at runtime

sliken · 2026-04-17T03:49:29 1776397769

All. Keep in mind strix != strix halo.

You can get tablets, laptops, and desktops. I think windows is more limited and might require static allocation of video memory, not because it's a separate pool, just because windows isn't as flexible.

With linux you can just select the lowest number in bios (usually 256 or 512MB) then let linux balance the needs of the CPU/GPU. So you could easily run a model that requires 96GB or more.

ac29 · 2026-04-17T03:09:48 1776395388

> What Strix Halo system has unified memory?

All of them. The static VRAM allocation is tiny (512MB), most of the memory is unified

sliken · 2026-03-21T21:07:13 1774127233

Try this one: https://archive.is/UGzzc

sliken · 2026-03-15T20:55:54 1773608154

Any chance of turning this into a server/client with an API so people could use the language of their choice?

sliken · 2026-02-10T02:12:27 1770689547

Along similar lines, the double-slit experiment seems simple. Two slits let light through and you get bands where they constructively or destructively interfere, just like waves.

However I still find it crazy that when you slow down the laser and one photon at a time goes through either slit you still get the bands. Which begs the question, what exactly is it constructively or destructively interfering with?

Still seems like there's much to be learned about the quantum world, gravity, and things like dark energy vs MOND.

ggm · 2026-02-10T02:22:43 1770690163

I had a conversation about this on HN some months back. It's a surprisingly modern experiment. It demanded an ability to reliably emit single photons. Young's theory may be 1800 but single photon emission is 1970-80.

(This is what I was told, exploring my belief it's always been fringes in streams of photons not emerging over repeated applications of single photons and I was wrong)

lefra · 2026-02-10T06:38:44 1770705524

To get single photons, you just need to stack up enough stained glass in front of a light source. That's been achievable for aeons (the photon will go through at a random time though).

The difficult part is single photon _detectors_, they're the key technology to explore the single-photon version of Young's experiment (which originally showed that light has wave-like properties).

jasonwatkinspdx · 2026-02-10T05:37:38 1770701858

The simplest answer here is that "fields are real, particles are excitation patterns of fields." And that's generally the practical way most physicists think of it today, as I understand it.

If I make the equivalent of a double slit experiment in a swimming pool, then generate a vortex that propagates towards my plywood slits or whatever, it's not really surprising that the extended volume of the vortex interacts with both slits even though it looks like a singular "particle."

el_nahual · 2026-02-10T08:49:02 1770713342

And yet if you place a detector at the slits to know which slit the single photon goes through, you get no interference pattern at the end.

squeefers · 2026-02-10T10:44:53 1770720293

> However I still find it crazy that when you slow down the laser and one photon at a time goes through either slit you still get the bands.

why does nobody mention the fact the photon doesnt keep going through the same hole? like why is it randomly moving through the air in this brownian way? the laser gun doesnt move, the slit doesnt move, so why do different photons end up going through different holes?

sliken · 2026-01-14T16:40:12 1768408812

Sure, if you are smart enough. Maybe mount a small transmitter on a tree then use a directional antenna at a very low power and use the tree as a repeater.

Or use NVIS, which at least makes triangulation harder.

sliken · 2026-01-13T16:17:30 1768321050

Not that I have to tell this crowd. Tuxpaint is free, get it from tuxpaint.org, do not buy it, do not download it as part of a "desktop". I was talking to the author, apparently it's sadly often used to trick people into buying or downloading malware.