The prospects for 128 bit processors (John R. Mashey, 1995) (yarchive.net)
32 points by doener on May 21, 2024 | 44 comments


RISC-V has defined support. See "3.4 The RV128I Base ISA" https://people.eecs.berkeley.edu/~krste/papers/EECS-2016-1.p.... You can find an emulator "RISC-V system emulator supporting the RV128IMAFDQC base ISA" https://bellard.org/tinyemu/, too.


A 128-bit address space is overkill, but I wonder if a hardware-backed 128-bit integer type would be useful for bit twiddling - or whether spreading the bits over two 64-bit integers is 'good enough', assuming that both halves reside in the same cache line and it's unlikely that a single bit-twiddling operation needs to affect both halves.

I guess I will find out soon, because I started a home computer emulator experiment in Zig where I'm essentially mapping mainboard wires to bits in a wide integer, and where 64 bits definitely won't be enough, but 128 bits most likely will cover most target systems. Very curious about the x86-64 and ARM compiler output.


I am actually currently working on a CPU that has an auxiliary unit with a 521-bit integer type (although many instructions are 512-bit). These have some interesting effects, but I can't say they are tremendously useful for bit hacks, at least not at the cost of the hardware to support them. Multiplications at this width are very costly, for example. Even an addition or a leading-zero count takes significant area. Vector units are about as good as you will get for most of this stuff.

This unit, by the way, is intended primarily for cryptography (2^521 − 1 is a large Mersenne prime, the one used by the NIST P-521 curve). Cryptography operations are the only times I have ever seen an integer wider than 64 bits in the wild, and most people use bignum libraries.

The same pressures apply to 128-bit cores. It will make a lot of hardware a lot bigger and more complicated.


Aren't there physical limits to how wide an integer you can add with carry in a single cycle at a given clock speed?


Not once you pipeline the operation, which is how you'd build such a CPU. All these wide ops just have very long latency.


128 bits of address space would be useful if you're building an architecture with one level of storage.

And there actually has been one family of computers that used 128-bit pointers: the IBM S/38 and its successors. They have a machine-independent instruction set that is then compiled down to actual machine code, and that uses 128-bit pointers for future-proofing.


There have been a few research OSes designed around a concept of "all memory is persistent". IIRC this allows some simplifications. For example, no need for a file system: the copy-from-disk-to-RAM step is pointless when there's no difference between RAM & disk.

This didn't catch on because the tech to make it practical wasn't there. And existing OSes were 'good enough' thanks to memory-mapped files, smart caching techniques, etc. Plus boatloads of software built on that model.

But if all background storage were treated as one giant RAM, I could see some cloud/AI big boys crossing that 64-bit size boundary. Or some science/engineering projects like CERN.

That said: it's the apps, really. If OS + biggest app working sets easily fit in a 64-bit address space, then why throw 2x the bits at it? And disk <-> RAM transfers (+filesystems, cache etc) are a solved problem.


I thought this was where Optane was going to take us, back to the days of "your memory is also your persistent storage" like an old PDP-8 with core memory...


> A 128-bit address space is overkill, but I wonder if a hardware-backed 128-bit integer type would be useful for bit twiddling - or whether spreading the bits over two 64-bit integers is 'good enough', assuming that both halves reside in the same cache line and it's unlikely that a single bit-twiddling operation needs to affect both halves.

Don't we already have that? My reading of https://en.wikipedia.org/wiki/Advanced_Vector_Extensions is that AVX starts at 128-bit and goes up from there.


Yeah, but can compilers actually do the necessary magic when encountering something like:

    x = (y & (((__int128)1 << 101) | ((__int128)1 << 102))) >> 10;
My impression was always that the vector extensions are good for SIMD operations, but not "wide integer" operations, but I might be wrong of course (e.g. is bit-shifting across "lanes" even possible?)


It's not really a compiler issue, though. SIMD is meant to map an operation pointwise across multiple "lanes" in a single instruction. You can't have lane interdependence in the result.


(V)PALIGNR shifts between lanes just fine, but only with byte granularity.


Because of the byte granularity, there's no interdependence between lanes. The result within a lane is not affected by the values of the other lanes.


As it turns out, yes: https://godbolt.org/z/xEqrx5dY4

SSE2 adds the PSRL/PSLL operators, which are basically i128 shift operators on vector registers (i.e., shift continues between lanes), so you can pretty easily map i128 to vector registers if you're only doing and/or/xor/shifts.


No, it doesn't shift across 64-bit boundaries. Take a look at the gcc output in your link:

    movdqa  xmm0, XMMWORD PTR [rdi]
    movdqa  xmm1, xmm0
    psrlq   xmm0, 10
    psrldq  xmm1, 8
    psllq   xmm1, 54
    por     xmm0, xmm1
    movdqa  xmm1, XMMWORD PTR .LC0[rip]

That's a lot of psr and psl instructions for a "single 128-bit wide shift"...


The 128-bit wide shifts PS{L,R}LDQ only have byte granularity. They're a special case of a byte shuffle.


The "Q" at the end is "quadword", so it's 64 bits (8 bytes).


No, you're thinking of PS{R,L}LQ.


On 64-bit computers you've been able to use 128-bit unsigned ints as a gcc extension for a long time now, and the bit-twiddling stuff works exactly as you'd expect. The relevant types are __int128 / unsigned __int128.

Clang has the same extension, and my understanding is that C23's _BitInt will let you do this portably.


Yes, but compilers seem to disagree on whether they use a pair of 64-bit registers, or an SSE register under the hood (reusing link from a reply): https://godbolt.org/z/xEqrx5dY4


So what are the chances that our grandkids' PCs will have over 18 exabytes of RAM in 2043?


I'm still trying to top Weird Al's 100GB of RAM 25 years later!

https://en.wikipedia.org/wiki/It%27s_All_About_the_Pentiums


while the back-of-the-napkin math is surprisingly not terrible, it is of course a log scale, so Mar's Law is relevant: "Everything is linear if plotted log-log with a fat magic marker".

Mashey was really talking about workstations, not PCs. of course, the line is blurred or non-existent now. what if we look at x86ish PCs?

presume that in 1995, a nicely appointed high-end PC was a 486DX-33 (w/487) and 16MB of RAM. that requires 24 bits of physical address. using the estimate of 3/2 years per bit, we find that we need more than 32 bits around 1995 + (3/2 × (32−24)) = 2007.

AMD64 came out in 2003/4. but my own recollection is that the sun didn't really set on 32-bit PCs until just about that timeframe (2007-ish). so that's not too far off.

now apply forward to when we "use up" 48 bits (or 52). and it would be around 2031-2037. possible?

now the other thing is that (imho) a nicely appointed PC today in 2024 is 64GB (36 bits) vs. 16MB (24 bits) in 1995. does this track? not really. the 2 bits every 3 years would predict that we'd want 64GB machines around 2013? that's not really realistic. and it'd predict that we'd want ~16TB PCs by 2025.

it seems apparent that the exponential growth that was happening in the 1980s and 1990s has either slowed substantially or is no longer exponential for perhaps the last 20ish years.

i think the lesson (and for Moore's law too) is that apparent exponential growth is not going to continue forever.


One area where 128-bit pointers would be useful is tagging.

This could be used to improve memory safety. For example, a pointer into a range could potentially carry that range's bounds along with it.


> For many years DRAM gets 4X larger every 3 years, or 2 bits/3 years.

This trend in DRAM scaling stopped quite a while ago, though I don't know exactly when. I think it was before 4 GB became common in desktop computers.


Is there any reason we wouldn't keep advancing the computer bits beyond 64? "640K ought to be enough for anybody" has held up nicely, for example.


At some point we'll run out of atoms in the solar system for the physical memory ;)

(a large virtual address space is of course useful on its own - for instance for never reusing a memory address - but we're not even close to scratching the 64-bit barrier: e.g. AFAIK x86-64 CPUs are limited to a 48-bit virtual and 52-bit physical address range).


Yes, it was 48 bits virtual, but Intel has started introducing SKUs with support for 57 bits of virtual address space:

https://en.wikipedia.org/wiki/Intel_5-level_paging


Exponential growth. 640k is obviously not enough. Nor is double that. Or double that. Or double that. But keep doubling long enough, and eventually the length of time before you need to do it again will be considerable.

We might be there. And 128 bits is a lot of bits. You know what bit width you need to represent twice as many states as a 64 bit value?

That's right, a 65 bit value.


It kinda has been advancing, but the advancement has been happening on the SIMD/SSE/AVX side of things (up to 512 bits now).

Regarding the non-SIMD side of x86, what are good driving needs to process 128-bit values among the normal instruction stream?

- 128 bits is 16 bytes, which is a decent maximum length for many textual identifiers. These could be loaded, compared, and processed with single instructions, possibly without intermediate memory accesses.

- It would be super convenient to load a GUID/UUID or IPv6 in a single register/instruction I guess. Would Intel get an `RDUUIDV4` instruction and be able to generate them natively?

- A bigger space for `mmap()` could have interesting possibilities.

- 64 bits of additional randomization available for memory paging could make security techniques like ASLR a bit better.


Data-level parallel processing (or SIMD vector width) is a different thing to address space width. If you want to see some _really_ wide units, look at GPUs.

For UUIDs, 16-byte short strings and IPv6, there's no real reason the SIMD units couldn't do the work there. (Granted, the existing vector units may be a bit short on features for working "across" lanes - I'm not sure how capable they are at dealing with null-terminated C strings, for example.)

In principle at least (and allowing for some snags around memory alignment), C++ std::strings with the short string optimisation (which stores the string's data inline in the string object, often on the stack, if it's less than a certain number of bytes) can already be loaded into vector registers and indeed never materialised in memory at all. How much this happens in practice I wouldn't like to say, but it's not that hard to roll your own stringid_16 or whatever with conversion operators.


> 640K ought to be enough for anybody

That statement still isn't false today.


I also think 128-bit is overkill. But with this AI push, I wonder if 128 bits might help with processing and power?


You don't need 128 bits for memory addressing, but for raw processing - yes, and in fact 128 bits is far less than what we're already using! If you look at https://github.com/ggerganov/llama.cpp you'll see this line:

> AVX, AVX2 and AVX512 support for x86 architectures

Guess what the 512 in AVX512 stands for?;)

On GPUs I'm pretty sure the same thing is in play, but I'm less familiar. A quick search turns up ex. https://developer.nvidia.com/blog/implementing-high-precisio... which makes me think yes.


Qemu 7.x includes support for RV128.


256 is the magic number


We need 166 bits to address each atom on Earth uniquely. We won't need to go beyond that until we are interplanetary.


You are assuming each atom represents one bit. But each atom could represent 2^N states (charge, spin, location, etc.), i.e. N bits, in which case you could store N × 2^166 bits on Earth!


I was referring to addressing each atom individually. Indeed, you could possibly store more than 1 bit of information at each address. Maybe call it an "atombyte".

Similarly in computer RAM, we don't address each "bit" individually but actually each "byte" (8 bits). Or maybe we address each word?


I guess I'll never be able to individually label my quarks and gluons. Pity.


128-bit address space is probably enough to map the whole internet.


Oops. Sorry. I’d delete this if I still could.


128-bits is probably enough to mmap the whole internet.


This is the next step after Plan 9. We need a new memory-based filesystem though.



