Years ago I met a guy at a dinner party who had done some work for DARPA on superconducting computers. The big appeal was energy efficiency and speed: hundreds of GHz and ~99% energy reduction.
One critical limitation is that information transfer from superconducting to regular wiring is relatively energy intensive, so a superconducting CPU won't actually give you an advantage for input/output-intensive processing. For processing-heavy jobs, though, where you don't have to shuffle data around, the speed and energy savings will be very attractive.
AFAIK current CPUs' power usage for calculations is in the pJ (picojoule) range, and that's why we do speculative execution: the computation itself is almost free.
The issue is that moving that data even to anything outside the core costs about 100x as much, so the main energy consumption of a CPU is not in doing the actual computation but in moving data around.
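A back-of-the-envelope sketch in C using only the round numbers from the comment above (~1 pJ per op, ~100x that to move the operands off-core); these are assumed figures, not measurements:

```c
/* Rough scale of compute energy vs. data-movement energy, using the assumed
 * figures from the comment above (~1 pJ/op, ~100x for off-core movement). */
#include <stdio.h>

int main(void) {
    double e_op   = 1e-12;        /* ~1 pJ per arithmetic op (assumed)       */
    double e_move = 100.0 * e_op; /* ~100 pJ to move data off-core (assumed) */
    double rate   = 1e9;          /* one billion ops per second, for scale   */

    printf("compute power:       %.0f mW\n", e_op * rate * 1e3);
    printf("data-movement power: %.0f mW\n", e_move * rate * 1e3);
    return 0;
}
```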
It's mostly about avoiding the latency penalty. There's only so much ILP you can extract from the instruction stream before you get stuck waiting for a dependency. If you start executing that dependency speculatively, it will complete earlier, so you can launch the next instructions sooner.
That lets you speed up single-threaded execution further by adding more functional units (since you can't really push the clock speed any higher).
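A minimal C sketch of that dependency-stall point: a single floating-point accumulator serializes every add behind the previous one, while independent accumulators give the out-of-order core work it can overlap. Illustration only; the numbers depend heavily on the machine and compiler.

```c
/* Compile with e.g. `gcc -O2 ilp.c` (no -ffast-math, so the compiler cannot
 * reassociate the floating-point adds on its own). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

int main(void) {
    float *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++) a[i] = (float)(i & 0xff) / 256.0f;

    /* One accumulator: each add depends on the previous result. */
    clock_t t0 = clock();
    float s = 0.0f;
    for (int i = 0; i < N; i++) s += a[i];
    clock_t t1 = clock();

    /* Four independent accumulators: the adds can overlap in flight. */
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
    }
    clock_t t2 = clock();

    printf("1 accumulator:  %.1f ms (sum %.0f)\n",
           1000.0 * (t1 - t0) / CLOCKS_PER_SEC, s);
    printf("4 accumulators: %.1f ms (sum %.0f)\n",
           1000.0 * (t2 - t1) / CLOCKS_PER_SEC, s0 + s1 + s2 + s3);
    free(a);
    return 0;
}
```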
This is exactly it. Speculative execution comes from instruction set design and basic constraints of branching. The energy consumption could make speculative execution prohibitive, but it's not "the" reason we do it.
Noob question here: is this the reason specialized chips can work so much better for AI applications? That the computations needed in a neural network are entirely deterministic and there is no need for branch prediction?
Not really, it's more the massive parallelism. Branch prediction takes away something, but mostly it's the parallelism. Each instruction a GPU executes typically operates on a massive array in one go. In a CPU you need to use AVX-type instructions, and those are far more limited in the size of the arrays they can process at once.
Yes, the GPUs provide massive parallelism. An NVIDIA RTX 4090 has 16384 "CUDA cores". Whatever these CUDA cores are, they must be much, much smaller than a CPU core. They do computations, though, and CPU cores do computations too. Why do CPU cores need to be so much larger, so that a CPU with more than 64 cores is rarely heard of, while GPUs have thousands of cores?
Read about vector instructions a little bit and you'll see what I mean in the previous comment. A CPU has many many niche instructions it supports, it's way more flexible. A GPU is just trying to multiply the largest arrays possible as fast as possible, so the architecture becomes different. I don't think there's a quick way for you to grasp this without reading more about computer architecture and instruction sets, but you seem to be interested in it, so dive in :)
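To make the vector-instruction point concrete, here's a toy C/AVX sketch: one instruction operates on 8 floats at a time, whereas a GPU dispatches the same operation across thousands of lanes. Not a benchmark, just an illustration; it assumes an x86 CPU with AVX and something like `gcc -O2 -mavx`.

```c
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];

    __m256 va = _mm256_loadu_ps(a);     /* load 8 floats                  */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);  /* 8 additions in one instruction */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++) printf("%.0f ", c[i]);
    printf("\n");
    return 0;
}
```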
The tech described in the article features superconducting memory in combination with the 'SPU':
> To complement the logic architecture, we also redesigned a compatible Josephson-junction-based SRAM.
[...]
> But there are also some striking differences. First, most of the chip is to be submerged in liquid helium for cooling to a mere 4 K. This includes the SPUs and SRAM, which depend on superconducting logic rather than CMOS, and are housed on an interposer board. Next, there is a glass bridge to a warmer area, a balmy 77 K that hosts the DRAM. The DRAM technology is not superconducting, but conventional silicon cooled down from room temperature, making it more efficient. From there, bespoke connectors lead data to and from the room-temperature world.
... with the idea of stacking this technology to put a 'supercomputer in a shoebox':
> It is also straightforward to stack multiple boards of 3D superconducting chips on top of each other, leaving only a small space between them. We modeled a stack of 100 such boards, all operating within the same cooling environment and contained in a 20- by 20- by 12-centimeter volume, roughly the size of a shoebox. We calculated that this stack can perform 20 exaflops (in BF16 number format), 20 times the capacity of the largest supercomputer today. What’s more, the system promises to consume only 500 kilowatts of total power. This translates to energy efficiency one hundred times as high as the most efficient supercomputer today.
... or make datacenters a fraction of the size:
> In addition, with this technology we can engineer data centers with much smaller footprints. Drastically smaller data centers can be placed close to their target applications, rather than being in some far-off football-stadium-size facility.
Technical challenges I'd like to see more discussion on from IMEC:
- Superconductors are lossy with AC signals, is this loss not a big problem?
- Cooling things is very expensive, is this approach not simply transferring the costs to the cooling plant?
- I imagine that a superconducting interconnect represents an impedance discontinuity, and effects like the large kinetic inductance of superconducting materials have large ramifications for signal integrity. Any comments on how to deal with these problems? I worry when I see the words "top-down approach" towards this problem because, from my experience, it should be driven bottom-up. Superconducting circuits are not just 'Oh, it's lossless at DC, everything is great!' You have new limitations on trace dimensions, because if you go too small or drive too large a current you kill the superconducting state.
- NbTiN with amorphous Si as the barrier. How sensitive is the system to stray magnetic fields and stray radiation? If one of these structures switches into the normal state, can it reset itself like superconducting photon detectors do, or are the phonons trapped and the structure latched in the normal state?
> Superconductors are lossy with AC signals, is this loss not a big problem?
In this logic style, data are represented with fluxons. One fluxon is the quantum unit of flux. You can't have "half a fluxon" in a superconducting loop. In that sense it's actually "more digital" than anything in commercial CMOS chips. The circuit can still malfunction, of course -- the failure mode looks like a fluxon failing to move from one logic stage to the next.
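For reference, the "quantum unit of flux" here is the magnetic flux quantum, Phi_0 = h / 2e; a trivial check of the constant, nothing more:

```c
#include <stdio.h>

int main(void) {
    double h = 6.62607015e-34;   /* Planck constant, J*s */
    double e = 1.602176634e-19;  /* elementary charge, C */
    printf("flux quantum Phi_0 = %.4e Wb\n", h / (2.0 * e));  /* ~2.07e-15 Wb */
    return 0;
}
```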
The real worry is the cryocooler, though. Cryocoolers that can reach liquid-helium temperatures have efficiency ratings around 1%, so that 500 kW shoebox is going to need an entire cooling tower attached to its refrigerator.
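A quick sanity check of that worry, with assumed numbers: it treats the full 500 kW as heat to be lifted from 4 K (a worst case -- the article puts the DRAM at 77 K) and uses the ~1% efficiency figure from the comment above.

```c
#include <stdio.h>

int main(void) {
    double t_cold = 4.0, t_hot = 300.0;
    double cop_carnot = t_cold / (t_hot - t_cold);  /* ideal limit, ~1.4%              */
    double efficiency = 0.01;                       /* ~1%, per the comment (assumed)  */
    double heat_load_w = 500e3;                     /* assume all 500 kW dumped at 4 K */

    printf("Carnot COP (4 K -> 300 K): %.4f\n", cop_carnot);
    printf("compressor power at 1%% efficiency: %.0f MW\n",
           heat_load_w / efficiency / 1e6);
    return 0;
}
```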
There are a lot of ways to do helium recycling (common these days for MRI magnets); it's never 100% but close. Also, a lot of helium is wasted because it isn't worth saving for the small amounts you could sell.
> the system promises to consume only 500 kilowatts of total power.
I don't understand: is this 500 kW the power consumed by the computation itself, or does it include the cooling equipment's consumption? Sucking 500 kW of heat out of a shoebox while keeping the box at 4 kelvin... that seems impossible to me.
The 500kW almost certainly includes the cooling losses. The chips alone theoretically consume power roughly according to the Landauer-von Neumann limit, which would be less than microwatts for the shoebox they describe. In real life it'll be more due to the switching flux and EM radiation generated, but I think still much less than a watt. Liquid helium cooling and pumping through the chip is super inefficient, which is where 99% of the energy is going.
> there is a glass bridge to a warmer area, a balmy 77 K that hosts the DRAM. The DRAM technology is not superconducting,
There is a lot of not-superconducting circuitry in that shoebox; without it you wouldn't have enough memory to do anything that qualifies as "AI".
The reasonable conclusion is that the power figure covers only the components they have described. Note that the article fails to mention the word "cryocooler" even once.
I know of other research groups (not IMEC) who are pretty shameless about excluding cryocooler power and cost from all of their press-release/popsci materials. It's considered acceptable in this field.
Well, this isn't just going to keep inert matter cold; it will do massive amounts of computation, so it makes sense that it would need to dump a lot of heat.
Current fastest supercomputer has a little over 1 exaflop and consumes over 20MW. 20 exaflops at 500kW is a lot less power, but it'd probably still generate a substantial amount of heat.
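Putting that comparison into rough numbers, using the figures from the comment above and the article's claim (and noting the precisions differ: FP64 today vs. BF16 claimed):

```c
#include <stdio.h>

int main(void) {
    double today_flops = 1e18,  today_watts = 20e6;   /* ~1 EFLOPS at ~20 MW (comment above) */
    double claim_flops = 20e18, claim_watts = 500e3;  /* claimed 20 EFLOPS at 500 kW         */

    double today_eff = today_flops / today_watts;
    double claim_eff = claim_flops / claim_watts;

    printf("today:   %.1e FLOP/s per W\n", today_eff);
    printf("claimed: %.1e FLOP/s per W (%.0fx better)\n",
           claim_eff, claim_eff / today_eff);
    return 0;
}
```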
Superconductors do not generate heat for a constant DC current. Computers are very, very AC, and you do get heat production anytime the current changes.
It is, IMO, a bit dubious whether or not anything is truly flowing in a superconductor at constant current. Electrons don't have identity, so the 'constant flow of electrons' can be rephrased as 'the physical system isn't changing'... and the degree to which you can tell that there are electrons moving about is also the degree to which the superconductor isn't truly zero-resistance.
You can tell there's a magnetic field, certainly. My argument is essentially one of nomenclature; I don't feel a constant electron-field should count as 'flowing'.
Of course it isn't actually constant -- there are multiple electrons, and you can tell that the electron field is quantized. But the degree to which that is visible is the exact degree to which the superconductor nevertheless doesn't superconduct!
> The cooling overhead required to operate superconducting computers becomes less significant at higher than tens of petaflops. There, superconducting AI boards become more energy efficient than state-of-the-art GPUs.
Pretty clear to me.
> What’s more, the system promises to consume only 500 kilowatts of total power.
> The word "cooling" alone appears seven times, none of them in the context of specific power numbers.
Exactly like I said.
Additionally, nowhere is it said that "the system" includes the cryocooler. Since IMEC doesn't make cryocoolers, it would be unreasonable to assume that their "system" includes one.
This quantum stuff is starting to get even more scammy than cryptocurrency.
Sounds cool. Does the DRAM exist in both the SPU (4 kelvin) area and the 77 kelvin area? This paragraph is a bit confusing:
> We call it a superconductor processing unit (SPU), with embedded superconducting SRAM, DRAM memory stacks, and switches, all interconnected on silicon interposer ... Next, there is a glass bridge to a warmer area, a balmy 77 K that hosts the DRAM.
Having 20 exaflops in a 20x20x12 cm volume is cool, but aren't you going to need memory bandwidth within shouting distance of that (a factor of 10-1000, let's say, depending on arithmetic intensity) for it to be useful? And total memory capacity as well. I feel like the bytes/second/area (bandwidth flux) would be the limiting factor in making use of that compute density.
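A rough sketch of that bandwidth worry; the arithmetic intensities below are assumed values purely for illustration.

```c
#include <stdio.h>

int main(void) {
    double flops = 20e18;                    /* claimed BF16 throughput       */
    double intensities[] = {10, 100, 1000};  /* FLOP per byte moved (assumed) */

    for (int i = 0; i < 3; i++)
        printf("intensity %4.0f FLOP/byte -> %6.0f PB/s of memory traffic\n",
               intensities[i], flops / intensities[i] / 1e15);
    return 0;
}
```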
I would like to see more research into superconductivity applied to computing, as opposed to "quantum computing" applied to computing. I feel that while superconductivity is an actual, proven technology that is just waiting to be applied to a new field (computing), QC is a "dream about the future".
OTOH I think there are theoretical lower bounds on the amount of energy that needs to be ejected as heat from a non-reversible computation, such that a non-reversible SC would still need to produce some heat.
> OTOH I think there are theoretical lower bounds on the amount of energy that needs to be ejected as heat from a non-reversible computation, such that a non-reversible SC would still need to produce some heat.
Yes, it's called Landauer's principle, but it's so low it can be ignored for most intents and purposes.
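For scale, a small sketch of that bound. The article's 20 exaflops figure is used as the op rate; charging each op with one bit erased is an assumed accounting, just to get an order of magnitude.

```c
#include <stdio.h>

int main(void) {
    double k_b = 1.380649e-23;   /* Boltzmann constant, J/K                        */
    double ln2 = 0.6931471805599453;
    double ops = 20e18;          /* claimed op rate; 1 bit erased per op (assumed) */

    double e_bit_4k   = k_b * 4.0   * ln2;
    double e_bit_300k = k_b * 300.0 * ln2;

    printf("Landauer bound per bit at 4 K:   %.2e J\n", e_bit_4k);
    printf("Landauer bound per bit at 300 K: %.2e J\n", e_bit_300k);
    printf("floor at 4 K for 20e18 bit-erasures/s: %.2e W\n", e_bit_4k * ops);
    return 0;
}
```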
Certain flavors of superconducting digital logic (there are a handful) such as AQFP (adiabatic quantum flux parametron) actually get close to the Landauer limit (with cooling power excepted). Interconnect (as some comments have noted) and lack of a good memory technology are the challenges for all these architectures.
In grad school I extrapolated the charts showing joules per bit of computation, looked at where they intersected the Landauer limit compared to my own lifespan, and immediately lost interest in adiabatic computation. No regrets yet.
Interesting how they use inductors for memory?
My conventional thinking would imply that when you put a few billion of these together on a single package it's bound to create interference.
Amazing to read through this though, as someone who's studied mostly traditional computer architecture this stuff is absolutely boggling.
I wonder what kind of CPU architecture they're targeting, if any. If something like this exists for RISC-V and not ARM or Intel for example, this could have some pretty big effects on the domination of the industry. Or if Intel buys this company and its patents and this is all x86 only...
Ya that was my first thought too. Since they're targeting AI, they'll probably start with the streaming multiprocessors (SMs) and tensor processing units (TPUs):
I take issue with this though because the DSP/SIMD approach that GPUs and TPUs take is a subset of SMP/MIMD (symmetric multiprocessing and multiple instruction multiple data).
Where this matters is that today's neural networks are built on matrix operations, which is a narrow niche within computer science. A more general approach which would allow for faster evolution of the 20 or so other algorithms in AI would be to use compute clusters to explore genetic algorithms, simulated annealing and all the rest in playgrounds limited only by the developer's imagination. Loosely that would look like a desktop computer with perhaps 1000 or more cores appearing as a single CPU, that could be programmed in a language of choice like Erlang, Go, Julia, MATLAB, C/C++, etc (optionally using containerization tools like Docker). Or like having something akin to AWS EC2 locally.
I believe that this divergence in computing approaches is why Moore's Law ended around 2007, with emphasis switching to low-cost and low-power mobile and embedded systems. It's also why my heart went out of programming and I stopped pursuing endlessly more photorealistic game engines that result in everyone making the same game over and over again.
If someone out there won the internet lottery and wants to invest in something truly disruptive, some low hanging fruit might be a highly scaled multicore RISC-V processor (as you mentioned) with 100-1000 cores running over 1 GHz using under 100 watts for under $1000. Targeting 10,000 to 100,000 cores by 2030 and 1 million cores shortly thereafter. That's the level of performance we should be expecting from companies like Intel, had Moore's Law continued. And it shows just how far expectations have fallen, with today's personal computers running little faster than those of 2010 for typical (non-GPU) workflows, tragically at the same price of $1000-3000.
I also wish that Imec would start with a consumer-level superconducting processor for under $1000, but unfortunately today's socioeconomic reality of widening wealth inequality doesn't support that. But it's easy enough to make liquid nitrogen from the air with a liquid nitrogen generator (looks like they start around $4000; that could/should be disrupted to $400 or less). Unfortunately, it looks like their Josephson junctions still need liquid helium at 4 K. Probably some company overseas will figure out a 77 K liquid-nitrogen version and bypass IP law, as these things tend to go lately.
"Scientists have predicted that by 2040, almost 50 percent of the world’s electric power will be used in computing. What’s more, this projection was made before the sudden explosion of generative AI. The amount of computing resources used to train the largest AI models has been doubling roughly every 6 months for more than the past decade. At this rate, by 2030 training a single artificial-intelligence model would take one hundred times as much computing resources as the combined annual resources of the current top ten supercomputers. Simply put, computing will require colossal amounts of power, soon exceeding what our planet can provide. "