There was some hope at the time that FPGAs could be used in a lot more applications in the data center. It is likely still feasible. Remember that Hennessy published his "A New Golden Age for Computer Architecture" lecture.
And maybe this is/was a pipe dream - maybe there aren't enough people with the skills to have a "golden age of architecture". But MSFT was deploying FPGAs in the data center and there were certainly hopes and dreams this would become a big thing.
FPGA dev is just much more painful and more expensive than software dev at every step.
That's in no small part because the industry & tools seem to be stuck decades in the past. They never had their "GCC moment". But there's also inherent complexity in working at a very low level, having to pay attention to all sorts of details all the time that can't easily be abstracted away.
There's the added constraint that FPGA code is also not portable without a lot of extra effort. You have to pick some specific FPGA you want to target, and it can be highly non-trivial to port it to a different one.
And if you do go through all that trouble, you find out that running your code on cloud FPGAs turns out to be pretty damn expensive.
So in terms of perf per dollar invested: adding SIMD to your hot loop, or using a GPU as an accelerator, may have a lower ceiling, but it's much, much more bang for the buck and involves a whole lot less pain along the way.
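As a rough sketch of what "adding SIMD to your hot loop" can look like (hypothetical function names; assumes an AVX-capable x86 and n a multiple of 8 -- in practice you'd often just let the compiler auto-vectorize the scalar version):

    /* Scalar accumulate loop and an explicit AVX variant of the same thing.
       Compile with e.g. -O2 -mavx; an illustration, not a tuned kernel. */
    #include <immintrin.h>
    #include <stddef.h>

    /* scalar baseline: acc += a*x[i] + y[i] */
    float saxpy_sum(const float *x, const float *y, float a, size_t n) {
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++)
            acc += a * x[i] + y[i];
        return acc;
    }

    /* explicit AVX: 8 floats per iteration (assumes n % 8 == 0) */
    float saxpy_sum_avx(const float *x, const float *y, float a, size_t n) {
        __m256 va  = _mm256_set1_ps(a);
        __m256 acc = _mm256_setzero_ps();
        for (size_t i = 0; i < n; i += 8) {
            __m256 vx = _mm256_loadu_ps(x + i);
            __m256 vy = _mm256_loadu_ps(y + i);
            acc = _mm256_add_ps(acc, _mm256_add_ps(_mm256_mul_ps(va, vx), vy));
        }
        float lanes[8];
        _mm256_storeu_ps(lanes, acc);
        float sum = 0.0f;
        for (int k = 0; k < 8; k++) sum += lanes[k];  /* horizontal reduce */
        return sum;
    }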
It's hard to find places where FPGAs really win. For relatively simple tasks an FPGA can beat just about anything in latency: for instance, the serialization/deserialization end of a high-frequency trading system. But if a problem has a large working set and needs to store data in DRAM, it needs a memory controller the same way a CPU or GPU does, and that can only be efficient if the system's memory access pattern is predictable.
You can certainly pencil out FPGA or ASIC systems that would attain high levels of efficient parallelism if there weren't memory bandwidth or latency limits, but there are. If you want to do math that GPUs are good at, you use GPUs. Historically some FPGAs have let you allocate bits in smaller slices, so if you only need 6-bit math you can have 6-bit math, but GPUs are muscling in on that for AI applications.
FPGAs really are good at the bitwise operations used in cryptography. They beat CPUs at code cracking and Bitcoin mining, but in turn they get beaten by ASICs. However, there is some number of units (say N=10,000) where the economics of the ASIC plus the higher performance will drive you to an ASIC -- for Bitcoin mining or for the NSA's codebreaking cluster. You might prototype such a system on an FPGA before you get masks made for the ASIC, though.
For something like the F-35, where you have N=1,000 or so, couldn't care less about costs, and might need to reconfigure it for tomorrow's threats, the FPGA looks good.
One strange low-N case is that of display controllers for retrocomputers. Like it or not, a display controller has one heck of a parts count if you build it out of discrete parts, and ASIC display controllers were key to the third generation of home computers, which were made with N=100,000 or so. Modern retrocomputer recreations are already expensive compared to the Raspberry Pi, so they tend to use either a microcontroller or an FPGA, and the microcontroller tends to win because an ESP32, which costs a few bucks, is, amazingly, fast enough to drive a D/A converter at VGA rates or push enough bits for HDMI!
Rapid product development. Got a project that needs to ship in 6-9 months and will be on the market for less than two years in small volume? That's where FPGAs go. Medical, test and measurement, military, video effects, telepresence, etc.
I'm not sure about that. In these fields there are plenty of places where you need to ingest or process masses of data (e.g. from a sensor in a medical device), and you're only going to sell 5 of these machines a month for $100K each, so a $3,000+ FPGA on the bill of materials to solve the problem makes sense.
The problem (for Intel) is that you don't sell billions of dollars of FPGAs into a mass market this way.
Most of my knowledge about FPGAs comes from ex-FPGA people, so take this with a grain of salt:
First off, clock rates on an FPGA run at about a tenth of those of CPUs, which means you need a 10× parallelism speedup just to break even, which can be a pretty tall order, even for a lot of embarrassingly parallel problems.
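As a hedged back-of-envelope with made-up but typical numbers: a design that closes timing at 300 MHz on the fabric needs to keep roughly ten useful results in flight per cycle just to match a single 3 GHz CPU core retiring one result per cycle, and more still once the CPU's own SIMD width and multiple cores are counted.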
(This one is probably a little bit garbled.) My understanding is that FPGAs are designed in a way that makes them intrinsically worse at delivering FLOPS relative to memory bandwidth than other designs, which also puts a cap on the expected perf boosts.
The programming model is also famously bad. FPGAs are notorious for taking forever to compile--and the end result of waiting half an hour or more might simply be "oops, your kernel is too large." Also, to a degree, a lot of the benefit of FPGAs is in being able to, say, do a 4-bit computation instead of having to waste the logic on a full 8 bits, which means your code also needs to be tailored quite heavily for an FPGA, and that makes it less accessible for most programmers.
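A tiny hypothetical C sketch of the bit-width point (names made up): software can only fake 4-bit arithmetic by masking on a full-width ALU, whereas on an FPGA the same operation can synthesize to a genuinely 4-bit datapath that uses only a handful of LUTs.

    /* Illustration only: "4-bit" arithmetic in software still runs on a
       full-width ALU, with storage rounded up to whole bytes; the mask is
       what fakes the narrow width. On an FPGA a 4-bit adder can literally
       be a 4-bit piece of fabric. */
    #include <stdint.h>
    #include <stdio.h>

    static uint8_t add4(uint8_t a, uint8_t b) {
        return (uint8_t)((a + b) & 0xF);  /* emulate 4-bit wraparound */
    }

    int main(void) {
        printf("%u\n", add4(9, 7));  /* 9 + 7 = 16 -> wraps to 0 in 4 bits */
        return 0;
    }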
Tooling, mostly. To write fast code for CPUs you need a good optimizing compiler, like clang or gcc. Imagine how much work has gone into making them good. We're talking thousands of man-years over several decades. You need just as good tooling for FPGAs, and it takes just as much effort to produce. Except the market is orders of magnitude smaller. You also can't "get help" from the open source community, since high-end FPGAs are way too expensive for most hackers.
Intel tried to get around this problem by having a common framework. So one compiler (based on clang) with multiple backends for their CPUs, FPGAs, and GPUs. But in practice it doesn't work. The architectures are too different.
There is nothing quite like gcc or LLVM for FPGAs yet. FPGA tooling is still stuck in the world of proprietary compilers and closed software stacks. It makes the whole segment of the industry move slower and have higher friction. This is just starting to break with Yosys and related tools, which are showing wild advantages in efficiency over some of the proprietary tooling, but still only support a fraction of available chips, mostly the smaller ones.
I'm just a casual observer, but I'm pretty sure one hard thing about FPGAs is preventing abuse. A customer could easily set up a ring oscillator that burns out all the LUTs. Another thing is that FPGAs are about 10x slower than dedicated logic, so CPUs/GPUs beat them for a lot of applications. Plus, there aren't a lot of logic designers in the first place; software skills don't transfer over very well. For example, a multiplier is about the size of 8 kb of RAM, so lookups and complex control flow are way more expensive than just multiplying a value again (kinda like GPUs, except as if you only had an L1 cache without main memory).
Not sure why I'm being downvoted, would those who downvoted me explain why? I try to be accurate so if I missed any important details I'd like to know :)
I don't know how a ring oscillator specifically could burn out LUTs. All switching contributes to the device temperature. If they get too hot then they will enter thermal protection mode.
We run such oscillators as dummy payloads for thermal tests while we are waiting for the real firmware to be written.
To play devil's advocate, I wonder how well they handle more annoying things.
When a CMOS gate switches, it essentially creates a very brief short circuit between VCC and GND. That's part of normal dynamic power consumption; it's expected and entirely accounted for.
But I don't know how these cloud FPGAs could enforce that you don't violate setup and hold times all over the place. When you screw up your clock crossings and accidentally have a little bit of metastability, that gate will switch back and forth a little bit, burn some power, and settle one way or the other.
Now if you intentionally go out of your way to keep one cell metastable as long as possible while the neighbors are cold, that's going to be one hell of a localized hotspot. I wouldn't be surprised if thermal protection can't kick in fast enough.
This is just kibitzing though; I'm not particularly inclined to try it with my own hardware.
Timing analysis is usually part of the synthesis flow and seems very comprehensive to me (I realise this statement may traumatise some firmware people). How hard it would be to actively bypass it is an interesting question.
It made their stock pop for a while, which was all that mattered to Brian Krzanich, who took the bonus and left the mess in the hands of Bob Swan, who did the same thing and left the mess ... (recursion here).
> Intel soon discovered the obvious, which is that customers with applications well-suited to FPGAs already use FPGAs.
Yes, but pairing an FPGA somewhat tightly integrated with an actually powerful x86 CPU would have made an interesting alternative to the usual FPGA+some low end ARM combo that's common these days.
Sure, if they wanted to, Intel could have done what Nvidia did with CUDA: put the tech into everything, even their lowest-end consumer devices, and sink hundreds of millions into tooling and developer education given away free of charge.
And maybe it would have led somewhere. Perhaps. But they didn't.
I wasn't there, but I've always imagined the conversation went something like this:
Intel: Welcome, Altera. We'd like you to integrate your FPGA fabric onto our CPUs.
Altera: Sure thing, boss! Loads of our FPGAs get plugged into PCIe slots, or have hard or soft CPU cores, so we know what we're doing.
Intel: Great! Oh, by the way, we'll need the ability to run multiple FPGA 'programs' independently, at the same time.
Altera: Ummmm
Intel: The programs might belong to different users, they'll need an impenetrable security barrier between them. It needs full OS integration, so multi-user systems can let different users FPGA at the same time. Windows and Linux, naturally. And virtual machine support too, otherwise how will cloud vendors be able to use it?
Altera: Uh
Intel: We'll need run-time scaling, so large chips get fully utilised, but smaller chips still work. And it'll need to be dynamic, so a user can go from using the whole chip for one program to sharing it between two.
Intel: And of course indefinite backwards compatibility, that's the x86 promise. Don't do anything you can't support for at least 20 years.
Intel: Your toolchain must support protecting licensed IP blocks, but also be 100% acceptable to the open source community.
Intel: Also your current toolchain kinda sucks. It needs to be much easier to use. And stop charging for it.
Intel: You'll need a college outreach program. And a Coursera course. Of course students might not have our hardware, so we'll need a cloud offering of some sort, so they can actually do the exercises in the course.
Altera: I guess to start with we
Intel: Are you profitable yet? Why aren't you contributing to our bottom line?
I think they have tried to improve the software for FPGAs--FPGA backends are part of their oneAPI software stack, for example. And when I was in grad school, Intel was definitely doing courses on building for FPGAs using OpenCL (I remember seeing some of their materials, but I don't know much about them other than that they existed).
As to why it didn't work, well, I'm not plugged into this space to have a high degree of certainty, but my best guess is "FPGAs just aren't that useful for that many things."
Yes, if they had actually made the thing available, maybe people would have used it for something. There were several proofs of concept at the time, with some serious gains, even for the uses that people ended up using CUDA for.
But they didn't actually sell it. At least not in any form anybody could buy. So, yeah, we get the OP claiming it was an obvious technological dead-end.
And if they had included it on lower-end chips (the ones they sold just a few years after they bought Altera), we could have had basically what the Raspberry Pi RP2040 is nowadays. Just a decade earlier and controlled by them... On second thought, maybe this was for the best.
Applications that benefit from the Zynq-style combination (e.g. radio systems) generally take that approach because they have SWaP concerns that preclude the use of big x86 CPUs in the first place.
Size, Weight, and Power. It would be very nice if people would take the two seconds to type those words out instead of using ungoogleable acronyms on a public forum unfamiliar with the terms.
> Intel soon discovered the obvious, which is that customers with applications well-suited to FPGAs already use FPGAs.
So selling FPGAs was a bad move? Or was the purchase price just wildly out of line with the--checking...$9.8B annual market that's expected to rise to $23.3B by 2030?
It was a forced acquisition; IIRC they made promises to Altera to get them to use their foundry, failed to keep those promises, and could either get sued and embarrassed or just buy Altera outright for about what it was worth before the deal.
And less disastrous for Xilinx, given they could basically just keep going as they were before, instead of being significantly diverted onto a sinking ship of a process.