The Other Cray Launches CPU-FPGA Hybrids

etep · on Sept 29, 2015

Intel has an interesting angle on this: FPGA + Xeon.

One story I see here as similar to what has been happening to the GPU business: Intel controls the access to the CPU, and therefore they will have an advantage because they can hold the competing solutions at arms length over PCI.

The link is just the first result I got back from Google, but, for example, 20x perf upside claimed: http://www.extremetech.com/extreme/184828-intel-unveils-new-...

The other takeaway, for me anyway, is that these are interesting times because hw architecture and hw/sw co-optimization appear to be gaining in importance (i.e. because Moore's law is slowing).

varelse · on Sept 29, 2015

Correct, Intel is playing dirty with Skylake. Sandybridge (i7-3820), IvyBridge (i7-4820), and Haswell (i7-5930k) all had 40 PCIE lane high-end consumer CPUs. Such CPUs could be used with PLX PCIE switches to build inexpensive quad-GPU supercomputers:

http://exxactcorp.com/index.php/solution/solu_list/85

As far as I can tell, there is no Skylake analog to these CPUs. So instead of building something better than a GPU, all Intel can do is buy Altera and do their worst to raise the price of building a fat GPU server.

Previously, they blocked the ability to send P2P copies between GPUs over QPI. But PLX 8747 PCIE switches provided a nice workaround (as will a single 8796 switch this round).

I guess those can, do, and those who can't, erect roadblocks.

Sanddancer · on Sept 30, 2015

The -E chips have always been released a year to a year and a half after the standard models. Broadwell-E has been announced and looks to be shipping in February or March according to rumors, and Skylake E about a year or so out.

The biggest curiosity for me is the lack of the Xeon E3s. Those launched simultaneously with the past few generations of desktops, with nothing so far regarding when/if they're going to be released, other than a mention in the Skylake chipset datasheet that the last four io lanes were skylake only.

Edit: Completely forgot about the Xeon D chips. That could be an interesting path to go for the desktop supercomputer route. 32 pcie lanes, and 4-8 cores with hyperthreading. They're not as beefy of a core, but with 10gig-e onboard, could quite possibly fit in the role of GPU router.

etep · on Sept 29, 2015

It doesn't make much sense to say that Intel can't build something better than a GPU, right? It's like saying the reason nvidia doesn't build CPUs is because they can't build a better CPU than Intel. It distracts the conversation.

So the question really is, what is the long term strategy and why? It appears to me that Intel has validated some of this "custom hardware" FPGA strategy and that their view is it will be "better together."

I agree that Intel is erecting roadblocks, but I doubt it is because they "can't build a GPU," rather more likely because "they can erect roadblocks." In fact, they are obligated, by their shareholders, to erect those roadblocks (or to play dirty if you want).

varelse · on Sept 29, 2015

You seriously need to investigate what a cluster#$%! Xeon Phi is. Their attempt to kill NVIDIA was a joke and continues to be one for anyone not getting paid to say otherwise.

IMO if Intel really cared about "shareholder value(tm)" they would have acquired NVIDIA by hook or by crook. Instead, they bought the promising redheaded stepchild Altera.

Meanwhile, NVIDIA owns the ML/Deep Learning space for at least the next 2-3 years no matter what manure Intel tries to fling at them. If only AMD had a decent driver/tools team, this battle could be far more interesting.

That said, 2018 or so and beyond is a green field(tm) if Intel stops choking on its own process and exploits its process advantage to build a GPU killer either as a co-processor or by integrating sufficient multiple AVX units into the cores of its CPU roadmap.

All IMO of course.

semi-extrinsic · on Sept 29, 2015

On the contrary, Xeon Phi has absolutely been a move to get some of the hot GPGPU market, and it's not working very well. AFAIK they don't have any cases where they beat GPUs on performance per watt (or per hardware dollar). So it's highly accurate to say they can't build something better than a GPU (so far).

hapless · on Sept 29, 2015

Intel had already price-segmented "inexpensive" supercomputers by restricting ECC to Xeon chips. The good news was that the low-end Xeons were only about $100 more than the Core i7 in the same socket.

We don't know whether Intel will restrict the low-end Xeon PCI-E lane counts for skylake, because the skylake Xeons aren't out yet.

I will be very alarmed if the next generation of 1-socket (cheap) Xeons feature the same sharply restricted I/O as the non-Xeon units.

varelse · on Sept 29, 2015

the !/$ on ECC has been craptastic for a rather long time. In fact, on GPUs, it's more likely your GPU is bad from the get-go than it is to encounter an ECC error:

http://www.robinbetz.com/papers/xsede13_betz_walker_ecc.pdf

pinewurst · on Sept 29, 2015

This article is from May '15.

dang · on Sept 29, 2015

Yup, but we asked luu to repost it because it didn't get any attention the first time.

p1esk · on Sept 29, 2015

Can you give a brief summary of this product?

solarexplorer · on Sept 29, 2015

If I understand correctly they combine a standard X86 processor with two FPGAs. One FPGA runs a custom CPU called MAP and the other manages the communication between the two CPUs, memory and other boards/peripherals. The develoment system accepts standard C/C++ code and emits two programs: one for the X86 processor, one for the MAP processor and a hardware description to adapt the MAP processor to the specific requirements of the source program. For some programs this can be a big win (and for others not so much).

GFK_of_xmaspast · on Sept 29, 2015

It looks like they did something to opencl to get it to target FPGAs, and then went a level beyond and had some kind of high-powered memory interconnect between CPU and FPGA so that they can share memory space.

varelse · on Sept 29, 2015

Altera has developed an OpenCL->VHDL compiler. That said, GPU code does not directly map to VHDL code in an efficient manner. Note the utter absence of GPU vs FPGA OpenCL benchmarks in the wild and take the hint...

https://www.altera.com/products/design-software/embedded-sof...

p1esk · on Sept 29, 2015

So this is a physically separate CPU, not part of the FPGA chip, right? If so, what are the advantages compared to CPU implemented as a hard core in the FPGA?

solarexplorer · on Sept 29, 2015

The separate CPU is much faster, because it has an entire die for itself and because it is a high-volume off-the-shelf chip. An embedded CPU/FPGA will suffer much more when it executes the serial part of the application on the embedded CPU. So if you remember Amdahls law...

> What are the applications for this hybrid?

Highly parallel applications with tight loops. These loops can be offloaded to the FPGA while the rest of the system can run as before. This approach allows you to use languages like FORTRAN/C/C++ without resorting to Verilog/VHDL, so it's much easier to adopt.

p1esk · on Sept 29, 2015

But a separate die means bandwidth and latency of CPU-FPGA communication is greatly reduced. So if data has to be constantly moved between them, it might be a disadvantage. What are the applications for this hybrid?

stonogo · on Sept 29, 2015

There is an entire chart devoted to this question in the article.

dang · on Sept 29, 2015

I can't! (I'm just a moderator and the guy who sent the repost invite.) But as a huge fan of all things Cray, I hope somebody knowledgeable will.