Field programmable gate array that's 4.2x faster than a 16 core CPU (hpcwire.com)
36 points by ColinWright on April 19, 2012 | 21 comments



The title doesn't mean much if you don't specify at what. It's only about floating point, and the comparison is with a purely theoretical CPU.


Also, two of the article's four authors work for Xilinx, which makes the FPGA in question.

Does anyone know how much these things cost? A quick google yielded nothing, but I may be using the wrong terms.


Check farnell.com. They don't seem to stock the Virtex-7 family yet, but the price range of the Virtex-6 family might provide some guidance (£350-£850).

Edit: This is for the naked chip, not a board. There's quite a discrepancy with int19h's findings; I don't know if that can all be down to the board?


Boards for high-end FPGAs like that are not cheap to make yourself either (you need something like 6 layers or more, because they have so many densely packed pins, plus one or two layers just to feed their gratuitous hunger for power), so you'll probably end up having to buy a ready-to-use solution, which probably quadruples that price...


I found one board, EK-V7-VC707-CES-G, for USD3.5k

The only time I was ever "in the market" for a board was in 2007, and back then it was a struggle getting parts; this was in the days of FX/LX/SX availability, and I wanted an SX part (with many DSP units) but was basically told no unless I wanted to buy over 50 units. So I had to settle for the bog-standard FX. I paid USD 5k for it (a premium for InfiniBand connectivity).


I do not think they crippled that theoretical CPU, though:

  "The floating point performance for the reference microprocessor
   is calculated by multiplying the number of floating point
   function units on each core by the number of cores and by the
   clock frequency.
   ...
   this article series has been using a normalized value of 2.5 GHz
   clock frequency."
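
To make that concrete, here's a quick sketch of the arithmetic they describe (the 4 FLOPs/cycle/core figure below is an assumption for illustration, not a number from the article):

    /* Back-of-the-envelope peak-FLOPS calculation as described above.
       flops_per_cycle_core = 4 is an assumed figure, not from the article. */
    #include <stdio.h>

    int main(void) {
        double cores                = 16;
        double clock_ghz            = 2.5; /* normalized clock from the article */
        double flops_per_cycle_core = 4;   /* assumption: e.g. 2 FP units x 2-wide ops */

        printf("theoretical peak: %.0f DP GFLOPS\n",
               cores * clock_ghz * flops_per_cycle_core); /* prints 160 */
        return 0;
    }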


"Field programmable gate array that's 4.2x faster than a 16 core CPU", theoretically and only in regards to 64-bit floating point arithmetic.

What's with the link-baity titles lately?


I'm quoting directly from the article:

    Comparing theoretical peaks for 64-bit floating point
    arithmetic, the current generation of Xilinx’s Virtex-7
    FPGAs is about 4.2 times faster than a 16-core microprocessor. 
And with regard to your question about titles "lately", I'd be interested to see what other submissions I've made that you think are "link-baity".

Thanks.


I think it was in reference to other titles submitted lately in general, not necessarily by you.


Sorry, I didn't mean that you have submitted link-baity titles lately. I don't think I've read any of your submissions and if I did I wouldn't know it :)

The thing is, you left out some information that transforms the way the title reads to people who haven't read the article. People will think, as I did, "Wow, they've made improvements to FPGAs and got them way faster than CPUs", click through, and find out that the performance gains are currently only theoretical, not empirical, and also that the 4.2x number is only for a very specific type of problem.

Whether intentional or not, the title implies something greater than the article reports. That's annoying; I like article titles to be informative, not inflationary.


I think that title is a fair summary. Any title has to be taken with a grain of salt until one reads the article, anyway.


Anyone have an idea of how this would compare with current GPU performance? My impression is that GPUs are currently way ahead of CPUs in floating point performance (though maybe not for 64-bit?).

EDIT: To make this question a bit more specific, say I wanted to develop a really fast neural net implementation, which basically reduces to matrix-vector multiplication and function interpolation. Would I be better off looking to do this with a GPU or an FPGA given the current state of both technologies? (There's a rough sketch of the kernel I have in mind below.)

From what little experience I've had with GPUs, I think bandwidth to the device might be a limiting factor, but I'm guessing this would affect either type of co-processor.
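
To be concrete, the kernel I have in mind for each layer is essentially the one below; the sizes and the activation function are just placeholders:

    /* One neural-net layer: y = f(W*x), i.e. a matrix-vector product followed
       by an element-wise activation. Sizes and activation are placeholders. */
    #include <math.h>

    #define N_IN  1024
    #define N_OUT 1024

    void layer(const float W[N_OUT][N_IN], const float x[N_IN], float y[N_OUT]) {
        for (int i = 0; i < N_OUT; i++) {
            float acc = 0.0f;
            for (int j = 0; j < N_IN; j++)
                acc += W[i][j] * x[j];   /* each W[i][j] is read exactly once */
            y[i] = tanhf(acc);           /* activation / interpolation step */
        }
    }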


In my experience, it's not bandwidth that is the limiting factor, but latency. You'll hit the same problem with FPGAs if you're using one as a co-processor, as they are typically connected to the motherboard over PCI Express. If the vectors you're using are small (where "small" means small enough to easily fit into an L1 cache on a processor), then you probably won't see any performance improvement by offloading the computation to an accelerator.

I say this because in a matrix-vector multiplication, only the vector has data-reuse. You do a single pass over the matrix. I wrote a paper where latency killed any performance benefit from using a GPU, because the computation we performed did only a single pass over the data: http://people.cs.vt.edu/~scschnei/papers/debs2010.pdf If you're doing a matrix-matrix multiplication, then that's a different story because each element in each matrix will be reused.
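
A back-of-the-envelope way to see that data-reuse difference (the matrix size here is purely illustrative):

    /* Rough arithmetic intensity (flops per byte of data moved) for
       double-precision NxN operands, assuming each operand is streamed
       from memory once. Illustrative numbers only. */
    #include <stdio.h>

    int main(void) {
        double n = 4096.0;                          /* illustrative size */
        double matvec = (2*n*n) / (8*n*n);          /* ~0.25 flops/byte  */
        double matmat = (2*n*n*n) / (3 * 8*n*n);    /* ~n/12 flops/byte  */

        printf("matrix-vector: %.2f flops/byte\n", matvec);
        printf("matrix-matrix: %.0f flops/byte\n", matmat);
        return 0;
    }

With so little reuse, the single pass over the matrix is dominated by data movement, which is why the transfer cost to an accelerator is hard to amortize.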


GPUs have multi-gigabit bandwidth and 1.5-6+ GB of on-board RAM. GFLOP performance varies significantly, though.

The Radeon 7970 has 947 GFLOPS double precision, but Nvidia cripples its GeForce series to ~100 GFLOPS to force people to pay for a Quadro 6000, which has 515.2 GFLOPS double precision. Though if it's a large project, paying for some Quadros is probably worth the cost for better software support and more RAM, IMO.

The problem with FPGAs is that they cost about as much but take a lot more effort to get anywhere close to those performance numbers. However, they are great if you have some very odd, specific needs and plan on moving to custom chips in the future; e.g., you want to build a custom video encoder and plan on mass-producing your own chips, so you already need to develop at really low levels.


What's a good text for learning to program these? (Or perhaps series of texts, as my knowledge of electronics and computational hardware is very superficial.)


In grad school, I took a configurable computing course in the ECE department. I'm a CS guy - I had never done any hardware design before. You may benefit from reading over my short writeups of the assignments: http://people.cs.vt.edu/~scschnei/ece5530/

I recall that in trying to describe the impact of the web to typical business folks, Douglas Adams compared it to trying to explain the ocean to a river: first, you have to understand that river rules no longer apply. Hardware is similar. First, you have to understand that software rules no longer apply. If you dive into this even a little, I predict you will be shocked (much as I was) how much of your concept of "computation" is tied up in sequential, memory-hierarchy based processors.


  > If you dive into this even a little, I predict you will be shocked 
Thanks, sounds like my kind of ride.


A good way to start is to learn one of the hardware description languages. I liked the book by Pong P. Chu, "FPGA Prototyping by VHDL Examples: Xilinx Spartan-3 Version". The same book is also available for Verilog, which is another HDL. Later on you can take a look at higher-level HDLs, since creating hardware in VHDL and Verilog is tedious.


It might be tedious, but as an EE, I've never seen or heard of anyone using anything besides VHDL and Verilog to describe digital hardware designs. What sort of high level HDLs do people typically use, and for what purpose?


Check out SystemC (http://en.wikipedia.org/wiki/SystemC).


The article mentioned AutoESL, which compiles C, C++, or SystemC to Verilog/VHDL. This allows you to focus on the algorithmic, or behavioral, level. The advantages are plenty, but the main drawback is that it is one more level abstracted away from the hardware.
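
For a sense of what working at the behavioral level looks like, the input to such a tool is just ordinary C; a kernel like the sketch below (any tool-specific pragmas or interface directives are omitted here, since those vary by tool):

    /* A plain C kernel of the kind an HLS tool can turn into an RTL datapath.
       Tool-specific pragmas/directives (pipelining, interfaces) are omitted. */
    #define TAPS 8
    #define LEN  256

    void fir(const int coeff[TAPS], const int x[LEN], int y[LEN]) {
        for (int n = TAPS - 1; n < LEN; n++) {
            int acc = 0;
            for (int k = 0; k < TAPS; k++)
                acc += coeff[k] * x[n - k];   /* multiply-accumulate -> DSP slices */
            y[n] = acc;
        }
    }

The tool then schedules and pipelines those loops into a datapath, which is what the generated Verilog/VHDL describes.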



