I'd have to know more specifics to be able to comment beyond a certain level.
In general terms, yes, FPGA work can and does usually take longer than the equivalent work in the software domain. It doesn't have to be that way though.
For me it starts with language choices. I suppose that if you work in VHDL all the time you probably rock. I have an intense dislike for VHDL. I don't see a reason to type twice as much to do the same thing. Fifteen years ago VHDL had advantages with such constructs as "generate"; this is no longer the case. I realize that this can easily turn into an argument of a religious nature, so we'll have to leave it at that.
One approach that I have used with great success with complex modules is to write them in software first and then port to the FPGA. Going between C and Verilog is very natural.
The key is to write C code keeping in mind that you are describing hardware all along. Don't do anything that you would not be able to easily replicate on the FPGA. You are, effectively, authoring a simulation of what you might implement in the FPGA. The beauty of this approach is that you get the advantage of immediate execution and visualization in software. Debug initial structures and assumptions this way to save tons of time.
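To make that concrete, here is a minimal sketch of the style I mean, with invented names: model each register bank as a struct and each clock edge as a function that computes next state purely from current state and inputs. The eventual port to a Verilog always block is then nearly mechanical.

    /* Minimal sketch: a C model of a clocked accumulator, written so it
     * maps almost line-for-line onto a Verilog always @(posedge clk)
     * block. All names here (acc_t, acc_tick) are invented. */
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint16_t sum;    /* becomes a 16-bit register */
        uint8_t  valid;  /* becomes a 1-bit register  */
    } acc_t;

    /* One call == one rising clock edge. Next state is computed purely
     * from current state and inputs, mimicking nonblocking assignment. */
    static acc_t acc_tick(acc_t cur, uint8_t in, int in_valid, int clear)
    {
        acc_t nxt = cur;
        if (clear) {
            nxt.sum = 0;
            nxt.valid = 0;
        } else if (in_valid) {
            nxt.sum = (uint16_t)(cur.sum + in); /* wraps, as hardware would */
            nxt.valid = 1;
        }
        return nxt;
    }

    int main(void)
    {
        acc_t r = {0, 0};
        const uint8_t samples[] = {10, 20, 30, 40};
        for (unsigned i = 0; i < 4; i++) {
            r = acc_tick(r, samples[i], 1, 0);
            printf("cycle %u: sum=%u valid=%u\n", i, r.sum, r.valid);
        }
        return 0;
    }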
Maybe the best way to put it is that I try not to use the FPGA HDL coding stage to experiment and create but rather to simply enter the implementation. Then my goal is to go through as few Modelsim simulation passes as possible to verify operation.
If you've done non-trivial FPGA work you have probably experienced the agony of waiting an hour and a half for a design to compile and another N hours for it to simulate before discovering problems. The write-compile-simulate-evaluate-modify-repeat loop in FPGA work takes orders of magnitude longer than with software. I've had projects where you can only reasonably make one to half-a-dozen code changes per 18-hour day. That's the way it goes.
This is why I've resorted to extensive software-based validation before HDL coding. I've done this with, for example, challenging custom high-performance DDR memory controllers where there was a need to fiddle with a number of parameters and be able to visualize such conditions as FIFO fill/drain levels, etc. A nice GUI on top of the simulation made a huge difference. The final implementation took far less time to code in HDL and worked as required from the very start.
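As a stripped-down illustration (not the actual controller; the depth and rates below are invented numbers), even a simple cycle-stepped fill-level counter flags overruns and underruns long before any HDL exists. A real run would log the fill level every cycle and plot it in the GUI:

    /* Cycle-stepped FIFO fill-level model. FIFO_DEPTH and the write/read
     * rates are made up for illustration; the point is to expose where
     * the FIFO overruns or starves under a given set of parameters. */
    #include <stdio.h>

    #define FIFO_DEPTH 512

    int main(void)
    {
        int fill = 0, overruns = 0, underruns = 0;

        for (int cycle = 0; cycle < 10000; cycle++) {
            /* Reader side: wants one word every 2 cycles. */
            if (cycle % 2 == 0) {
                if (fill > 0) fill -= 1;
                else          underruns++;  /* data not ready in time */
            }
            /* Writer side: a burst of 4 words every 7 cycles. */
            if (cycle % 7 == 0) {
                if (fill + 4 <= FIFO_DEPTH) fill += 4;
                else                        overruns++;  /* FIFO full */
            }
        }
        printf("final fill=%d overruns=%d underruns=%d\n",
               fill, overruns, underruns);
        return 0;
    }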
Another general comment. When it comes to image processing in FPGAs you don't really pay a penalty for modularizing your code to a relatively fine-grained degree. This is because module interfaces don't necessarily create any overhead (the best example of this being interconnect wires). In that sense FPGAs are vastly different from software, where function or class+method interfaces generally come at a price.
Modularization can produce benefits during synthesis and placement. If you can pre-place portions of your design and do your floor planning in advance you can save tons of time. Incremental compilation has been around for a while. Still, nothing beats getting into the chip and locking down structures when it makes sense.
To circle back to the recurring theme of "FPGA for the masses" that pops up every so often: I maintain that FPGAs are, fundamentally, still about electrical engineering and not about software development. These, at certain levels, become vastly different disciplines. Once FPGA compilers become 100 to 1,000 times faster and FPGAs come with 100 to 1,000 times more resources for the money, the two worlds will probably blur into one very quickly for most applications.
Thanks for your insights, there is a lot of value for me in your post.
> I have an intense dislike for VHDL.
I have yet to meet an engineer who likes it!
I hate it with a passion, but it lets me write circuits in the way I want.
Luckily, Emacs VHDL mode makes me type less.
> If you've done non-trivial FPGA work you have probably experienced the agony of waiting an hour and a half for a design to compile and another N hours for it to simulate before discovering problems.
My simulations never took hours.
I use GHDL (an open-source VHDL simulator) to simulate my code, which is much slower than running Modelsim in a virtual machine.
So I guess that you are working on much larger problems than I do.
I have tried using a high-level language before writing my circuits in VHDL.
But the results were not very good, apart from the fact that I learned a lot more about the actual algorithm/circuit.
Either I coded at too high a level, which would be impossible in an FPGA (e.g., accessing a true dual-port block RAM at 3 different addresses in a clock cycle; see the sketch below), or I ended up simulating a lot of hardware just to make sure that it would work.
But the point is, no matter which approach I tried, it was painful, so I ended up choosing the workflow that is less painful.
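To make the block-RAM point concrete: a true dual-port RAM gives you exactly two accesses per clock, so a C model with an explicit per-cycle port budget (a rough sketch with invented names, below, not anyone's actual design) fails loudly where ordinary software code would silently allow a third access:

    /* Dual-port block RAM model with a per-cycle port budget. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define RAM_WORDS 1024

    typedef struct {
        uint8_t mem[RAM_WORDS];
        int ports_used;  /* accesses so far in the current cycle */
    } dpram_t;

    static void dpram_new_cycle(dpram_t *r) { r->ports_used = 0; }

    static uint8_t dpram_read(dpram_t *r, unsigned addr)
    {
        assert(r->ports_used < 2 && "only two ports per cycle");
        r->ports_used++;
        return r->mem[addr % RAM_WORDS];
    }

    int main(void)
    {
        dpram_t ram = {{0}, 0};
        dpram_new_cycle(&ram);
        uint8_t a = dpram_read(&ram, 1);
        uint8_t b = dpram_read(&ram, 2);
        printf("two reads in one cycle: %u %u\n", a, b);
        /* dpram_read(&ram, 3);  <- a third access this cycle asserts */
        return 0;
    }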
> I'd have to know more specifics to be able to comment beyond a certain level.
I am developing a marker detection system that runs at 100fps, with 640x480 8-bit grayscale images.
First I am doing CCL (connected-component labeling) to find anything in the image that could be a marker.
At the same time, some features are accumulated for each detected component (potential marker).
Then the features are used to find which component is a real marker and what's its ID.
And finally, the markers have some spatial information that allows me to find out the position and orientation of the camera.
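In rough C terms (a simplified sketch rather than my actual VHDL; the labeling itself and the label-merge logic are omitted, and the label is assumed given), the accumulation step looks something like this:

    /* Per-component feature accumulation: each labeled pixel is folded
     * into running statistics in raster order, which maps well to
     * streaming hardware. */
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_LABELS 256

    typedef struct {
        uint32_t count;          /* pixel count            */
        uint32_t sum_x, sum_y;   /* centroid = sum / count */
        uint16_t min_x, max_x;   /* bounding box           */
        uint16_t min_y, max_y;
    } feat_t;

    static feat_t feats[MAX_LABELS];

    /* Called once per labeled pixel, in raster order. */
    static void accumulate(uint8_t label, uint16_t x, uint16_t y)
    {
        feat_t *f = &feats[label];
        if (f->count == 0) {
            f->min_x = f->max_x = x;
            f->min_y = f->max_y = y;
        } else {
            if (x < f->min_x) f->min_x = x;
            if (x > f->max_x) f->max_x = x;
            if (y < f->min_y) f->min_y = y;
            if (y > f->max_y) f->max_y = y;
        }
        f->count++;
        f->sum_x += x;
        f->sum_y += y;
    }

    int main(void)
    {
        /* Toy 4x4 label image: component 1 is a 2x2 square. */
        static const uint8_t labels[4][4] = {
            {0,0,0,0}, {0,1,1,0}, {0,1,1,0}, {0,0,0,0},
        };
        for (uint16_t y = 0; y < 4; y++)
            for (uint16_t x = 0; x < 4; x++)
                if (labels[y][x])
                    accumulate(labels[y][x], x, y);
        printf("component 1: %u px, bbox (%u,%u)-(%u,%u)\n",
               feats[1].count, feats[1].min_x, feats[1].min_y,
               feats[1].max_x, feats[1].max_y);
        return 0;
    }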
Even though the FPGA that I use is the largest of all Cyclone II FPGAs, with 70k LEs, I have to juggle registers and block RAM: the chip is too small to store all the data in registers, and using up too many registers substantially increases the time to place and route the design.
> I maintain that FPGAs are, fundamentally, still about electrical engineering and not about software development. These, at certain levels, become vastly different disciplines. Once FPGA compilers become 100 to 1,000 times faster and FPGAs come with 100 to 1,000 times more resources for the money, the two worlds will probably blur into one very quickly for most applications.
I agree, and I would add that the compilers need to be smarter about parallelizing the code.
So while they can perform better than the alternatives, FPGAs are still a pain to develop for.
Even if the compilers get faster and FPGAs get bigger, writing code for FPGAs will still feel more like writing assembly than like something easily accessible "for the masses".
But I would be happy if the compilers become just 10x faster!
> I hate it with a passion, but it lets me write circuits in the way I want.
Can you explain what you are doing? I am wondering if you might be making your work more difficult by not taking advantage of inference. Are you doing logic-element-level hardware description? In other words, are you wiring the circuits by hand, if you will, by describing everything in VHDL?
I've done that of course, but I don't think it's necessary unless you really have to squeeze a lot out of a design. Where it works well is in doing your own hand-placement and hand-routing through switch boxes, etc. to get a super-tight design that runs like hell. I've done that mostly with adders and multipliers in the context of filter structures.
My guess is that you have set up several delay lines in order to process a kernel of N x M pixels at a time?
It's been a while, but I recall doing a fairly complex shallow-diagonal edge detector that had to look at 16 x 16 pixel blocks in order to do its job. This ended up taking the form of using internal storage in a large FPGA to build a 16-line FIFO with output taps every line. Now you could read a full 16-line vertical chunk-o-pixels into the shallow-edge processor and let it do its thing.
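In the C-model style I described above, the structure is roughly this (a sketch with invented sizes, cut down to a 4-line kernel instead of 16; each line buffer maps to a block RAM in the FPGA):

    /* Line-FIFO-with-taps sketch: buffer the last K-1 video lines so
     * each new pixel yields a full K-pixel vertical slice for the
     * kernel processor. */
    #include <stdint.h>
    #include <stdio.h>

    #define LINE_W 640
    #define K      4      /* kernel height; the edge detector used 16 */

    static uint8_t lines[K - 1][LINE_W];  /* last K-1 lines, oldest first */

    /* Feed one pixel in raster order; 'column' receives the K vertically
     * adjacent pixels ending at the current line. */
    static void push_pixel(uint16_t x, uint8_t pix, uint8_t column[K])
    {
        for (int k = 0; k < K - 1; k++)
            column[k] = lines[k][x];      /* taps: rows y-K+1 .. y-1 */
        column[K - 1] = pix;              /* current row y           */

        for (int k = 0; k < K - 2; k++)   /* age the buffers by one line */
            lines[k][x] = lines[k + 1][x];
        lines[K - 2][x] = pix;
    }

    int main(void)
    {
        uint8_t col[K];
        /* Feed three lines whose pixel values equal the line number. */
        for (uint8_t y = 0; y < 3; y++)
            for (uint16_t x = 0; x < LINE_W; x++)
                push_pixel(x, y, col);
        printf("column after 3 lines: %u %u %u %u\n",
               col[0], col[1], col[2], col[3]);
        return 0;
    }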
The fact that you are working on a 70k LE Cyclone imposes certain limits, not the least of which is internal memory availability. I haven't used a Cyclone in a long time; I'd have to look and see what resources you might have. That could very well be the source of much of your pain. Don't know.