The Wikipedia entry also has a link to a parallelizable algorithm from 20+ years ago for CCL. FPGAs certainly parallelize pretty easily. I wonder if your simplified optimum solution is to calculate one cell, replicate it into a 20x20 matrix or whatever you can fit on your FPGA, and then have a higher-level CPU sling work units and stitch the overlapping parts together.
More practically, I'd suggest your quick prototype would be to slap a SoC on an FPGA that does it in your favorite low-ish level code, since that only takes hours, then very methodically and smoothly create an acceleration peripheral that takes over the grunt-iest of the grunt work one little step at a time.
So let's start with just: are there any connections at all? That seems a blindingly simple optimization. Well, that's a bitwise comparison, so replace it in your code with a hardware detection and flag. Next thing you know you've got a counter that, in hardware, automatically skips past all the blank space to the first possible pixel... But that's an optimization, maybe not the best place to start.
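A minimal sketch of that "anything here at all?" check (my Python model, not anyone's actual code): the hardware version is just a wide OR-reduction per packed word, and the skip counter advances past words whose OR is zero.

```python
# Software model of the "skip blank space" flag: each entry is a packed word
# of pixels; a nonzero word means at least one pixel is set. In hardware this
# is a single OR-reduction per word, feeding a skip counter.

def first_nonblank_word(packed_words):
    """Return the index of the first word containing a set pixel, or None."""
    for i, word in enumerate(packed_words):
        if word != 0:          # hardware: OR-reduction flag
            return i
    return None

row = [0x0, 0x0, 0x8, 0x1]     # first content appears in word 2
assert first_nonblank_word(row) == 2
```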
Next, I suppose if you're doing 4-connected you have some kind of inner loop that looks a lot like the Wikipedia list of four possible conditions. Rather than having the on-FPGA CPU check whether you're in the same region one direction at a time, do all four directions at once in parallel in VHDL and output the result in hardware to your code, and your code reads it all in and decides which step (if any) was the lowest/first success.
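Roughly like this (a Python sketch of the split; the function names and bit assignments are mine): hardware evaluates all four neighbour comparisons at once and hands software a single flags word, and software just picks the lowest set bit.

```python
# Model of the proposed hardware/software split: neighbour_flags is what the
# parallel VHDL comparators would produce in one cycle; first_direction is the
# software side reading the result register.

def neighbour_flags(img, x, y):
    """Bitmask of same-valued 4-neighbours: bit0=N, bit1=W, bit2=E, bit3=S."""
    h, w = len(img), len(img[0])
    v = img[y][x]
    flags = 0
    if y > 0     and img[y-1][x] == v: flags |= 1   # north
    if x > 0     and img[y][x-1] == v: flags |= 2   # west
    if x < w - 1 and img[y][x+1] == v: flags |= 4   # east
    if y < h - 1 and img[y+1][x] == v: flags |= 8   # south
    return flags

def first_direction(flags):
    """Software side: lowest set bit = first successful direction, else None."""
    return (flags & -flags).bit_length() - 1 if flags else None
```

Moving `first_direction` into the VHDL too is exactly the "read one register" step described next.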
The next step is obviously to move the "what's the first step to succeed?" question out of the software and into the VHDL, so the embedded processor just reads one register to see whether it's connected and, if so, in which direction.
Then you start feeding in a stream and setting up a (probably painful) pipeline.
This is a solid bottom-up approach: one painful low-level detail at a time, only one at a time, never more than one at a time. Often this method finds a local maximum; it's never going to improve the algorithm (although it'll make it faster...).
"because on FPGAs you are forced to optimize right from the start" Don't do that. Emulate something that works from the start, then create an acceleration peripheral to simplify your SoC code. Eventually, once the "accelerator" is accelerating enough, remove your on-board FPGA CPU if you're going to interface externally to something big.
Imagine building your own floating-point multiplier instead of using an off-the-shelf one... you don't write the control blocks and control code in VHDL and do the adders later. Your first step should be writing the adder itself, only later replacing control code and simulated pipelining with VHDL. You write the full adder first, not the fast-carry logic, or whatever.
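To make the "full adder first" point concrete, here's a toy software model (mine, purely illustrative): a 1-bit full adder chained into a ripple-carry adder. A later fast-carry (carry-lookahead) upgrade would replace only the carry chain; the full adder cell itself never changes.

```python
# 1-bit full adder: the primitive you write first.
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

# Ripple-carry adder built from it; the serial carry chain is the slow part
# that carry-lookahead would later replace.
def ripple_add(x, y, width=8):
    carry, result = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result & ((1 << width) - 1)   # wraps modulo 2^width
```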
No, I am not one of them :) Thanks for the reference! I am drawing my inspiration from Bailey, and more recently Ma et al.
They label an image line by line and merge the labels during the blanking period.
If you start merging labels while the image is still being processed, data might get lost when a pixel carrying the merged-away label shows up after the merge has already happened.
The paper that you reference divides the image into regions, so that the merging can start earlier, because labels used in one region are independent of the other regions.
If it starts earlier, it also ends earlier, so that new data can be processed.
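For readers unfamiliar with the scheme: here is a rough software model of line-by-line labeling with deferred merging (my simplification, not code from the Bailey or Ma et al. papers). Provisional labels are assigned in raster order, equivalences are only recorded into a merge table, and the table is resolved afterwards, i.e. during what would be the blanking period.

```python
# Two-phase CCL sketch: raster-scan provisional labeling plus an equivalence
# (merge) table that is only flattened "during blanking".

def label_image(img):
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = {}                                # merge (equivalence) table

    def find(l):                               # table lookup; chased at merge
        while parent[l] != l:                  # time and during "blanking"
            l = parent[l]
        return l

    nxt = 1
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up   = labels[y-1][x] if y else 0
            left = labels[y][x-1] if x else 0
            if up and left:
                a, b = find(up), find(left)
                if a != b:
                    parent[max(a, b)] = min(a, b)   # record the merge only
                labels[y][x] = min(a, b)
            elif up or left:
                labels[y][x] = up or left
            else:
                labels[y][x] = nxt                  # new provisional label
                parent[nxt] = nxt
                nxt += 1
    # "blanking period": resolve the merge table and relabel
    return [[find(l) if l else 0 for l in row] for row in labels]
```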
In my case, there is no need for such high performance, just a real time requirement of 100fps for 640x480 images, where CCL is used for feature extraction.
The work by Bailey and his group is good enough, and the approach from that reference can be adopted in the future if there is a need for more throughput!
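For scale, the stated requirement works out to a quite modest pixel rate (plain arithmetic, assuming one pixel per clock and ignoring blanking intervals):

```python
# Back-of-the-envelope pixel rate for 640x480 at 100 fps.
pixels_per_frame = 640 * 480          # 307,200 pixels
pixel_rate = pixels_per_frame * 100   # 30,720,000 px/s, i.e. ~30.7 MHz
```

Well within reach of even small FPGAs, which is consistent with the point that the higher-throughput schemes aren't needed here.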
My workflow is a lot different from the one that you describe.
I don't use any soft cores, and write everything in VHDL!
I have used soft cores before, but they were not to my liking: I missed the short feedback loop (my PC is a Mac and the synthesis tools run in a VM).
After trying out a couple of environments, I ended up using open source tools---GHDL for VHDL->C++ compilation and simulation, and GTKwave for waveform inspection.
Usually, I start with a testbench that instantiates my empty design under test. The testbench reads a test image that I draw in Photoshop.
It prints some debugging values, and the wave inspection helps to figure out what's going on.
If it works in the simulator, it usually works on the FPGA!
But the biggest advantage is that it takes just some seconds to do all that.
I will give the softcore approach another chance once my deadline is over!
One quick note. Sometimes in image processing you can gain advantages by frame-buffering (to external SDR or DDR memory, not internal resources) and then operating on the data at many times the native video clock rate.
If your data is coming in at 13.5MHz and you can run your internal evaluation core at 500MHz there's a lot you can do that, all of a sudden, appears "magical".
While eating lunch I was thinking about your CCL, and a simple 4-way CCL reminds me of the old "put the Game of Life on an FPGA" deal. So what if you model each pixel as a cell: if you're set "on", either propagate a GUID to the southeast cells, or, if you got a GUID from the northwest cells, propagate that GUID instead of your own? If you're off, propagate a zero to the southeast. What's a good GUID? Probably some combo of your pixel's X/Y coord and/or just a (very large) random number.
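A toy software model of that cellular-automaton idea (my sketch, simplified to symmetric min-propagation rather than strictly northwest-to-southeast): every "on" pixel starts with a unique ID derived from its X/Y coordinate, and each step every cell adopts the minimum ID among itself and its "on" 4-neighbours. Iterating to a fixed point labels the connected components.

```python
# CA-style CCL: one pass of the loop corresponds to one synchronous update of
# the whole cell matrix on the FPGA.

def ca_label(img):
    h, w = len(img), len(img[0])
    # Unique per-pixel ID from the X/Y coordinate (linear index + 1).
    ids = [[y * w + x + 1 if img[y][x] else 0 for x in range(w)]
           for y in range(h)]
    changed = True
    while changed:                        # iterate CA steps to a fixed point
        changed = False
        for y in range(h):
            for x in range(w):
                if not ids[y][x]:
                    continue              # "off" cells propagate nothing
                best = ids[y][x]
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and ids[ny][nx]:
                        best = min(best, ids[ny][nx])
                if best != ids[y][x]:
                    ids[y][x], changed = best, True
    return ids
```

The number of update steps needed grows with the longest path inside a component, which is the usual trade-off with this approach versus the raster-scan + merge-table schemes.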
FPGAs do cellular automata pretty well because you can create an ever-larger matrix of them until you run into some hardware limit.
This is not exactly what you're trying to do, but it sure is simple and a possible start. I'm guessing when you're done you'll end up with a really smart peripheral that looks like a CA accelerator.
That's perfectly possible, but only the newer FPGAs are big enough to store the whole image in the registers.
If I had a bigger FPGA, I would not bother doing all this memory juggling that I am doing now and place all my data into the registers.
And then wait for 10 hours for the software to produce the bitstream!
"Probably some combo of your pixel's X/Y coord and/or just a (very large) random number."
I would go with X/Y because it requires less memory than a random number.
Besides, random numbers on FPGAs need extra (though not much!) logic to produce them in LFSRs.
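For reference, that "extra but small" logic is tiny indeed: an LFSR is just a shift register with a few XOR taps. A software model of a 16-bit Fibonacci LFSR (the width and the 16/14/13/11 maximal-length tap set are my choice for illustration):

```python
# One step of a 16-bit Fibonacci LFSR: shift right, feed the XOR of the tap
# bits back into the top bit. State must be nonzero.

def lfsr16(state):
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

s = 0xACE1                 # any nonzero seed
for _ in range(8):         # a few pseudo-random values
    s = lfsr16(s)
```

In hardware this is one flip-flop per bit plus a handful of XOR gates, which supports the point that it costs little, while the X/Y coordinate costs nothing at all since it's already implied by the pixel's position.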
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6...