More

caxap · on Dec 18, 2013

I don't think Adi Shamir is interested in writing fake articles and a fake paper about the same topic:

http://www.tau.ac.il/%7Etromer/papers/acoustic-20131218.pdf

caxap · on Dec 7, 2013

And if everyone starts paying to highlight their ads, then those that don't pay will stand out.

gus_massa · on Dec 7, 2013

May I paraphrase patio11 ( https://news.ycombinator.com/item?id=4477088 ) ...

If everyone want’s to highlight their add, then double your price. If everyone still want’s to highlight their add, then double your price again.

caxap · on Oct 16, 2013

Yes, the government knows how to lie with statistics. Since the introduction of the so-called "one-euro jobs", the unemployment has gone down only because people who took such jobs were not counted as unemployed anymore, even though their situation has not been improved.

caxap · on April 11, 2013

Just send an email to this woman---she might send your mac back! My wife just told me that Iranians, like the ones in this case, don't like to use stolen things.

caxap · on Feb 25, 2013

Someone just needs to design a high level language that can be synthesised; something akin to a python of the FPGA world if you will.

The advantage of FPGAs is that they allow nontrivial parallelism. On a CPU with 4 cores, you can run 4 instructions at a time (ignoring the pipelining). On the FPGA, you can run any number of operations at the same time, as long as the FPGA is big enough. The problem is not the low-level nature of hardware description languages, the problem is that we still don't have a smart compiler that can release us from the difficulty of writing nontrivial massively-parallel code.

VLM · on Feb 25, 2013

"The advantage of FPGAs is that they allow nontrivial parallelism."

Want a system on a chip with 2 cores leaving plenty of space for an ethernet accelerator, or 3 cores without space for the ethernet accelerator? Its only an include and some minor configuration away.

"the problem is that we still don't have a smart compiler that can release us from the difficulty"

Still don't have smart programmer... its hard to spec. Erlang looking elegant, doesn't magically make it easy to map non-technical description of requirements to Erlang.

caxap · on Feb 25, 2013

At the moment, I am writing some computer vision code in VHDL. A part of the circuit will perform connected component labeling (CCL) on incoming images, because I want to extract some features from some object in the images. And CCL is actually a union find algorithm. The algorithm can be written in a normal programming language like Racket or even Java in a couple of hours. However, the same algorithm will take me weeks to work out and test in VHDL! I have done some nontrivial work with FPGAs, and every single time it was hard, because every low-level detail has to be considered. Maybe it is so hard because on FPGAs you are forced to optimize right from the start, whereas when using programming languages, you can develop a prototype quickly and then improve upon it? How is your experience with developing stuff on FPGAs?

robomartin · on Feb 25, 2013

I'd have to know more specifics to be able to comment beyond a certain level.

In general terms, yes, FPGA work can and does usually take longer than the equivalent work in the software domain. It doesn't have to be that way though.

For me it starts with language choices. I suppose that if you work in VHDL all the time you probably rock. I have an intense dislike for VHDL. I don't see a reason to type twice as much to do the same thing. Fifteen years ago VHDL had advantages with such constructs as "generate", this is no-longer the case. I realize that this can easily turn into an argument of religious nature, so we'll have to leave it at that.

One approach that I have used with great success with complex modules is to write them in software first and then port to the FPGA. Going between C and Verilog is very natural.

The key is to write C code keeping in mind that you are describing hardware all along. Don't do anything that you would not be able to easily replicate on the FPGA. You are, effectively, authoring a simulation of what you might implement in the FPGA. The beauty of this approach is that you get the advantage of immediate execution and visualization in software. Debug initial structures and assumptions this way to save tons of time.

Maybe the best way to put it is that I try not to use the FPGA HDL coding stage to experiment and create but rather to simply enter the implementation. Then my goal is to go through as few Modelsim simulation passes as possible to verify operation.

If you've done non-trivial FPGA work you have probably experienced the agony of waiting an hour and a half for a design to compiler and another N hours for it to simulate before discovering problems. The write-compile-simulate-evaluate-modify-repeat loop in FPGA work takes orders of magnitude longer than with software. I've had projects where you can only reasonably make one to half-a-dozen code changes per 18 hour day. That's the way it goes.

This is why I've resorted to extensive software-based validation before HDL coding. I've done this with, for example, challenging custom high-performance DDR memory controllers where there was a need to fiddle with a number of parameters and be able to visualize such conditions as FIFO fill/drain levels, etc. A nice GUI on top of the simulation made a huge difference. The final implementation took far less time to code in HDL and worked as required from the very start.

Another general comment. When it comes to image processing in FPGA's you don't really pay a penalty for modularizing your code to a relatively fine-grained degree. This because module interfaces don't necessarily create any overhead (the best example of this being interconnect wires). In that sense FPGA's are vastly different from software in that function or class+method interfaces generally come at a price.

Modularization can produce benefits during synthesis and placement. If you can pre-place portions of your design and do your floor planning in advance you can save tons of time. Incremental compilation has been around for a while. Still, nothing beats getting into the chip and locking down structures when it makes sense.

To circle back to the recurring theme of "FPGA for the masses" that pops-up every so often. I maintain that FPGA's are, fundamentally, still about electrical engineering and not about software development. These, at certain levels, become vastly different disciplines. Once FPGA compilers become 100 to 1,000 times faster and FPGA's come with 100 to 1,000 times more resources for the money the two worlds will probably blur into one very quickly for most applications.

caxap · on Feb 25, 2013

Thanks for your insights, there is a lot of value for me in your post.

I have an intense dislike for VHDL. I have yet to meet an engineer who likes it! I hate it with passion, but it lets me write circuits in the way I want. Luckily, emacs VHDL mode makes me type less.

If you've done non-trivial FPGA work you have probably experienced the agony of waiting an hour and a half for a design to compiler and another N hours for it to simulate before discovering problems. My simulations never took hours. I use GHDL (an open source tool that converts VHDL into C++) to simulate my code, which is much slower than running Modelsim in a virtual machine. So I guess that you are working on much larger problems than I do.

I have tried using a high level language before writing my circuits in VHDL before. But the results were not very good, apart from learning a lot more about the actual algorithm/circuit.

Either I coded at a too high of a level, which would be impossible in an FPGA (e.g., accessing a true dual port block RAM at 3 different addresses in a clock cycle), or I ended up simulating a lot of hardware just to make sure that it will work.

But the point is, no matter which approach I tried, it was painful, so I ended up choosing the workflow that is less painful.

I'd have to know more specifics to be able to comment beyond a certain level. I am developing a marker detection system that runs at 100fps, with 640x480 8-bit grayscale images. First I am doing CCL to find anything in the image that could be a marker. At the same time, some features are accumulated for each detected component (potential marker).

Then the features are used to find which component is a real marker and what's its ID. And finally, the markers have some spacial information that allows me to find out the position and orientation of the camera.

Even though the FPGA that I use is the largest of all Cyclone II FPGAs with 70k LEs, I have to juggle registers and block RAM because it's too small to store all data in the registers, and using up too many registers substantially increases the time to place&route the design.

I maintain that FPGA's are, fundamentally, still about electrical engineering and not about software development. These, at certain levels, become vastly different disciplines. Once FPGA compilers become 100 to 1,000 times faster and FPGA's come with 100 to 1,000 times more resources for the money the two worlds will probably blur into one very quickly for most applications. I agree, and I would add that the compilers need to be smarter about parallelizing the code. So while being able to perform better than the alternatives, the FPGAs are still a pain to develop for. Even if the compilers are faster, and FPGAs are bigger, writing code for FPGAs feels still more like writing assembly code rather than code that is easily accessible "for the masses". But I would be happy if the compilers become just 10x faster!

robomartin · on Feb 26, 2013

> I hate it with passion, but it lets me write circuits in the way I want.

Can you explain what you are doing. I am wondering if you might be making your work more difficult by not taking advantage of inference. Are you doing logic-element level hardware description? In other words, are you wiring the circuits by hand, if you will, by describing everything in VHDL?

I've done that of course, but I don't think it's necessary unless you really have to squeeze a lot out of a design. Where it works well is in doing your own hand-placement and hand-routing thorough switch boxes, etc. to get a super-tight design that runs like hell. I've done that mostly with adders and multipliers in the context of filter structures.

My guess is that you have setup several delay lines in order to process a kernel of NxM pixels at a time?

It's been a while but I recall doing a fairly complex shallow diagonal edge detector that had to look at 16 x 16 pixel blocks in order to do its job. This ended-up taking the form of using internal storage in a large FPGA to build a 16 line FIFO with output taps every line. Now you could read a full 16 lines vertical chunk-o-pixels into the shallow edge processor and let it do its thing.

The fact that you are working on a 70k LE Cyclone imposes certain limits, not the least of which is internal memory availability. I haven't used a Cyclone in a long time, I'd have to look and see what resources you might have. That could very well be the source of much of your pain. Don't know.

swah · on Feb 26, 2013

6+ hours to compile was the longest I've seen/worked with. The problem IIRC were the large FPGAs, not that much the large designs.

robomartin · on Feb 26, 2013

With dense designs you can easily run into what feels like O(n!) time, which is probably close to how complex the problem might actually become.

VLM · on Feb 25, 2013

I would talk to these guys (unless you are one of them) working on extending their results

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6...

The wikipedia entry also has a link to a parallelizable algo from 20+ years ago for CCL. FPGAs certainly parallel pretty easily. I wonder if your simplified optimum solution is to calculate one cell and replicate into 20x20 matrix or whatever you can fit on your FPGA and then have a higher level CPU sling work units and stitch overlapping parts together.

More practically I'd suggest your quick prototype would be slap a SoC on a FPGA that does it in your favorite low-ish level code, since it only takes hours, then very methodically and smoothly create an acceleration peripheral that begins to do the grunt-iest of the grunt work one little step at a time.

So lets start with just are there any connections at all? That seems a blindingly simple optimization. Well thats a bitwise comparison, so replace that in your code with a hardware detection and flag. Next thing you know you've got a counter that automatically in hardware skips past all blank space into the first possible pixel... But thats an optimization, maybe not the best place to start.

Next I suppose if you're doing 4-connected you have some kind of inner loop that looks a lot like the wikipedia list of 4 possible conditions. Now rather than having the on FPGA cpu compare if you're in the same region one direction at a time, do all 4 dirs at once in parallel in VHDL and output the result in hardware to your code, and your code reads it all in and decides which step (if any) was the lowest/first success.

The next step is obviously move the "whats the first step to succeed?" question outta the software and into the VHDL, so the embedded proc thinks, OK just read one register to see if its connected and if so in which direction.

Then you start feeding in a stream and setting up a (probably painful) pipeline.

This is a solid bottom up approach. One painful low level detail at a time, only one at a time, never more than one at a time. Often this is a method to find a local maximum, its never going to improve the algo (although it'll make it faster...)

"because on FPGAs you are forced to optimize right from the start" Don't do that. Emulate something that works from the start, then create an acceleration peripheral to simplify your SoC code. Eventually remove your onboard FPGA cpu if you're going to interface externally to something big, once the "accelerator" is accelerating enough.

Imagine building your own floating point mult instead of using an off the shelf one ... you don't write the control blocks and control code in VHDL and do the adders later... your first step should be writing a fast adder only later replacing control code and simulated pipelining with VHDL code. You write the full adder first, not the fast carry, or whatever.

caxap · on Feb 25, 2013

No, I am not one of them :) Thanks for the reference! I am drawing my inspiration from Bailey, and more recently Ma et al. They label an image line by line and merge the labels during the blanking period. If you start merging labels while the image is processed then data might get lost if the merged label occurs after the merge.

The paper that you reference divides the image into regions, so that the merging can start earlier, because labels used in one region are independent of the other regions. If it starts earlier, it also ends earlier, so that new data can be processed.

In my case, there is no need for such high performance, just a real time requirement of 100fps for 640x480 images, where CCL is used for feature extraction. The work by Bailey and his group is good enough, and the reference can be done in the future, if there is need for more throughput!

My workflow is a lot different from the one that you describe. I don't use any soft cores, and write everything in VHDL! I have used soft cores before, but they were kind of not to my liking. I miss the short feedback loop (my PC is a Mac and the synthesis tools run in a VM).

After trying out a couple of environments, I ended up using open source tools---GHDL for VHDL->C++ compilation and simulation, and GTKwave for waveform inspection.

Usually, I start with a testbench a testbench that instantiates my empty design under test. The testbench reads some test image that I draw in photoshop. It prints some debugging values, and the wave inspection helps to figure out what's going on.

If it works in the simulator, it usually works on the FPGA! But the biggest advantage is that it takes just some seconds to do all that.

I will give the softcore approach another chance once my deadline is over!

robomartin · on Feb 25, 2013

One quick note. Sometimes in image processing you can gain advantages by frame-buffering (to external SDR or DDR memory, not internal resources) and then operating on the data at many times the native video clock rate.

If your data is coming in at 13.5MHz and you can run your internal evaluation core at 500MHz there's a lot you can do that, all of a sudden, appears "magical".

VLM · on Feb 25, 2013

While eating lunch I was thinking about your CCL and a simple 4-way CCL reminds me of the old "put the game-of-life" on a FPGA deal. So what if you model each pixel as a cell, and if you're set "on" then either propagate a GUID to the southeast cells, or if you got a GUID from the northwest cells, then propagate that GUID instead of your own? If you're on, propagate a zero to the southeast? Whats a good GUID? Probably some combo of your pixel's X/Y coord and/or just a (very large) random number.

FPGA's do cellular automata pretty well because you can create an ever larger matrix of them until you run into some hardware limit.

This is not exactly what you're trying to do, but it sure is simple and a possible start. I'm guessing when you're done you'll end up with a really smart peripheral that looks like a CA accelerator.

caxap · on Feb 25, 2013

That's perfectly possible, but only the newer FPGAs are big enough to store the whole image in the registers. If I had a bigger FPGA, I would not bother doing all this memory juggling that I am doing now and place all my data into the registers. And then wait for 10 hours for the software to produce the bitstream!

Probably some combo of your pixel's X/Y coord and/or just a (very large) random number.

I would go with X/Y because it requires less memory than a random number. Besides, random numbers on FPGAs need extra (though not much!) logic to produce them in LFSRs.

caxap · on April 19, 2012

Good way to start is to learn one of the hardware description languages. I liked the book by Pong P. Chu "FPGA Prototyping by VHDL Examples: Xilinx Spartan-3 Version". The same book is also available for Verilog, which is another HDL. Later on you can take a look into higher level HDLs, since creating hardware in VHDL and Verilog is tedious.

rthomas6 · on April 19, 2012

It might be tedious, but as an EE, I've never seen or heard of anyone using anything besides VHDL and Verilog to describe digital hardware designs. What sort of high level HDLs do people typically use, and for what purpose?

StephenFalken · on April 19, 2012

Try to check SystemC (http://en.wikipedia.org/wiki/SystemC).

HardyLeung · on April 19, 2012

The article mentioned AutoESL, which compiles C, C++, or SystemC to Verilog/VHDL. This allows you to focus on the algorithmic, or behaviorial level. The advantages are plenty, but the main drawback is that it is one more level abstracted away from the hardware..

caxap · on March 17, 2012

Wouldn't open data make a bus company more popular to another one that does not release its data to the public? But here in Colone there is only one bus company, which might be the reasons why there is no public API---why put extra effort if there is no competition?

joelhaasnoot · on March 17, 2012

This article last week highlights how research in Chicago has shown that publishing live data and releasing an API has increased ridership: http://www.theatlanticcities.com/commute/2012/03/do-real-tim...

caxap · on Dec 30, 2011

Recently in Germany many banks introduced a new "security" feature that allows you to receive your TANs per SMS in order to do online transactions. The TANs are sent in plain text. All you need is a UMTS receiver and a way to analyze the data, e.g., a software-defined radio implemented on an FPGA.

Maxious · on Dec 30, 2011

Isn't this more secure than having nothing? There is a large additional cost to the wrongdoers in that they have to get close to you (even if they know your home address, how do they know you and your phone are home). Seems like a deterrent when you could be running credit card phishing sites for less work per victim. And you would still get the intercepted text, the ones I get from my bank in Australia suggest if you didn't request the token to contact them immediately.

aristidb · on Dec 30, 2011

The old alternative in Germany was that you had a piece of paper with numbers on them. So it absolutely may be a step backward.

coffeeaddicted · on Dec 30, 2011

My bank (Landesbank BW) gives you hardware (looking a little like a calculator) where you for example type in the bank-number of a person to whom you send money and then it'll calculate some PIN for that action.

stesch · on Dec 30, 2011

I might add, that the numbers were indexed and the banking software was requesting a random TAN. This was called iTAN.

darklajid · on Dec 30, 2011

We had both (both the original TAN list where every number could be used just once and invalidated all previous numbers on the list and the iTan system).

I prefer the token thingy my bank gave me. Insert your direct debit card, enter two numbers from the screen (usually corresponding to your transaction in some way, to confirm _again_ that you're really trying to send money to account X) and generate the TAN. Done.

caxap · on Nov 5, 2011

"I think we need more and more materials out there on good TDD and OOD..."

Which materials do you recommend?