As a veteran of the chip industry, I should warn you that despite all these suggestions, FPGA prototyping is not really done that much in the ASIC industry.
The front-end skills are similar, but an ASIC design flow generally doesn't use an FPGA for prototyping. FPGAs are considered slow to work with and not cost effective.
IP cores in ASICs come in a range of formats. "Soft IP" means the IP is not physically synthesised for you. "Hard IP" means it has been. The implications are massive for all the back end work. Once the IP is Hard, I am restricted in how the IP is tested, clocked, reset and powered.
For front end work, IP cores can be represented by cycle accurate models. These are just for simulation. During synthesis you use a gate level model.
I've had a different experience. I've worked on ASICs for over ten years and have had experience with nearly all aspects of the design flow (from RTL all the way to GDS2 at one point or another). I've taped out probably 20+ chips (although I've been concentrating on FPGAs for the last three years). Every chip that I've taped out has had extensive FPGA prototyping done on the design, and in a variety of different areas too (Bluetooth, GPUs, CPUs, video pipelines, etc.). You can simply get a hell of a lot more cycles through an FPGA prototyping system than through an RTL sim, and when you are spending a lot of money on the ASIC masks you want a chance to soak test the design first.
My experience agrees with yours. Many big-budget teams use a hardware emulator like the Palladium XP or the similar Synopsys device. Both built from FPGAs.
Hardware emulators are expensive, but a single mask respin at 7, 10, or 16nm is even more expensive.
There is a distinction between hardware emulators and FPGAs.
Though hardware emulators such as Palladiums may use FPGAs inside them, they don't work the same way in terms of validation. The two tools are very different to use.
I don't agree.
If the design is non-trivial, prototyping via FPGA means I don't have the more advanced verification tools such as UVM.
Constrained-random verification is only workable via UVM or something like it. For large designs that is arguably the best verification methodology. Without visibility through the design to observe and record the possible corner cases of transactions, you can't be assured of functional coverage.
While FPGAs can run a lot more transactions, the ability to observe coverage of them is limited.
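To make that concrete, here is a minimal sketch (made-up names, nothing vendor specific) of what you get almost for free in a SystemVerilog simulator and have no easy equivalent for on an FPGA: the solver picks legal random transactions, and a covergroup records which corner cases were actually hit.

    module tb;
      // Constrained-random transaction: the solver picks legal values for us.
      class bus_txn;
        rand bit [31:0] addr;
        rand bit [7:0]  len;
        constraint c_addr { addr inside {[32'h0000_0000 : 32'h0000_FFFF]}; }
        constraint c_len  { len inside {[1:64]}; }
      endclass

      // Functional coverage: record which corner cases were exercised.
      covergroup len_cg with function sample(bit [7:0] l);
        coverpoint l { bins single = {1}; bins small = {[2:8]}; bins big = {[9:64]}; }
      endgroup

      bus_txn t;
      len_cg  cg;

      initial begin
        t  = new();
        cg = new();
        repeat (1000) begin
          void'(t.randomize());  // generate a legal random transaction
          cg.sample(t.len);      // log it against the coverage bins
          // ...drive t onto the DUT interface here...
        end
      end
    endmodule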
I have worked on multiple SoCs for Qualcomm, Canon and Freescale. FPGAs don't play a role in any SoC verification that I've worked on.
This was my experience working on SoCs at Broadcom also where we didn't really use FPGAs at all.
But at another employer that did not work on consumer designs, I did use a lot of large FPGAs in final shipped products, and in those cases we did some of our heavy testing and iterating on the real FPGA(s). For example I built a version of the FPGA with pseudo-random data generation to test an interface with another FPGA. When I found a case that failed I could then reproduce it in simulation much more quickly.
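For anyone wondering what "pseudo-random data generation" inside the fabric typically looks like: often it is just a free-running LFSR at line rate, with the receiving side running the same LFSR and comparing. A minimal sketch (width and taps are arbitrary choices here):

    // 16-bit Fibonacci LFSR (taps for x^16 + x^14 + x^13 + x^11 + 1, maximal length).
    module lfsr16 (
        input  wire        clk,
        input  wire        rst_n,
        output reg  [15:0] data
    );
      wire fb = data[15] ^ data[13] ^ data[12] ^ data[10];
      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) data <= 16'hACE1;          // any non-zero seed
        else        data <= {data[14:0], fb};
      end
    endmodule

A single sticky mismatch flag is all you have to bring back off the board; once it fires, the failing seed and cycle count give you something to reproduce in simulation.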
That employer also built some ASIC designs and I remember some discussions about using FPGA prototyping for the ASICs to speed up verification or get a first prototype board built faster that would later get redesigned with the final ASIC. I don't know if they ever went down that route but it would not surprise me if they did. These were $20k PCB boards once fully assembled, and integration of the overall system was often a bigger stumbling block than any single digital design.
There are a lot of different hardware design niches so I'm sure there are many other cases.
All my information is also about 10 years out of date.
This reflects my experience. Many or most of the "nontrivial" issues nowadays are rooted in physical issues, not logical issues, and in those cases simulation is often superior to dealing with the FPGA software layer. FWIW, I asked my co-founder, formerly at Intel, and he said that FPGA involvement was "almost zero".
That's a false dichotomy -- you can do FPGA verification in addition to simulation-based verification. And yes, there are ASIC teams that have successfully done that.
The reasons are numerous. I already gave a few. I will give another. Once you have to integrate hard IP from other parties, you cannot synthesise it to FPGA. Which means you won't be able to run any FPGA verification with that IP in the design. You can get a behavioural model that works in simulation only. In fact it is usually a requirement for Hard IP to be delivered with a cycle accurate model for simulation.
I'll give another reason. If you are verifying on FPGA you will be running a lot faster than simulation. The Design Under Test requires test stimulus at the speed of the FPGA. That means you have to generate that stimulus at speed and then check all the outputs of the design against expected behaviour at speed. This means you have to create additional HW to form the testbench around the design. That is a lot of additional work to gain verification speed, and none of it is reusable once the design is synthesised for ASIC.
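To give a feel for the kind of extra testbench HW that means: even the checking side has to live in the fabric, for example a sticky compare against a reference model that also runs on the FPGA. A rough sketch (names and widths are made up):

    // At-speed output checker: compare the DUT against an on-FPGA reference
    // model every valid cycle and latch a sticky error flag for later readout.
    module at_speed_checker #(parameter W = 32) (
        input  wire         clk,
        input  wire         rst_n,
        input  wire         valid,
        input  wire [W-1:0] dut_out,
        input  wire [W-1:0] ref_out,
        output reg          error
    );
      always @(posedge clk or negedge rst_n) begin
        if (!rst_n)                             error <= 1'b0;
        else if (valid && (dut_out != ref_out)) error <= 1'b1;  // sticky on first mismatch
      end
    endmodule

None of that carries over to the ASIC signoff testbench, which is exactly the point.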
I can go on and on about this stuff. Maybe there are reasons for a particular product, but I am talking about general ASIC SoC work. I've got nothing against FPGAs; I am working on FPGAs right now. But real ASIC work uses simulation first and foremost. It is a dominant part of the design flow and FPGA validation just isn't. On an "Ask HN", you would be leading a newbie the wrong way by pointing them at FPGAs. It is not done a lot.
As another veteran in the ASIC industry: we are using FPGAs to verify billion-transistor SoCs before taping out, using PCBs that have 20 or more of the largest Xilinx or Altera FPGAs.
It's almost pointless to make the FPGA run the same tests as in simulation. What you really want is to run things that you could never run in simulation. For example: boot up the SoC until you see an Android login screen on your LCD panel.
A chip will simply not tape out before these kinds of milestones have been met, and, yes, bugs have been found and fixed by doing this.
The hard macro IP 'problem' can be solved by using an FPGA equivalent. Who cares that, say, a memory controller isn't 100% cycle accurate? It's not as if that makes it any less useful in feeding the units that simply need data.
I find the above pair of comments really interesting. I'm guessing there are parallels with differences of opinion and approach in other areas of engineering. There are always reasons for the differences, and those are usually rooted in more than just opinion or dogma.
In this case, I'd guess it's got a lot to do with cost vs relevance of the simulation. If you're Intel or AMD making a processor, I bet FPGA versions of things are not terribly relevant because it doesn't capture a whole host of physical effects at the bleeding edge. OTOH, for simpler designs on older processes, one might get a lot of less formal verification by demonstrating functionality on an FPGA. But this is speculation on my part.
"If you're Intel or AMD making a processor, I bet FPGA versions of things are not terribly relevant because it doesn't capture a whole host of physical effects at the bleeding edge."
Exactly.
When you verify a design via an FPGA, you are essentially only testing the RTL for correctness. Once you synthesise for FPGA rather than the ASIC process, you diverge. In ASIC synthesis I have a lot more ability to meet timing constraints.
So given that FPGA validation only proves the RTL is working, ASIC projects don't focus on FPGA. We know we have to get the back-annotated gate-level simulation test suite passing. This is a major milestone for any SoC project. So, planning backwards from that point, we focus on building simulation testbenches that can work on both gate level and RTL.
I am not saying FPGAs are useless but they are not a major part of SoC work for a reason. Gate level simulation is a crucial part of the SoC design flow. All back end work is.
Let me try to summarize part of this: when you're building an ASIC, you have to care about the design at the transistor level because you're going for maximum density, maximum speed, high volume, and economies of scale. When you're designing for an FPGA, you only get to care about the gates, which is one abstraction level higher than transistors.
In an FPGA, you cannot control individual transistors. (FPGAs build "gates" from transistors in a fundamentally different way than ASICs do, because the gates have to be reprogrammable.) And that's okay because FPGA designs aren't about the highest possible speed, highest density, highest volumes or lowest volume cost.
Maybe engineers need to be introduced to the synthesis tools at the same time as the simulator tools.
Simulating RTL is only an approximation of reality. So emphasizing RTL simulation is bad. You see it over and over though. People teach via RTL simulation.
Synthesis is the main concern. Can the design be synthesised into HW and meet the constraints? Because all the combinatorial logic gets transformed into something quite different in an FPGA.
If you are not an old fart like myself, you probably haven't used an actual Unix system, but back in the days before the popularity of Linux you'd see a lot of Solaris.
You stuck to your guns and didn't just lie about Unix experience, so I commend you.
But if you really want the job, next time just lie and set them straight once you've gotten an interview. It is splitting hairs to make a big deal out of actual Unix experience vs Linux experience.
"It is splitting hairs to make a big deal out of actual Unix experience vs Linux experience."
This only holds true if one is a UNIX graybeard; otherwise, GNU/Linux is rife with pitfalls and minefields for the unwary. Shell scripting full of GNU-isms or bash-isms is one example; using GNU-specific functions or semantics when programming is another. There are many such landmines lying in wait on GNU/Linux, waiting to blow one's legs off. The BSD guys have been trying to raise awareness of this for years.
It's the experimental part of the high level language that is the problem. I agree you shouldn't teach it to students. It just leads them down a divergent path away from what is done in industry. It isn't addressing the needs of the student, only their short term "wants".
But the language is just a small part of the design process. You have to learn to design HW. The HW engineering project tailors the tool choices around the requirements of the product. It is assumed that engineers know the fundamentals; they can adapt to any high-level synthesis tool.
Vendors' training courses for all these fancy HLS tools take a few days at most. They don't have a semester for any newbies to learn Verilog/VHDL or C/C++ first; it's assumed you know them.
"One thing that bit me when I was a complete n00b: assigning registers from within more than a single always block. On my simulator (at the time) it worked perfectly but the synthesis tool silently ignored one of the blocks."
It's tool dependent, but I believe you should see a warning that two drivers are assigned to the same net.
This is probably where, I am guessing, you mistakenly thought you were creating a register in Verilog with the keyword "reg". Synthesis tools don't work like that and haven't for quite a while.
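For reference, the trap being described looks like this in minimal form (module and signal names are made up):

    // Two always blocks driving the same register: a simulator may appear to
    // resolve it, but synthesis will error, warn, or silently keep one driver.
    module bad_two_drivers (input wire clk, input wire a, b, output reg q);
      always @(posedge clk) q <= a;   // driver 1
      always @(posedge clk) q <= b;   // driver 2: multiple-driver conflict
    endmodule

    // The fix: exactly one always block owns each register.
    module good_one_driver (input wire clk, sel, a, b, output reg q);
      always @(posedge clk) q <= sel ? b : a;
    endmodule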
"Initially, Verilog used the keyword reg to declare variables representing sequential hardware registers. Eventually, synthesis tools began to use reg to represent both sequential and combinational hardware as shown above and the Verilog documentation was changed to say that reg is just what is used to declare a variable. SystemVerilog renamed reg to logic to avoid confusion with a register – it is just a data type (specifically reg is a 1-bit, 4-state data type). However people get confused because of all the old material that refers to reg."
A lot of people here on HN seem to be self taught and not keeping up with tool and language developments. If you use tools and techniques from the 90s, don't expect wonderful results.
A new FPGA designer should learn the first principles so they can understand how to make decisions and where to look for potential issues when bugs occur.
As a long-time HW FPGA guy, I think you might want to take a look at the C projects again. I don't know whether Go has any advantages, but the concept of a higher-level language for development is being used by the major FPGA companies.
Both Xilinx and Altera have High Level Synthesis (HLS) tools. These use C or C++. If you know how FPGA work is generally done, you can separate the hype from the reality and understand how to use it for a real application.
The vendors have lots of libraries for IP. You don't write RTL from scratch. It would take too long to verify. You tie IP together. It can be DSP or generic maths or a video codec thing. The VHDL is done for you.
You write your algorithm in C++ in a particular format using compatible data types and calling HLS libraries. You run it all in C++ first and make sure it does exactly what you want in SW. This is where the algorithm is developed.
THEN you fire up the HLS tool and a couple of hours of synthesizing later (lol) you get to load a bitstream onto a FPGA to verify it.
Of course there can be problems in that translation. It takes good engineering to dive down into the design and find the issues.
My current work does not touch any HLS. I am doing the VHDL stuff. But I know the algorithms all started from SW first. It always does. For the bulk of the work, verification, it is somewhat irrelevant whether it is manually converted to RTL or done via tools.
Another HW FPGA guy here. Albeit one who has never used HLS. My concern with the whole idea of HLS is that it fails to take advantage of the parallelization capability of FPGAs, which in my opinion is one of the main reasons to use an FPGA in the first place. It sounds great for designs that are linear in nature, that is, putting data through a bunch of sequential processing blocks and then outputting some result. But for most of those cases, why not just use a processor + DSP SoC? Or even something like a Zynq? It will probably be faster.
Seeing how FPGAs do not operate in a linear way the way that software does on a processor, why are we trying to make them work that way? It would make more sense to me to design a high-level synthesis language with a paradigm that is also not imperative: functional programming. Like, for example, how would this kind of C code even be synthesized in hardware?:
    int A, B_out, C_out;
    A = 5;
    B_out = A + 3;   // uses the first value of A, so B_out = 8
    A = 6;
    C_out = A;       // uses the second value of A, so C_out = 6
"A" is used as two different things, which is totally fine when the code is run sequentially, which must be what is happening when code like this is synthesized, but that's wasteful on an FPGA, because B_out and C_out don't actually have dependence on each other and could be computed concurrently, which is what would happen if we used VHDL to do something similar. We need a high-level synthesis language that describes a system which solves the algorithm we want, the same way VHDL does, except with more abstraction capabilities. In my opinion this could be a functional language.
I agree about the parallelism but you have to understand the design methodology.
Your example is somewhat pointless. The code is written to create the HW, not the other way around. I can't feed it just any crap.
You want parallelism you have to code it.
Zynq would actually be what I use! You start with SW. The ARM core is not that quick. You will use the FPGA to accelerate the tough parts. You may think you will have throughput issues but you have options via the high performance AXI ports. Your FPGA modules access the data in memory via DMAs.
Once you KNOW which part of the algorithm you need to accelerate actually suits an FPGA, you grab the HLS tool and start coding.
Video, matrices, linear algebra, encoders/decoders. Etc.
I can string them together in the same way I would string HDL IP.
The advantage is I can run the algorithm in C++ first and test it all, under the assumption that the HLS library has the equivalent HW version for synthesis.
There is still a lot of HW work involved. For instance, in your example with A used twice: one module would calculate B_out by reading A prior to changing its value, and then you would start the C_out module. You would need a way to coordinate the two modules to share the same memory at A, but they would be running in parallel, just not started at the same time.
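A sketch of the kind of coordination I mean (made-up names, and simplified down to registers instead of a real shared memory): unit B reads A while it is still the first value, then a later strobe lets the C path pick up the updated value.

    module share_a (
        input  wire       clk,
        input  wire       rst_n,
        input  wire [7:0] A,         // the shared value
        input  wire       a_valid,   // first value of A is ready
        input  wire       a_update,  // A has been overwritten with its second value
        output reg  [7:0] B_out,
        output reg  [7:0] C_out
    );
      always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          B_out <= 8'd0;
          C_out <= 8'd0;
        end else begin
          if (a_valid)  B_out <= A + 8'd3;  // sample A before it changes
          if (a_update) C_out <= A;         // started later, then runs in parallel
        end
      end
    endmodule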
>Lack of electronics knowledge is what prevents most software developers from being productive in Verilog. A "better" language won't help.
I am working in the FPGA industry. I definitely agree with you.
But possibly I am a little too close to the current industry and the way we do things. There are new applications around the corner that need innovative ideas, and an innocent fresh perspective could be the seed for something. I am sure that if any really decent improvement were started by a SW person, the big companies like Intel would be eating it up.
It's funny how people on HN were raving about Altera & Intel recently, because word on the FPGA street was that Intel wasn't happy with the acquisition. Xilinx has had 14nm FPGAs for a long time while Altera's are still nowhere to be seen. A Xeon with an Altera FPGA on the same die is years away, I am betting.