
14 nanometer process. I vaguely recall people saying how impossible that would be to achieve back when the original Pentium was on an 800 nm process.

very cool.




Still, it looks like 5 nanometers is the end of it. http://en.wikipedia.org/wiki/5_nanometer

Not that it will necessarily be the absolute end of Moore's law: hardware manufacturers are trying alternative approaches to keep ramping up performance. For example, Samsung already sells its 850 Pro series SSDs http://www.amazon.com/s/ref=nb_sb_ss_c_0_6?url=search-alias%... made with its V-NAND memory http://www.samsung.com/global/business/semiconductor/html/pr... which fell back to 40 nm from the 840 EVO's 19 nm while going 3D, and that seemed to improve both speed and reliability. So they have a bit more runway in their Moore's law now, but still not much in sight.


Yeah, a transistor out of 4 atoms? wow.

The only thing I'm sure of: when Moore's law ends, we'll be spending a lot more time with Amdahl's law.


Actually, we have been spending a lot of time with Amdahl's law already. Software people assume it only applies to parallelizing software, but in "Computer Organization & Design" by Patterson and Hennessy it is stated in the general form. An example asks how much the multiply unit should be sped up to get a five-fold improvement in execution time when 80% of the time is spent in multiplication.

Execution time after improvement = (execution time affected by improvement / amount of improvement) + execution time unaffected
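
Plugging the book's numbers into a quick sketch (Python, normalized so the original execution time is 1.0) shows the punchline: no finite speedup of the multiplier alone gets you to 5x overall.

    def time_after(affected_fraction, speedup, total_time=1.0):
        # Amdahl's law, general form: only the affected fraction gets faster.
        affected = total_time * affected_fraction
        unaffected = total_time - affected
        return affected / speedup + unaffected

    # 80% of time is multiplication; target a 5x overall improvement.
    # new_time = 0.8/n + 0.2 is always > 0.2 for finite n, so the
    # overall speedup only approaches 5x asymptotically.
    for n in (2, 10, 100, 10**6):
        print(n, round(1.0 / time_after(0.8, n), 3))
    # 2 -> 1.667, 10 -> 3.571, 100 -> 4.808, 10**6 -> ~5.0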


V-NAND is probably not a solution to scaling flash, according to memory experts:

http://thememoryguy.com/comparing-samsung-v-nand-to-micron-1...


I hear about this end to Moore's law all the time, but couldn't we just make the CPUs physically larger?

I know this probably wouldn't work for a mobile device, but for desktops/servers there's a ton of room for larger-dimension chips, right?


(Disclaimer: I'm not an EE)

To be technical, Moore's law is about the number of transistors on an integrated circuit.

So your point isn't too far off the 3D comments elsewhere.

Simply making the die bigger doesn't get you much: larger dies (without additional redundancy) have lower yields (as you're more likely to have a defect, given a constant defects-per-area rate) and fewer can be stamped out of a standard-sized wafer.
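
For a sense of how fast the wafer fills up, here's a common back-of-the-envelope dies-per-wafer approximation (the 300 mm wafer and die sizes are illustrative assumptions, not numbers from this thread):

    import math

    def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
        # Gross dies: wafer area over die area, minus an edge-loss term
        # for the partial dies around the circumference.
        r = wafer_diameter_mm / 2
        return ((math.pi * r**2) / die_area_mm2
                - (math.pi * wafer_diameter_mm) / math.sqrt(2 * die_area_mm2))

    print(round(dies_per_wafer(300, 100)))  # ~640 candidate dies at 100 mm^2
    print(round(dies_per_wafer(300, 400)))  # ~143: 4x the area, >4x fewer dies

And that's before yield: each of those larger dies is also more likely to contain a killer defect.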

However, if you carry that idea to its logical conclusion... we may turn from shrinking the transistor to shrinking the packaging as the path of least resistance. 3D transistors, chip stacking (aka PoP), and through-silicon vias (aka vertical connectivity) all help get us more processing per unit area (while remaining within fundamental thermal, manufacturing, and other physical limits).

Again, this is a CS major with an architecture interest, so anyone please feel free to correct me if I'm off-base.


There is plenty of room for optimisation in overall speed in classical computers, even with the looming end of Moore's Law. In my opinion HP have got the right idea... http://arstechnica.com/information-technology/2014/06/hp-pla...

Silicon photonics is likely to be a huge source of potential improvements. A former Intel SVP, Pat Gelsinger, was quoted as saying "Today, optics is a niche technology. Tomorrow, it's the mainstream of every chip that we build." http://en.wikipedia.org/wiki/Silicon_photonics


3D doesn't help very much. We're already at the limits of thermal capacity for consumer chips. 3D just exacerbates that, since you've now lost a whole dimension you can shunt heat through.

If you go 3D, you need a very large drop in heat dissipation to keep your junction temperatures down.


The problem is that if you make a CPU twice as large, with transistors that have the same resistance and power consumption as before, you double the heat output and double the power input.

So yes, you can. But if you keep doing this every ~26 months, we'll end up with 600 watt CPUs.
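
A toy version of that arithmetic, assuming a hypothetical 75 W part today and constant power density as area doubles:

    # Double die area at constant power density -> double power.
    watts = 75  # hypothetical starting point
    for doubling in range(1, 4):
        watts *= 2
        print(f"after {doubling} doubling(s): {watts} W")
    # after 1: 150 W, after 2: 300 W, after 3: 600 W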


Also, your 1 cm^2 die becomes 2 cm^2, becomes 4 cm^2... you pretty quickly run out of space physically, and your interconnects get long enough that propagation delay becomes significant.


And to fix your propagation delays you introduce longer and longer pipelines... then you basically end up with Prescott.


Apart from the power considerations, that's very expensive. The cost of silicon grows more than linearly in area (I don't remember the exponent, someone help me out here...) because of yield issues: the larger your chip, the more likely it is that an imperfection will cause it to be worthless. With very regular designs like DRAM and flash they reduce the impact of this with redundancy, but last I heard this wasn't really considered practical for logic designs like CPUs. You can sacrifice an entire core, although apparently you have to provide for that by making a voltage island around each one, which uses some area, not to mention being a nasty analog complication; although I think those are coming in anyway due to the desire to reduce base power by switching unused blocks off.
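
The super-linear growth falls straight out of even the simplest yield model. A sketch using the classic Poisson approximation (the 0.2 defects/cm^2 figure is made up for illustration, not a real process number):

    import math

    def cost_per_good_die(area_cm2, defects_per_cm2=0.2):
        # Poisson model: the fraction of dies with zero killer defects
        # is exp(-D * A), so cost per *working* die grows faster than
        # linearly in area.
        good_fraction = math.exp(-defects_per_cm2 * area_cm2)
        return area_cm2 / good_fraction  # in units of "cost per cm^2"

    for a in (1, 2, 4, 8):
        print(a, round(cost_per_good_die(a), 1))
    # 1 -> 1.2, 2 -> 3.0, 4 -> 8.9, 8 -> 39.6: each doubling of area
    # more than doubles the cost per working chip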


GPUs have for generations been designed with the intention of being able to disable defective compute units; that's why we have more than 3 models from each manufacturer. It's not done quite as much in the CPU space, but Intel does sell some server CPUs that have some cores disabled, and AMD has even sold 3-core CPUs that didn't quite pass QA as 4-core parts. There's also variability in the amount of cache memory that gets enabled on parts made from the same die.

For CPUs, it's not done at any granularity finer than a whole core - nobody sells CPUs that might have one of the ALUs disabled.


Also, K-series CPUs (for overclockers) are chips with a defective IOMMU, the small "Xeon" E3 chips are Core chips with a defective GPU, and there is probably a lot of similar binning going on with other parts.


GPUs have only recently been used to hold data for which losing bits may trigger a disaster (financial, banking, system data, etc.). Before that it was just vertices, shaders, models, textures, etc.


They can be made larger but that would reduce the manufacturer's yield. The bigger a single chip is, the more surface area you lose when you get an error.

Imagine a scratch that runs straight through the middle of several chips, parallel to 2 of the 4 sides, for 175 millimeters. If the chips are 50 mm x 50 mm each, you would lose at most 12,500 sq mm of silicon (3 scratched through and 2 with 12.5 mm scratches = 5 x 2,500 sq mm). If they are, say, 60 mm x 60 mm, you could lose 14,400 sq mm of silicon from the same error (2 scratched through and 2 with 27.5 mm scratches = 4 x 3,600 sq mm).
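
Generalizing that worst case (a minimal sketch: square dies, scratch parallel to the die edges, positioned to clip a partial die at each end):

    import math

    def worst_case_loss(scratch_mm, die_mm):
        # Dies fully crossed, plus the two partially scratched dies
        # at either end of the scratch.
        crossed = math.floor(scratch_mm / die_mm)
        dies = crossed + 2 if scratch_mm % die_mm else crossed + 1
        return dies, dies * die_mm**2  # (dies ruined, sq mm lost)

    print(worst_case_loss(175, 50))  # (5, 12500): 3 crossed + 2 clipped
    print(worst_case_loss(175, 60))  # (4, 14400): 2 crossed + 2 clipped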


IBM's Z-series CPUs are pretty ridiculously enormous in this regard. The z196 is 512 mm^2 [0] (compare with about 260 mm^2 for a top-of-the-line Intel Xeon), and is able to run at 5.2 GHz on a 45 nm process. Because IBM can basically charge anything they want for these monsters [1], their yield of working chips per wafer can be pretty terrible and they'll still make money.

Intel, on the other hand, needs to be able to sell as many chips as possible per wafer, because they have vastly lower margins. This is also why they do stuff like fusing off dead cores and cache to produce working lower-end parts from dice that aren't 100% functional out of the factory.

[0] https://en.wikipedia.org/wiki/IBM_z196_%28microprocessor%29

[1] Locked-in customer base, service costs dwarf hardware costs, etc.


That's being done for Xeons and discrete GPUs; 700 sq. mm. chips are now a thing. It increases performance but doesn't help price/performance or power efficiency.


In addition to what has been said (power requirements/efficiency), signal propagation becomes a problem; you may not reach enough of the CPU die during a clock cycle.
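
Some toy numbers for scale (assuming a hypothetical 5 GHz clock; real on-chip wires are RC-dominated and much slower than light, so the first figure is a hard upper bound):

    c_mm_per_s = 3e11       # speed of light in vacuum, mm/s
    clock_hz = 5e9          # hypothetical 5 GHz clock
    cycle_s = 1 / clock_hz

    print(c_mm_per_s * cycle_s)        # 60 mm per cycle at light speed
    print(0.1 * c_mm_per_s * cycle_s)  # 6 mm at an assumed 10% of c:
                                       # you can't cross a ~20 mm die
                                       # in one cycle at that speed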


Source? I don't think it's necessary for a signal to propagate through the entire chip within a clock cycle.


It's not, and it does not on the fastest chips around.

But you'll be in quite a bad situation if it doesn't propagate through at least an entire ALU and register file in less than a cycle. Not an impossible situation, but a bad one.


Itaniums did this and it contributed to the expense. I don't think that "scales" for our priorities (power, cost).


You can still transition away from silicon lithography, in addition to trying alternative transistor layout designs. Graphene is the most-cited alternative, but there may be other untested ones that just require rarer materials.



