Intel’s Ponte Vecchio: Chiplets Gone Crazy (chipsandcheese.com)
184 points by rbanffy on Sept 25, 2023 | 63 comments


The potential for Intel to explode is definitely there if Intel executes on its AI demand.

I suppose one unknown catalyst with Intel is what happens in Taiwan/China. If things get crazy over there, suddenly Intel seems a lot more valuable as the 'US' chip maker (they produce roughly 75% in the US, IIRC). If the government starts to even more heavily subsidize non-reliance on Asia, Intel could find major gains if TSMC/Samsung get shut out.

I mean, just look at the market caps: Intel is worth roughly a sixth of Nvidia despite historically having the same or greater gross revenue (not counting the most recent quarter, of course).


Absolutely. We're still in early days, but the products that Intel has announced in this space are impressive, and if they execute well they should be able to capture a significant amount of market share. That isn't to say that they will be the majority or dominant player in this space, but even capturing 10% or 20% of the datacenter GPU market in the next few years would be a win for Intel.

Intel is also well known for inking long-term deals with major discounts for big customers (Google, Facebook, etc.) that can commit to purchasing large amounts of hardware, whereas Nvidia doesn't really have the same reputation. It's conceivable that Intel could use this strategy to help bootstrap their server GPU business. The Googles and Facebooks of the world are going to have to evaluate this in the context of how much additional engineering work it is to support and debug multiple GPU architectures for their ML stack, but thinking long-term/strategically these companies should be highly motivated to negotiate these kinds of contracts to avoid lock-in and get better discounts.


I was surprised by how poorly poised Intel was to act on the "Cambrian explosion" of AI late last year. After the release of their Intel Arc GPUs, it took almost two quarters for their Intel Extension for PyTorch/TensorFlow to be released, to middling support and interest, which hasn't changed much today.

How many of us learned ML using Compute Sticks, OpenVINO, oneAPI, or another of their libraries or frameworks, or their great documentation? It's like they didn't really believe in it outside of research.

What irony is it when a bedrock of "AI" fails to dream?


Maybe I'm thinking about it too simply but yeah I agree.

Language models in particular are very similar architectures and effectively a lot of dot products. And running them on GPUs is arguably overkill. Look at llama.cpp for where the industry is going. I want a fast parallel quantized dot product instruction on a CPU, and I want the memory bandwidth to keep it loaded up. Intel should be able to deliver that, with none of the horrible baggage that comes from CUDA and Nvidia drivers.
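For illustration, here's a minimal sketch of what that instruction already looks like with AVX-512 VNNI intrinsics (vpdpbusd, the mixed-sign int8 dot-product instruction). The function name and the quantization handling are just for the example and are much simpler than what llama.cpp actually does:

    #include <immintrin.h>
    #include <cstdint>
    #include <cstddef>

    // int8 dot product using AVX-512 VNNI: vpdpbusd multiplies unsigned 8-bit
    // lanes of 'a' by signed 8-bit lanes of 'b' and accumulates into 32-bit
    // lanes, i.e. 64 multiply-accumulates per instruction. Real quantized
    // formats (scales, zero points, sign handling) are omitted for brevity.
    int32_t dot_q8(const uint8_t* a, const int8_t* b, std::size_t n) {
        __m512i acc = _mm512_setzero_si512();
        std::size_t i = 0;
        for (; i + 64 <= n; i += 64) {
            __m512i va = _mm512_loadu_si512(a + i);
            __m512i vb = _mm512_loadu_si512(b + i);
            acc = _mm512_dpbusd_epi32(acc, va, vb);
        }
        int32_t sum = _mm512_reduce_add_epi32(acc); // horizontal sum of 16 lanes
        for (; i < n; ++i)                          // scalar tail
            sum += static_cast<int32_t>(a[i]) * b[i];
        return sum;
    }

Build with something like -mavx512f -mavx512vnni; at that point memory bandwidth, as the parent says, is the real limiter.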


Does Intel have a credibility problem w.r.t. ISA extensions to support deep learning?

I'm thinking about the widespread confusion they caused by having different CPUs support different subsets of the AVX-512 ISA.
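That confusion is very concrete if you've ever written the dispatch code: you can't just ask "is AVX-512 there?", you have to probe each subset separately. A rough sketch using the GCC/Clang builtins (which feature strings are recognized depends on your compiler version):

    #include <cstdio>

    int main() {
        __builtin_cpu_init();
        // AVX-512 is a family of extensions, not one feature; different CPUs
        // shipped different subsets, so each one has to be checked on its own.
        int f    = __builtin_cpu_supports("avx512f");
        int bw   = __builtin_cpu_supports("avx512bw");
        int vnni = __builtin_cpu_supports("avx512vnni");
        std::printf("avx512f=%d avx512bw=%d avx512vnni=%d\n", f, bw, vnni);
        return 0;
    }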


This reads like parody (from llama.cpp, to it being a beacon of where the industry is going (!?), to GPUs being overkill for what is effectively a lot of dot products).


Yeah, using CPUs for inference or training is ridiculous. We're talking 1/20th the performance for 1/4th the energy.


The reason CUDA has won is precisely because it isn't horribly stuck in a C dialect: it has embraced polyglot workloads since 2010, it has a great developer experience where GPUs can be debugged like regular CPUs, and it has the library ecosystem.

Now, while Nvidia is making standard C++ run on CUDA, Intel is still relying on SYCL and oneAPI extensions.

Similarly with Python and the RAPIDS framework.

Intel and AMD have to up their game to offer the same kind of developer experience.


Err... last time I checked, CUDA was the one with the partially compliant C++ implementation, while SYCL, on the contrary, was based on pure C++17.


Time to check again: CUDA has been C++20 for a while now (minus modules), and Nvidia is the one driving the senders/receivers work for C++26, based on their CUDA libraries.

SYCL isn't pure C++ in the sense of writing STL code that runs on the GPU, like CUDA allows for, nor does it require the hardware to follow the C++ memory model.
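To make the "standard C++ on the GPU" point concrete: assuming Nvidia's nvc++ with -stdpar=gpu, plain ISO C++ parallel-algorithm code like the sketch below can be offloaded to the device with no CUDA-specific syntax (whether it actually lands on the GPU depends on the toolchain and flags; the sizes and lambda here are arbitrary):

    #include <algorithm>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
        // Ordinary C++17 parallel algorithm; compiled with nvc++ -stdpar=gpu
        // this can run on the GPU with no __global__ kernels or explicit copies.
        std::transform(std::execution::par_unseq,
                       x.begin(), x.end(), y.begin(), y.begin(),
                       [](float a, float b) { return 2.0f * a + b; });
        return 0;
    }

A SYCL version of the same thing would go through a sycl::queue plus buffers or USM pointers, which is roughly the "not pure C++" distinction being made above.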


Per the article, this is on TSMC's 5nm node, though it does seem that Intel has some level of support from the US govt since it's the only onshore player there.


> they produce roughly 75% in the US

Some of the wafer fabs are in the US, but most of the assembly gets done in Malaysia.


s/some/most/. AZ, Ireland, and Israel, with Ohio allegedly coming soon.

https://en.wikipedia.org/wiki/List_of_Intel_manufacturing_si...


> The potential for Intel to explode is definitely there if Intel executes on its AI demand.

Nope. Intel doesn't get "It's the software, stupid."

Intel is congenitally unable to pay software people more than their hardware engineers, and they treat their engineers like crap, mostly. And they're going to keep getting beaten black and blue by Nvidia for that.


ML/AI engineers get paid a lot more for the same experience as other software engineers at Intel, so much so that there's a "soft" Principal Engineer grade where they don't go through the usual nomination process.


This sounds like a problem to me. It takes more than ML folks to ship a high quality, production ready product.


You'd think they'd "out-open-source" Nvidia's Linux driver situation. It seems that if they shipped "good enough" hardware (not the best) and open-sourced their driver stack, it would get the attention of more devs.


Intel Linux GPU drivers have been open source for as long as I can remember (at least 10 years, maybe forever).


I think Intel is doing relatively well on the software side, given how short a time frame we're talking about. oneAPI is in the same ballpark as AMD's stack and on a better trajectory, I think. They're competing for second place, remember.

The more disappointing thing for me is that they bought like 5 AI startups pretty early on and have basically just shut most of them down. Maybe that was always the plan? See which ones develop the best and consider the rest to be acqui-hires? But I think it's more likely just fallout from Intel's era of flailing around and acquiring random crap.


Intel has been shipping oneAPI for several years, and as a long-time user I assure you it's horrible to deal with. It's the only software I've dealt with that breaks other software. Every year or two I'd have to fully reinstall Visual Studio because it hopped in, messed up a bunch of files, and the uninstaller never works. It will also happily break your system Python environment if you let it try. And did I mention it takes over an hour to run through their horrible installer, which tends to break itself? Even under Linux it would try to hop in and screw up system /bin and /sbin links, because why not. They also shut down their old license validation portals, so their older, working versions can no longer be installed. Intel's dev tooling is absolutely the worst tooling around.


> They also shut down their old license validation portals

For a company that makes a truckload of money from selling CPUs, it's unforgivable that their tools are not

a) free

b) top-notch

I know one is limited by how good Windows is when you ship a tool for Windows, but your description is quite horrifying.


oneAPI still isn't as polyglot as CUDA, and they had to buy Codeplay to get better tooling to start with.


Intel loves software and firmware; they do things in software instead of hardware whenever they can.

They're just bad at it.


Intel (and AMD) need to get their high-end GPUs offered by a cloud provider. Total non-starter until then.


I don't know of any public instances with said hardware, but the cloud providers definitely have them, along with a few big data centre customers. It's probably going to be a matter of months before people can access them on AWS/Azure/etc. Supermicro are selling systems with them in right now. In terms of actual usage, I know Netflix are using Intel's Flex GPUs for AV1 transcoding.


> It's probably going to be a matter of months before people can access them on AWS/Azure/etc

I've been hearing this for years though.



Too much friction to use a vendor-specific cloud like this. All our tooling is AWS/Azure; we're not going to be allowed to upload our application and code to someone's random service.


> This is likely a compiler issue where the v0 += acc * v0 sequence couldn’t be converted into a FMA instruction.

Err, is the ISA undocumented/impossible to inspect in the execution pipeline? Seems like an important thing to verify/fix for a hardware benchmark...
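(For reference, the pattern the article is talking about is just a fused multiply-add; a minimal host-side sketch, with names borrowed from the article's description:)

    #include <cmath>

    // "v0 += acc * v0" is mathematically fma(acc, v0, v0). Whether a compiler
    // contracts the first form into a single FMA instruction depends on its
    // contraction/fast-math settings and pattern matching, which is apparently
    // where the GPU compiler fell down here.
    float step(float v0, float acc) {
        v0 += acc * v0;               // may or may not become one FMA
        return v0;
    }

    float step_explicit(float v0, float acc) {
        return std::fma(acc, v0, v0); // explicitly a single fused multiply-add
    }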


Yes, at least as far as I know. The actual micro-ops resulting from the instruction stream are invisible. You can count the number of uops issued and partly deduce how the instructions were decoded, but not view the uops themselves.


From the preceding paragraph:

> We weren’t able to get to the bottom of this because we don’t have the profiling tools necessary to get disassembly from the GPU.


And that's all I need to know about replacing all the Nvidia stuff. I know it's pretty hard to get there, but Intel should know that having a serious general-purpose computing offering means solid compilers, toolchains, optimized libraries, and a whole lot of mindshare (as in 'a large number of people willing to throw their time at testing your stuff').


I am an Intel shill lately, but I think it's more a matter of time than a desire to keep stuff secret. They've been pretty good about open documentation on the stuff that matters (like this), such as OpenVINO.


I was a bit annoyed by the OpenVINO reference, because I felt they closed off most of the things about Myriad X and the SHAVE architecture. And the last time I tried OpenVINO on Tiger Lake I was left with a very thick pile of undebuggable, uninspectable OpenCL-y stuff; very bad taste in my mouth.

I mean, OpenVINO's perf is up there on Intel CPUs and it's a great optimising compiler; I've thrown a lot of weird stuff in there and it didn't crap out with complaints about unsupported layers or unsupported combinations of layers. It also has an OK batching story (as opposed to TVM, last time I checked...) if you're ready to perform some network surgery.

I also feel it's very bad at reporting errors, and stepping through with gdb is one of the worst experiences... But yeah, most of the code is available now.

Now if they could stop moving shit around and renaming stuff, it'd be great. Hoping they settle on 'oneAPI' for some time.


SHAVE was such a cool architecture, it's too bad about all the secrecy.


Is the Intel Xe ISA even publicly documented? I’ve searched before and I can’t find a PDF detailing the instruction set. AMD releases them,[0] but I can’t find anything from Intel (or Nvidia for that matter).

[0]: RDNA2 ISA: https://www.amd.com/content/dam/amd/en/documents/radeon-tech...



So the author compares it with a bunch of other GPUs, but what about the price? I mean, yeah, the H100 looks better in the graphs, but does it cost the same?



I don't know if there even is a price. Maybe Intel is just giving them out for free.


> It’s really a giant, parallel processor that happens to be programmed in the same way you’d program a GPU for compute.

This sounds vaguely interesting but I am not holding my breath.


Does this feel a lot like Xeon Phi v3.0 to anybody else?

Intel's strategy here is baffling to me. Rather than keep trying to improve their existing line of coprocessors (and most critically, keep accumulating key talent), they kill off the program, scatter their talent to the four winds, wait a couple years, and then launch another substandard product.


This is typical of Intel’s weak leadership and focus on short term profits instead of long term success.

Just look at how they dragged their feet in transitioning to EUV because it was too expensive. This contributed to large delays in their 10 and 7 nm processes and a total loss in their process leadership.

And look at how many billions they poured into making a 5G modem only to give up and sell their IP to Apple.

Or how they dragged their feet in getting into mobile, then came out with Atom way too late to be successful in the market. They essentially gave the market to ARM.

Optane is another recent example. Cool technology, but if a product is not a smashing success right away, Intel throws in the towel.

There’s no real long term vision that I can see. No resilience to challenges or ability to solve truly difficult problems.


> They essentially gave the market to ARM

They also had the best ARM chips for years with StrongARM/XScale (using their own cores), which they killed because obviously Atom was going to be much better and lock everyone into x86...


A 233MHz StrongARM coprocessor plugged into an Acorn RISC PC around 1994 was astonishing to behold. 233MHz! RISC OS flew! That could have been the future.


Are you sure about the timeline? I don’t think the StrongARM CPUs were running that fast in 1994.



> Optane is another recent example. Cool technology, but if a product is not a smashing success right away, Intel throws in the towel.

Wasn't the actual (partial) reason that they didn't have a place to actually make them, since Micron sold the fab?

https://www.extremetech.com/computing/320932-micron-ends-3d-...


From my understanding, the problem was that it wasn’t selling well enough and they decided to cut their losses.

I’m not saying that Optane was a hill they needed to die on, but it’s just another example of their failed leadership and decision making.

Look at how AMD is pursuing and largely succeeding with their vision of using chiplets in their CPUs and GPUs to enable significantly higher core counts at a lower cost.

Or how Nvidia is innovating with massive AI supercomputers, ray tracing, and DLSS.

What is Intel’s vision? In what way are they inventing the next generation of computing? It seems to me that their company objective is just to play catch up with AMD and Nvidia.


> It seems to me that their company objective is just to play catch up with AMD and Nvidia.

And TSMC. Intel really wants to win in both the fab game and the chip game.


I think it's fair to say that Optane was not merely "not a smashing success" but was completely uneconomical. Intel was essentially using Optane products as loss leaders to promote platform lock-in, and even then uptake was limited. Micron made only the smallest token attempt to bring 3D XPoint to market before bailing. Clearly neither partner saw a way to reduce costs drastically enough to make it competitive as a high-volume product.


> This is typical of Intel’s weak leadership and focus on short term profits instead of long term success.

They're probably just doing what their shareholders want. Unfortunately, shareholders are shortsighted and risk-averse, contrary to the common rhetoric about being risk-takers that justifies their elite status.

Surely leadership could be embroiled in lawsuits were they to actually care more about the company than their weak, whimsical, and often incompetent shareholders. Kind of a sad irony actually.


Not really. Phi never had a very large market in reality, and during its existence there were very few non-niche workloads it was actually cost-effective for. Intel was also on top at the time, so they could afford to experiment there; now they're actually chasing an existing market that is growing. Phi looked really cool, but existing software (a major selling point!) couldn't meaningfully run on it without very poor performance, and it was difficult to program. To the extent its design decisions made sense or were useful, they were absorbed into other product lines (e.g. Xeon Max now offers on-package HBM as an option, most Xeons just began scaling up their core counts while keeping a better core uarch, etc.).

But Intel has been doing GPUs for a very long time, and it doesn't realistically seem like they are going to stop anytime soon. Discrete-class and datacenter-class GPUs are new for them, but the hyperscaler space is staying hot and is one they're familiar with. Nvidia literally can't sell H100s fast enough. So I suspect they'll probably remain in the "GPU accelerator" race for quite a while yet, actually.


I think Intel's strategy, in a broad sense, makes sense. Xeon Phi succeeded in a few tiny niches, but they need a real GPGPU in order to compete in the broader market this decade. They tried to make their microarchitecture broadly similar to their competitors' to reduce risk and improve software compatibility. They knew their architecture (and software) wouldn't be as good as their experienced competitors' but thought that at the high end they could use their advanced packaging technology as an advantage. In hindsight that was maybe over-ambitious if it caused the substantial delays (I don't think we know that for certain but it's a good guess) but maybe it will pay dividends in the next product. You do have to take some risks when you're in last place.


I just don't understand why they would keep shutting programs down rather than doing course corrections toward a more competitive GPGPU. This behavior stretches all the way back to Larrabee in 2010.

If I were a betting man, I would bet that this project is dead within 36 months. And if I were a GPU designer, I'd accordingly not touch Intel with a barge pole. They've painted themselves into a corner.

I personally know GPU experts who left Intel for Nvidia because of this. I can't imagine they would consider going back at this point.


You don't course correct Xeon Phi into a competitive GPGPU without starting over from scratch. The two are not similar.


If you look at this through an organizational lens (how incentives line up for individuals) rather than through what makes sense for Intel holistically, it might make more sense.

You see similar behavior in many failing companies, as well as in third-world countries. You can't admit faults in order to iterate, so you need grand new initiatives instead.


> With that in mind, Ponte Vecchio is better seen as a learning experience. Intel engineers likely gained a lot of experience with different process nodes and packaging technologies while developing PVC

cough An expensive lesson, I’m sure.


Cheaper than Itanium I bet.


Itanium killed enough competitors by sheer announcement that it might have been a positive for Intel in the end.


I can't imagine Intel would have lost money on Itanium.


Sure, but Intel's position back then was very different from its position today.

Being dethroned and free-cash-flow negative is rather bad, I am told.


They are selling this thing to the US Department of Energy, right? Presumably for a gigantic pile of cash.



