Fascinating, to say the least, especially considering the execution time in comparison to the best CPU result. I might be skimming too fast, but are there also CPU timings for stress test scenes in the paper, including mmark, etc.?


Good catch! We took the mmark CPU numbers out (in response to review feedback) because they made the graphs hard to read, but it scales very much the same way as the Nehab timings dataset. The raw CPU numbers for mmark are in the repo[1] in the "timings" file.

[1]: https://github.com/linebender/gpu-stroke-expansion-paper


Impressive! Thank you for the pointer.


> The 3090 also can do fp16 and the M1 series only supports fp32

Apple Silicon (including base M1) actually has great FP16 support at the hardware level, including conversions. So it is wrong to say it only supports FP32.


I'm not sure if he was talking about the ML engine, the Arm cores, the microcode, the library, or the OS. But it does indeed have FP16 in the Arm cores.


FP16 is supported on the M1 GPU and Neural Engine through the Core ML framework. From https://coremltools.readme.io/docs/typed-execution :

> The Core ML runtime dynamically partitions the network graph into sections for the Apple Neural Engine (ANE), GPU, and CPU, and each unit executes its section of the network using its native type to maximize its performance and the model’s overall performance. The GPU and ANE use float 16 precision, and the CPU uses float 32.

Also, this exploration (https://tlkh.dev/benchmarking-the-apple-m1-max#heading-neura...) reports FP16 performance in the 5.1-5.3 TFLOPS ballpark.
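
As a rough illustration (my own sketch, not something from the linked docs): coremltools lets you request float 16 typed execution when converting a model, so the GPU/ANE partitions run in their native precision. The tiny model and file name below are placeholders.

    # Hypothetical sketch: convert a small Keras model to a Core ML "mlprogram"
    # with FP16 compute precision (what the GPU and ANE execute natively).
    # Requires: pip install tensorflow coremltools
    import tensorflow as tf
    import coremltools as ct

    # Placeholder model just for illustration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])

    # Request the ML Program backend with float 16 compute; the Core ML runtime
    # still decides how to partition the graph across ANE, GPU, and CPU.
    mlmodel = ct.convert(
        model,
        convert_to="mlprogram",
        inputs=[ct.TensorType(shape=(1, 8))],
        compute_precision=ct.precision.FLOAT16,
    )
    mlmodel.save("model_fp16.mlpackage")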


I should have been clearer. I didn't mean the hardware, but the speedup you get from using mixed precision in something like TensorFlow with an NVIDIA GPU.


Thanks. At least when I ran the benchmarks with TensorFlow, using mixed precision resulted in the CPU being used for training instead of the GPU on the M1 Pro. So if the hardware support for FP16 is there and they implement the software support in DL frameworks, that will be great.
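
For context, the mixed-precision path I was benchmarking looks roughly like the following (a minimal sketch of my own; the model and data are placeholders, and whether it actually stays on the M1 GPU depends on the tensorflow-metal plugin version).

    # Minimal sketch of Keras mixed precision: float16 compute, float32 variables.
    # On NVIDIA GPUs this engages Tensor Cores; on Apple Silicon it only helps if
    # the Metal plugin keeps the work on the GPU instead of falling back to the CPU.
    import tensorflow as tf
    from tensorflow.keras import mixed_precision

    mixed_precision.set_global_policy("mixed_float16")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        # Keep the final layer in float32 for numerically stable outputs.
        tf.keras.layers.Dense(10, dtype="float32"),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

    # Placeholder training data.
    x = tf.random.normal((256, 32))
    y = tf.random.uniform((256,), maxval=10, dtype=tf.int32)
    model.fit(x, y, epochs=1, batch_size=64)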


Yes, unfortunately, the software is to blame for the time being; I ran into issues myself. :\ I hope it catches up to what the hardware can deliver, on both the GPU and the Neural Engine.


This setup sounds like a near-dream with that 3:2 display. The aspect ratio is one of the main reasons I use the 12.9" iPad for getting things done. Of course, it is always a double-edged sword: split-screen with various apps can look cramped on that display, etc., which makes this setup extra interesting, because there is much more freedom to adjust things through software.


I find it weird that so few tech people volunteer information and engage in dialogue on this topic; many aspects are waiting for exploration and exposition in the open. It is not all politics; there is a real phenomenon going on.


Han Chinese are also speaking up about this: https://www.amnesty.org/en/latest/news/2020/06/witness-to-di...

(But I would not suggest discussing it unless you are close friends beyond work.)


Neither Metal nor USDZ is buggy, nor is either some kind of undesired software, though. On the contrary, both are exemplary, well-performing propositions.


Metal is extremely buggy if you're pushing the state of the art. Patrick Walton has run into a number of very serious driver bugs in porting Pathfinder to Metal (he can reliably panic the kernel just by running shader code), and I've seen some as well, though today I'm working in Vulkan.

I'm hopeful for WebGPU in the future, not just for its features and expected cross-platform support, but because I expect a pretty thorough test suite to emerge, holding developers' feet to the fire to make their drivers actually work.


I find the use of "extremely" to be extreme here, or too forgiving of the problems that arise while using other APIs. There are all kinds of state-of-the-art algorithms implemented on top of Metal. But anything improper can easily switch off a GPU; you can do that even with CUDA.


That's fair, I've certainly also had problems with Vulkan drivers from other vendors. So I'm not saying it's more buggy than other GPU platforms, but it is buggy, and it is more buggy than we expect of CPU-based language toolchains and runtimes.


Unfortunately, that's the situation. Though in the last couple of years, there has been a noticeable increase in tooling on all fronts.


Because one of the top complaints is how lacking Vulkan is compared to proprietary API SDKs, even with what LunarG provides to Khronos.


WebGPU is going to be yet another layer to blindly debug without vendors' graphical debugging tool support.

Maybe those who enjoy debugging by outputting pixels or debug messages will be fine with it.


I think ML naysaying stems mostly from the extravagant claims of pop-sci articles around papers, or of the authors themselves. It is always good to have some people who point out that the models aren't cyborg superheroes, and that they don't have to be a step towards AGI but can simply be practical.


All kinds of smart techniques are emerging for ray tracing, too; you don't have to shoot the rays directly from the camera per se. Both rasterization and ray tracing will coexist to provide better graphics; it is just that ray tracing didn't have the place it deserves on the chip until now.
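
For a concrete (if simplified) picture of what "not from the camera" can mean, here is a hypothetical hybrid-rendering sketch of my own: primary visibility comes from a rasterized G-buffer, and rays are launched from those surface points toward a light instead of from the eye. The intersection test is a stub where a real renderer would query an acceleration structure.

    # Hypothetical hybrid sketch: shadow rays start at rasterized surface points.
    import numpy as np

    H, W = 4, 4
    # Pretend G-buffer produced by the rasterizer: world positions and normals.
    positions = np.random.rand(H, W, 3)
    normals = np.tile(np.array([0.0, 1.0, 0.0]), (H, W, 1))
    light_pos = np.array([0.0, 5.0, 0.0])

    # Secondary rays originate at the surface, offset slightly to avoid self-hits.
    origins = positions + 1e-3 * normals
    dirs = light_pos - origins
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    def occluded(origin, direction):
        # Stub: a real renderer would trace against a BVH or RT hardware here.
        return False

    shadowed = np.array([[occluded(origins[i, j], dirs[i, j])
                          for j in range(W)] for i in range(H)])
    print("lit pixels:", (~shadowed).sum())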


Metal is pretty nice, to be honest, with MSL being a real treat. And it is not too much of a stretch for a company with its own custom GPU architecture to have its own API as well.


