
There's a degree of GPU-style thinking going on here, but it's not OpenGL or DirectX.

  for y in 0..height {
    for x in 0..width {

      // Get target position
      let tx = x + offset;
      let ty = y;
      // ...
    }
  }
So this code, in a language I'm not too familiar with, is clearly a GPU concept. Except that on modern GPUs, this 2-dimensional for-loop is executed in parallel by the so-called pixel shader.

A pixel shader involves all sorts of complications in practice that deserve at least a few days of studying the rendering pipeline to understand. But the tl;dr is that a pixel shader launches a thread (erm... a SIMD lane? A... work-item? A shader invocation?) per pixel, and then the device drivers do some magic to group them together.

Like, in the raw hardware, pixel 0-0 is going to be rendered at the same time as pixel 0-1, pixel 0-2, etc. And the body of this "for loop" is the code that runs it all.

Sure, it's SIMD, and it's all kinds of complicated to fully describe what's going on here. But the bulk of GPU programming (or at least of pixel shaders) is recognizing the one-thread-per-pixel (erm, one-SIMD-lane-per-pixel) approach.
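To make that concrete, here's a rough sketch of my own (not the article's code) of the same thing written "shader style": the loop body becomes a per-pixel kernel, and the dispatcher's job is to invoke it once per (x, y). On a GPU, the hardware effectively runs it for every pixel at once, grouped into SIMD waves.

  // Sketch only: the quoted loop body refactored into a per-pixel kernel.
  fn shade_pixel(x: u32, y: u32, offset: u32) -> (u32, u32) {
    // Get target position, same shape as the quoted loop body.
    let tx = x + offset;
    let ty = y;
    (tx, ty)
  }

  // A CPU drives it with the nested loop; a GPU conceptually launches
  // one shade_pixel invocation per pixel instead.
  fn shade_all(width: u32, height: u32, offset: u32) {
    for y in 0..height {
      for x in 0..width {
        let (_tx, _ty) = shade_pixel(x, y, offset);
        // ... sample the source at (tx, ty) and write the pixel ...
      }
    }
  }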

------------------

Anyway, I think this post is... GPU-enough. I'm not sure this truly executes on a GPU given how the code was written. But I'd give it my stamp of approval as far as "describing code as if it were being done on a GPU" goes, even if they're cheating for simplicity in many spots.

The #1 most important part is that the "rasterize" routine is written in the embarrassingly parallel mindset. Every pixel "could", in theory, be processed in parallel. (Notice that there are no race conditions, locks, or sequencing requirements between pixels.)

And the #2 part is having the "sequential" CPU code logically and seamlessly communicate with the "embarrassingly parallel" rasterize routine in a simple, logical, and readable manner. And this post absolutely accomplishes that.
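For example, here's my own toy sketch of that shape (using rayon; none of these names are from the article): the per-pixel work is a pure function of (x, y), so sequential driver code can hand the whole framebuffer to a data-parallel loop with no locks at all.

  use rayon::prelude::*;

  // Toy sketch of an "embarrassingly parallel" rasterize routine: every
  // destination pixel is computed independently, so rows can be spread
  // across any number of threads with no locks or ordering between them.
  fn rasterize(width: usize, offset: usize, src: &[u8], dst: &mut [u8]) {
    dst.par_chunks_mut(width) // each thread owns whole destination rows
      .enumerate()
      .for_each(|(y, row)| {
        for (x, px) in row.iter_mut().enumerate() {
          // Get target position, as in the quoted loop body.
          let tx = x + offset;
          let ty = y;
          *px = if tx < width { src[ty * width + tx] } else { 0 };
        }
      });
  }

  fn main() {
    // Plain sequential "CPU" code driving the parallel routine.
    let (w, h) = (640usize, 480usize);
    let src = vec![128u8; w * h];
    let mut framebuffer = vec![0u8; w * h];
    rasterize(w, 3, &src, &mut framebuffer);
  }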

It's harder to write this cleanly than it looks. But having someone show you how it's done, as this post does, helps with the learning process.




It is a Rust application making use of wgpu, Rust's native WebGPU library.


Nope.

Pixel shaders in WebGPU / wgpu are written in WGSL. The above 2-dimensional for-loop is _NOT_ a proper pixel shader (but it is written in a "Pixel Shader style", very familiar to any GPU programmer).
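For reference, a minimal sketch of what the real thing looks like (hedged: the shader body and entry-point name here are made up by me, and exact wgpu signatures vary a bit between versions). The "pixel shader" is a WGSL fragment function handed to wgpu as a string:

  // The WGSL fragment ("pixel") shader: one invocation per covered pixel,
  // no explicit loop over x and y. Body is illustrative only.
  const SHADER_SRC: &str = r#"
  @fragment
  fn fs_main(@builtin(position) pos: vec4<f32>) -> @location(0) vec4<f32> {
    return vec4<f32>(pos.x / 800.0, pos.y / 600.0, 0.0, 1.0);
  }
  "#;

  fn make_shader(device: &wgpu::Device) -> wgpu::ShaderModule {
    device.create_shader_module(wgpu::ShaderModuleDescriptor {
      label: Some("fragment shader"),
      source: wgpu::ShaderSource::Wgsl(SHADER_SRC.into()),
    })
  }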


The author didn't say it, but I'm pretty sure the for-loop was meant to be pseudocode to help the reader understand what it does, and not the actual implementation.


I'm pretty sure this whole post is a shitpost. A well-written joke, and one I enjoyed. But a shitpost nonetheless.

Upon closer inspection, the glyphs are each rendered onto the framebuffer sequentially... one at a time. I.e., NOT in an embarrassingly parallel manner. So the joke starts to fall apart as you look closely.

But those kinds of details don't matter. The post is written well enough to be a good joke, but no "better" than needed. (EDIT: It was written well enough to trick me on my first review of the article. But on 2nd and 3rd inspection, I'm noticing the problems, and it's all in good fun to see the post degenerate into obvious satire by the end.)


The rasterizer doesn't even do any rasterization. It just blends the already rasterized glyphs onto the screen.

Honestly, it sounds like AI. This is a website in the shape/memory of a blog post, not an actual blog post.


Because it is Rust code?!?

"...An easy tutorial in Rust"

A short visit to the author's blog clearly shows they know what they're talking about.


It's not just the language. That code is impossible to translate directly to a pixel shader because GPUs only implement fixed-function blending. Render-target pixels (and depth values) are write-only in the graphics pipeline; they can only be loaded by fixed-function pieces of the GPU: blending, depth rejection, etc.

It's technically possible to translate the code into a compute shader/CUDA/OpenCL/etc., but that's gonna be slow and hard to do, due to concurrency issues. You can't just load/blend/store without a guarantee that other threads won't try to concurrently modify the same output pixel.
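As a toy illustration of that guarantee (my own sketch, with hypothetical names, not the article's approach): parallelize over destination pixels (a gather) instead of over glyphs (a scatter), so each output pixel is owned by exactly one thread and the load/blend/store needs no atomics.

  use rayon::prelude::*;

  // Hypothetical glyph stored as an alpha/coverage mask; field names are illustrative.
  struct Glyph { x: usize, y: usize, w: usize, h: usize, alpha: Vec<u8> }

  // Gather-style blend: parallel over destination rows, sequential over glyphs.
  // Each output pixel is written by exactly one thread, so the read-modify-write
  // blend is race-free without locks or atomics.
  fn blend_glyphs(width: usize, height: usize, glyphs: &[Glyph], dst: &mut [u8]) {
    debug_assert_eq!(dst.len(), width * height);
    dst.par_chunks_mut(width).enumerate().for_each(|(y, row)| {
      for (x, px) in row.iter_mut().enumerate() {
        for g in glyphs {
          if x >= g.x && x < g.x + g.w && y >= g.y && y < g.y + g.h {
            let a = g.alpha[(y - g.y) * g.w + (x - g.x)] as u32;
            // Simple "over" blend toward white, purely for illustration.
            *px = ((a * 255 + (255 - a) * (*px as u32)) / 255) as u8;
          }
        }
      }
    });
  }

A GPU compute shader gets the same guarantee by dispatching one thread per output pixel and looping over the glyphs inside that thread, rather than dispatching per glyph.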


Tilers (mostly mobile and Apple) generally expose the ability to read & write the framebuffer value pretty easily - see things like GL_EXT_shader_framebuffer_fetch or Vulkan's subpasses.

For immediate-mode renderers (i.e. desktop cards), VK_EXT_fragment_shader_interlock seems to be available to handle those "concurrency" issues. DX12 ROVs (rasterizer-ordered views) seem to expose similar abilities, though the performance hit may be larger than on tiling architectures.

So you can certainly read-modify-write framebuffer values in pixel shaders using current hardware, which is what is needed for a fully shader-driven blending step.



