>It sounds like you're missing render to texture and multipass techniques, the ways to use textures as compute I/O. To do multipass in WebGL and share the results of computation from one pass to the next, you create an offscreen framebuffer for your results, and you render directly to that framebuffer. You can then use the result as an input texture for the next pass (via glCopyTexImage2D) or you can read back the buffer (via glReadPixels) and then you repeat. Using glCopyTexImage2D is much faster because you never leave the GPU.
How are you writing many pixels to a framebuffer in a single instance? frag_color returns one pixel. Even if that OpenCL implementation works in WebGL, you'd still be at only 8 threads, so only 8 pixels can be done at the same time, and then you have to pass the output in to process again. That's 259,200 glCopyTexImage2D calls for a 1920x1080 image, since it's 2,073,600 pixels.
I'll definitely look into this more, since that OpenCL implementation might hold some answers on how the error is separated out to allow for 8 rows at a time.
Whoa, hang on. I only mean this to be helpful, not insulting, but it sounds to me like you may have some misconceptions about the way WebGL works. I know how easily that can be taken the wrong way, especially in text, so again I apologize in advance and I don't mean that to be rude at all. It would be best to back up and understand WebGL.
If you're doing image processing in WebGL, then to write many pixels to a framebuffer all at once, you draw a single polygon that covers the entire viewport. Your shader is applied in parallel to all pixels drawn. That is how ShaderToy works: it renders a single quad to the viewport and applies whatever shader you give it, and the GPU runs that shader on every pixel rendered.
There are never hundreds of thousands of buffer read calls; you only need a handful. For a blur, you only have to do a buffer read once, and your shader samples the 3x3 neighborhood of pixels.
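To make that concrete, here's a minimal sketch of a single-pass 3x3 box blur fragment shader in WebGL 1 style GLSL (the uniform and varying names here are mine, not from the tutorial):

```glsl
// Fragment shader for one full-screen-quad draw: the GPU runs this once per
// destination pixel, and every invocation only reads from the input texture,
// so all pixels can be computed in parallel in a single pass.
precision mediump float;

uniform sampler2D u_image;    // input texture (e.g. the previous pass)
uniform vec2 u_texelSize;     // 1.0 / texture resolution
varying vec2 v_texCoord;      // interpolated across the full-screen quad

void main() {
  vec4 sum = vec4(0.0);
  // Gather the 3x3 neighborhood around this pixel.
  for (int dy = -1; dy <= 1; dy++) {
    for (int dx = -1; dx <= 1; dx++) {
      vec2 offset = vec2(float(dx), float(dy)) * u_texelSize;
      sum += texture2D(u_image, v_texCoord + offset);
    }
  }
  gl_FragColor = sum / 9.0;   // simple box blur
}
```

One draw of the viewport quad runs that shader for all ~2 million pixels of a 1920x1080 image; the parallelism comes from the GPU, not from you issuing per-pixel calls.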
You don't need OpenCL; that's a level of complication you can skip. I may have given the wrong impression with that link.
Check out this image processing tutorial using basic WebGL 1, and pay attention to how it works:
Here is the demo from that article that uses the techniques I've been talking about. All of the filters in this demo are doing neighborhood computations. And note that you can apply multiple filters. There is no texture copy here: this tutorial renders directly to a texture in one pass, then samples that texture in the next pass, and so on. The iterations or feedback that you're looking for happen by combining render-to-texture with drawing the viewport polygon multiple times.
>If you're doing image processing in WebGL, then to write many pixels to a framebuffer all at once, you draw a single polygon that covers the entire viewport. Your shader is applied in parallel to all pixels drawn. That is how ShaderToy works: it renders a single quad to the viewport and applies whatever shader you give it, and the GPU runs that shader on every pixel rendered.
>Whoa, hang on. I only mean this to be helpful, not insulting, but it sounds to me like you may have some misconceptions about the way WebGL works.
You draw a simple quad, and the shader then processes all the pixels in parallel, returning each one's frag_color as the output color. It can read textures and other source information, but it does not have access to what is currently being processed for other pixels, because it's parallel. You have to wait until it has rendered that, and then pass it in again. I am unsure what I am misunderstanding.
This was a cool insight into using framebuffers for multiple passes, thank you.
>Here is the demo from that article that uses the techniques I've been talking about. All of the filters in this demo are doing neighborhood computations. And note that you can apply multiple filters. There is no texture copy here: this tutorial renders directly to a texture in one pass, then samples that texture in the next pass, and so on. The iterations or feedback that you're looking for happen by combining render-to-texture with drawing the viewport polygon multiple times.
From what I'm seeing, this is not just a couple of passes, it's a pass per filter. None of the filters individually relies upon multiple passes; each one's output is calculated purely from the image that was filtered before it, meaning the filter itself doesn't need to write pixels and then read them again. You can see that in the for loop that calls setFramebuffer() and drawKernel(). I understand that you can apply a shader and put its output back in, but I still fail to see how you're avoiding doing that at the very least hundreds of thousands of times. Error diffusion dithering is classically sequential, from top to bottom or bottom to top (serpentine or not); I don't think you can just process a ton of pixels at the same time and still get anything that looks like a sequentially done Floyd-Steinberg.
Okay, this is good, you're almost there. BTW, I'm doing a bad job of explaining, and I'm sorry. I realize I'm complicating a few things and conflating a few things, so the best advice I can give is to actually go through that tutorial on image processing and write the code and understand the whole pipeline.
You're right; this demo is 1 pass per filter type. None of them require multipass, but the entire demo is multipass. There could be a filter that needed more than 1 pass, but in this case there isn't.
1 pass means: render all pixels in the viewport, and run the shader on all pixels rendered. You've got that part. The trick is you get to access a texture, and the texture is the result of the previous pass. Furthermore, inside the shader, you can address and access any pixels from the previous pass, and you can access multiple pixels from the previous pass to render one destination pixel.
I think the millions of reads you're looking for are the texture() calls happening inside the shader. The [render / render-to-texture / copy pixels / copy texture image] calls process all pixels in the viewport in a single call. The shader applies to all pixels, but a shader only gets to touch any given destination pixel once per render. But the shader can read as many source pixels as it wants.
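As a quick illustrative sketch (my own names, not from the tutorial), a fragment shader can sample the previous pass's texture at any coordinates it likes, but each invocation still writes only its own destination pixel:

```glsl
precision mediump float;

uniform sampler2D u_prevPass;  // result of the previous render-to-texture pass
varying vec2 v_texCoord;

void main() {
  // Reads are unrestricted: this pixel can look anywhere in the previous pass.
  vec4 here   = texture2D(u_prevPass, v_texCoord);
  vec4 mirror = texture2D(u_prevPass, vec2(1.0 - v_texCoord.x, v_texCoord.y));
  vec4 corner = texture2D(u_prevPass, vec2(0.0, 0.0));

  // Writes are not: this invocation only ever produces this one output pixel.
  gl_FragColor = (here + mirror + corner) / 3.0;
}
```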
Because the shaders aren't limited on their reads, but they are limited on their writes, you have to re-organize your algorithm accordingly. You keep reiterating that error diffusion spreads out from top to bottom, and I keep reiterating that it has to work differently in WebGL; we've been talking past each other a little bit here.
You're right; you can't spread things out (scatter) in WebGL during a single pass, and you can't do the classic Floyd-Steinberg implementation the same way you do on the CPU. So it's important to understand that there is another way to do it, and it doesn't look like what you're used to. It doesn't spread things out by pushing error around inside the loop. It spreads things out by letting each destination pixel pull the error from its neighbors before computing its own error, rather than pushing its own error to its neighbors after computing. This is known as a gather, as opposed to a scatter. It is mathematically the same thing, but the order of operations is turned around.
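Just to sketch what the gather form could look like (this is my own rough illustration, not the demo's code; it assumes the previous pass stored each pixel's quantization error in the alpha channel of a float texture, and it won't be pixel-identical to a sequential CPU Floyd-Steinberg):

```glsl
precision mediump float;

uniform sampler2D u_prevPass;  // rgb = pixel value, a = that pixel's error from the last pass
uniform vec2 u_texelSize;      // 1.0 / texture resolution
varying vec2 v_texCoord;

// Error stored by a neighbor in the previous pass (assumes a float texture,
// since the error can be negative).
float errAt(vec2 offset) {
  return texture2D(u_prevPass, v_texCoord + offset * u_texelSize).a;
}

void main() {
  // Classic Floyd-Steinberg *pushes* error right, down-left, down, down-right.
  // The gather form *pulls* those same weights from the mirrored neighbors.
  // (Offsets assume the scan direction maps to -y in texture space; flip the
  // signs if your texture is oriented the other way.)
  float pulled =
      errAt(vec2(-1.0,  0.0)) * 7.0 / 16.0 +  // left neighbor pushed 7/16 to the right
      errAt(vec2( 1.0, -1.0)) * 3.0 / 16.0 +  // upper-right pushed 3/16 down-left
      errAt(vec2( 0.0, -1.0)) * 5.0 / 16.0 +  // above pushed 5/16 down
      errAt(vec2(-1.0, -1.0)) * 1.0 / 16.0;   // upper-left pushed 1/16 down-right

  // Using the red channel as a stand-in for a grayscale image.
  float value     = texture2D(u_prevPass, v_texCoord).r + pulled;
  float quantized = step(0.5, value);         // 1-bit threshold

  // Write the quantized value plus this pixel's own error for the next pass to pull.
  gl_FragColor = vec4(vec3(quantized), value - quantized);
}
```

You'd ping-pong that over multiple passes so the error keeps propagating; one pass only moves error by one neighborhood.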
Here's a diffusion demo. It's reaction diffusion, not error diffusion, but ultimately it's the same diffusion process. Each pass diffuses the previous pass by one step.
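For instance, a single diffusion step can be as small as this (a sketch of the general idea, not the linked demo's actual shader):

```glsl
precision mediump float;

uniform sampler2D u_prevPass;  // result of the previous diffusion step
uniform vec2 u_texelSize;      // 1.0 / texture resolution
varying vec2 v_texCoord;

void main() {
  vec4 c = texture2D(u_prevPass, v_texCoord);
  vec4 n = texture2D(u_prevPass, v_texCoord + vec2(0.0, u_texelSize.y));
  vec4 s = texture2D(u_prevPass, v_texCoord - vec2(0.0, u_texelSize.y));
  vec4 e = texture2D(u_prevPass, v_texCoord + vec2(u_texelSize.x, 0.0));
  vec4 w = texture2D(u_prevPass, v_texCoord - vec2(u_texelSize.x, 0.0));

  // Discrete Laplacian: nudge each pixel toward the average of its 4 neighbors.
  gl_FragColor = c + 0.2 * (n + s + e + w - 4.0 * c);
}
```

You render that into texture A while reading texture B, then swap them and draw the viewport quad again; each draw is one diffusion step.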
>You're right; you can't spread things out (scatter) in WebGL during a single pass, and you can't do the classic Floyd-Steinberg implementation the same way you do on the CPU. So it's important to understand that there is another way to do it, and it doesn't look like what you're used to. It doesn't spread things out by pushing error around inside the loop. It spreads things out by letting each destination pixel pull the error from its neighbors before computing its own error, rather than pushing its own error to its neighbors after computing. This is known as a gather, as opposed to a scatter. It is mathematically the same thing, but the order of operations is turned around.
That makes significantly more sense now. I thought you were saying I could do normal Floyd-Steinberg sequentially somehow.
I am unsure of the mathematical implementation, but I will keep looking at this. I would think there would be a paper on this method somewhere.
I really appreciate the time you've taken to respond. Not many WebGL people out there can actually point out how it all works and what's possible, and I definitely learned something.