CSS only has basic filters, and WebGL is only practical for per-pixel manipulation, such as brightness or contrast. This is because WebGL is just shaders, and half of what shaders do is per-pixel, along with some other fun things you can do with multiple buffers, but you can't actually get /back/ the processed data even if you got all the pixels into a single loop. With WebGL 2 this may be different due to some new buffer types, but I don't think it's made for that. Any time you do an algorithm that depends on surrounding pixels, such as dithering, you're stuck with JS.
https://tools.czaux.com/space-pixels/
Here's an example of in-browser dithering to a fixed palette that works pretty well, using RGBQuant modified with alpha support. The overall canvas uses FabricJS for manipulation.
> but you can't actually get /back/ the processed data even if you got all the pixels into a single loop
Oh if you need the pixels back, you can certainly get them using glReadPixels(). It's a bit expensive in WebGL land, though it is much less expensive than looping over pixels using JS. And often there are ways to get around needing the pixels back, depending on what you're doing.
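If it helps, this is roughly what that looks like in the JS API (the call is gl.readPixels() there), assuming you already have a WebGL 1 context `gl` and have rendered something into it:

    // Allocate a buffer for the whole viewport: 4 bytes (RGBA) per pixel.
    const w = gl.drawingBufferWidth;
    const h = gl.drawingBufferHeight;
    const pixels = new Uint8Array(w * h * 4);

    // One call copies everything that has been rendered back to the CPU.
    // It stalls the pipeline, which is why you want to do it rarely.
    gl.readPixels(0, 0, w, h, gl.RGBA, gl.UNSIGNED_BYTE, pixels);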
You can also do loads of non-per-pixel tricks, including filters that need the surrounding pixels, using multiple passes and clever shaders; you are not as limited with WebGL 1 as you think.
This is what I'm talking about: you have to slice your program differently if you optimize it. Your high-level structure and outermost loops will look different if you use WebGL instead of JS. And yes, WebGL 2 is even better.
But this is all much trickier than writing your straightforward loops in JS. That's the price you pay for performance: your architecture will be kinda crazy if you do it in WebGL, and it will not be easy to go back if you discover you optimized for the wrong thing.
Random-order dithering can be done in WebGL, but the results are poor quality. I am talking about error-diffusion dithering, where you read and write pixels, and then re-read those modified pixels, which in turn change the rest of the pixels. In WebGL, you can read the pixels of the image, but you cannot modify them and then read the modifications back within the same pass; glReadPixels() only returns pixels that have already been rendered. You could use WebGL, but you'd have to call glReadPixels() for every pixel modified so that the next pixel knows the new data.
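For concreteness, this is the serial dependency I mean; a minimal Floyd-Steinberg sketch in JS over an ImageData, quantizing each pixel to black or white just to keep the example short (a real version would diffuse error per channel against a palette, and keep the running error in a separate float buffer instead of the clamped byte array):

    // img is an ImageData; treat the image as grayscale and only track the
    // red channel's error, just to keep the sketch short. A real version
    // would keep the running error in a separate Float32Array, since
    // ImageData's clamped bytes lose negative or overflowing error.
    function floydSteinberg(img) {
      const { data, width, height } = img;
      for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
          const i = (y * width + x) * 4;
          const old = data[i];               // already includes error pushed here earlier
          const val = old < 128 ? 0 : 255;   // quantize to black or white
          const err = old - val;
          data[i] = data[i + 1] = data[i + 2] = val;
          // Push the error onto pixels that haven't been visited yet;
          // this dependency is what makes the loop inherently sequential.
          if (x + 1 < width) data[i + 4] += (err * 7) / 16;
          if (x > 0 && y + 1 < height) data[i + (width - 1) * 4] += (err * 3) / 16;
          if (y + 1 < height) data[i + width * 4] += (err * 5) / 16;
          if (x + 1 < width && y + 1 < height) data[i + (width + 1) * 4] += (err * 1) / 16;
        }
      }
      return img;
    }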
Yes, ordered dither is easier in a shader than error diffusion, that's true, but error diffusion is definitely possible on a GPU. If you need it. Do you really need it?
Why not still use the GPU to accelerate whatever parts you can? Even if you dither on the CPU, doing your color filters on the GPU instead of in JS could make the difference between interactive and not.
I don't know what you're doing exactly, but when you do need dithering, it's usually the last thing you need to do before output, and the speed of the dither usually matters less than the speed of most other operations you need in an image editor.
BTW, random order is lower quality and error diffusion is better only for very low-res color palettes. If your result is more than 256 colors, it's irrelevant, and for high-quality dithering, e.g., 16 bits per channel down to 8 bits per channel for print, random ordered dithering is superior.
That last one is ugly, but it's a great proof of concept here. You could get error diffusion in a shader using 2 passes: in the first pass you render to a texture, and in the second pass you can sample that texture however you want. Note how in this example the error diffusion propagates backwards from what you would normally do in JS or C++, because it's a shader looping over destination pixels, not code that loops over source pixels.
> In WebGL, you can read the pixels of the image, but you cannot modify and read.
I don't know exactly what you mean here. You can modify & read pixels using render to texture, or using multiple passes.
Render to texture will be faster, if you can do it. Multiple round-trips from CPU to GPU and back will be slower, so you want to limit the number of trips, but it's easy to do.
>Even if you dither on the CPU, doing your color filters on the GPU instead of in JS could make the difference between interactive and not.
Dithering requires quantization. If you mean things like brightness and contrast, yes, WebGL is better for that. But quantization with error-diffusion dithering is still based upon previously modified pixels, so you cannot just send a bunch of pixel info to WebGL; you have to do each pixel separately. Meaning, pushing through 2.5 million separate inputs sequentially, and having to use glReadPixels() 2.5 million times.
>BTW, random order is lower quality and error diffusion is better only for very low-res color palettes.
Not in any of the images I have used. Ordered dithering looks fake, because it looks like a texture overlaid on the image. Dithering with error-diffusion kernels like Floyd-Steinberg creates a less obvious texturization that still preserves the underlying image.
From what I'm reading, for every pixel it is processing a 250-iteration loop that gets the pixels in the row and builds errors based upon those. However, it is not modifying any pixels as it goes; these are parallel operations, independent of each other. Good error diffusion with a kernel requires previous pixels to be modified, and errors to be built upon them, which is why it is a sequential operation. I don't see any destination-dependent pixel manipulation.
>Multiple round-trips from CPU to GPU and back will be slower, so you want to limit the number of trips, but it's easy to do.
If it's possible to get data back from Uniform Buffer Objects in WebGL 2, it may be possible to send up to 1000 or so uniform pixel values, but that is the general cap; some GPUs have less. Unless I am missing some magic buffer you can use to write out many pixels to, and in that case I would be very interested in testing that, but shaders are designed to output a single pixel per instance.
From what I'm reading, this is using OpenCL and parallelizing in rows, but these rows have height. A single thread processes one row and sequentially goes through the pixels like normal error diffusion; this just breaks different parts of the image up. I may have to try this out in JS and see if perhaps it can thread better that way. ty.
> so you cannot just send a bunch of pixel info to WebGL; you have to do each pixel separately.
Why do you think that? You certainly can get previously modified pixels, and you can send millions of pixels to WebGL with a single call (as a texture). Nobody calls glReadPixels millions of times; that's a bad idea. :) You might want to investigate multipass rendering techniques. Small kernel convolutions, for example, are standard and simple operations in WebGL. People use blur & edge filters all the time, and those depend on neighborhood computation.
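To be concrete about sending millions of pixels in one call, here's a rough sketch; `gl` is a WebGL 1 context and `image` is a loaded <img> (a canvas or ImageData works the same way):

    const tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    // Parameters that are safe for non-power-of-two images in WebGL 1.
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
    gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
    // One call, all pixels: accepts <img>, <canvas>, <video>, or ImageData.
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, image);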
Regarding the 1-D error diffusion shader on ShaderToy: it is using a gather instead of a scatter. It's a pull instead of a push; he flipped the operation inside-out. It is still computing error diffusion correctly (but for only a single pixel row). And it's running at 60fps.
This is the whole point I tried to make above multiple times wrt performance: this code looks unrecognizable compared to the straightforward CPU serial way to implement error diffusion.
The reason this example is correct (in 1D) is because it recomputes the error propagation for every destination pixel; it's wasting almost all of the computation it's doing because in this case it's not sharing the error computation. But that doesn't mean it can't -- this is just a proof of concept on ShaderToy, not the limit of what you can do. You can't do multipass with ShaderToy, and multipass is how you share pixel results from one iteration to the next.
> Unless I am missing some magic buffer you can use to write out many pixels to
It sounds like you're missing render to texture and multipass techniques, the ways to use textures as compute I/O. To do multipass in WebGL and share the results of computation from one pass to the next, you create an offscreen framebuffer for your results, and you render directly to that framebuffer. You can then use the result as an input texture for the next pass (via glCopyTexImage2D) or you can read back the buffer (via glReadPixels) and then you repeat. Using glCopyTexImage2D is much faster because you never leave the GPU.
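Here's roughly what that setup looks like; a sketch assuming a WebGL 1 context `gl`:

    function createTarget(gl, width, height) {
      // The texture that will receive the rendered pixels.
      const tex = gl.createTexture();
      gl.bindTexture(gl.TEXTURE_2D, tex);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
      gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
      gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0,
                    gl.RGBA, gl.UNSIGNED_BYTE, null);

      // The offscreen framebuffer that renders into that texture.
      const fbo = gl.createFramebuffer();
      gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
      gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                              gl.TEXTURE_2D, tex, 0);
      gl.bindFramebuffer(gl.FRAMEBUFFER, null);
      return { tex, fbo };
    }

    // gl.bindFramebuffer(gl.FRAMEBUFFER, target.fbo)  -> draw into the texture
    // gl.bindFramebuffer(gl.FRAMEBUFFER, null)        -> draw to the canvas

With the texture attached directly to the framebuffer like this, you can even skip the glCopyTexImage2D copy and just bind the attached texture as the input texture for the next pass.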
I think you could do error diffusion by rendering the error to a texture, and using a multipass technique that only needs as many iterations as the maximum distance any error could travel. In the worst case it'd probably be the greater of your image width or image height, e.g. 512 passes for a 512x512 image, but in practice I think you'd be done much sooner. That's assuming there isn't something hierarchical and more clever that could do it in log(width) passes, which I suspect there is.
>It sounds like you're missing render to texture and multipass techniques, the ways to use textures as compute I/O. To do multipass in WebGL and share the results of computation from one pass to the next, you create an offscreen framebuffer for your results, and you render directly to that framebuffer. You can then use the result as an input texture for the next pass (via glCopyTexImage2D) or you can read back the buffer (via glReadPixels) and then you repeat. Using glCopyTexImage2D is much faster because you never leave the GPU.
How are you writing many pixels to a framebuffer in a single instance? frag_color returns one pixel. Even if that OpenCL implementation works in WebGL, you'd still be at only 8 threads, so only 8 pixels can be done at the same time; then you have to pass the output in to process again. That's 259200 glCopyTexImage2D calls for a 1920x1080 image, since it's 2073600 pixels.
I'll definitely look into this more, since that OpenCL implementation might hold some answers on how the error distance is separated to allow for 8 rows at a time.
Whoa, hang on. Hey I only mean this to be helpful not insulting, but it sounds to me like you may have some misconceptions about the way WebGL works. I know how easily that can be taken the wrong way, especially in text, so again I apologize in advance and I don't mean that to be rude at all. It would be best to back up and understand WebGL.
If you're doing image processing in WebGL, then to write many pixels to a framebuffer all at once, you draw a single polygon that covers the entire viewport. Your shader is applied in parallel to all pixels drawn. That is how ShaderToy works, it renders a single quad to the viewport and applies whatever shader you give it, the GPU runs that shader on all pixels rendered.
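Roughly, the whole "polygon covering the viewport" part is just this (a sketch; it assumes a compiled and linked `program` whose vertex shader has an a_position attribute):

    // Two triangles covering clip space (-1..1), drawn as a triangle strip.
    const quad = new Float32Array([
      -1, -1,   1, -1,   -1, 1,   1, 1,
    ]);
    const buf = gl.createBuffer();
    gl.bindBuffer(gl.ARRAY_BUFFER, buf);
    gl.bufferData(gl.ARRAY_BUFFER, quad, gl.STATIC_DRAW);

    // 'a_position' is the attribute name assumed in the vertex shader.
    const loc = gl.getAttribLocation(program, 'a_position');
    gl.enableVertexAttribArray(loc);
    gl.vertexAttribPointer(loc, 2, gl.FLOAT, false, 0, 0);

    // One draw call; the fragment shader runs once per covered pixel.
    gl.useProgram(program);
    gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);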
There are never hundreds of thousands of buffer read calls; you only need a handful. For a blur, you only have to do a buffer read once, and your shader samples the 3x3 neighboring pixels.
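For example, a 3x3 box blur fragment shader reads nine source pixels for every one destination pixel it writes; a GLSL ES sketch (here as a JS string), assuming the input image is bound to u_image and u_texelSize is 1.0 / resolution:

    const blurFragSrc = `
      precision mediump float;
      uniform sampler2D u_image;     // result of the previous pass
      uniform vec2 u_texelSize;      // 1.0 / (width, height)
      varying vec2 v_texCoord;       // passed through from the vertex shader

      void main() {
        vec4 sum = vec4(0.0);
        // Nine reads per destination pixel: the 3x3 neighborhood.
        for (int dy = -1; dy <= 1; dy++) {
          for (int dx = -1; dx <= 1; dx++) {
            sum += texture2D(u_image, v_texCoord + vec2(dx, dy) * u_texelSize);
          }
        }
        gl_FragColor = sum / 9.0;    // one write per destination pixel
      }
    `;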
You don't need OpenCL, that's a level of complication you don't need. I may have given the wrong impression with that link.
Check out this image processing tutorial using basic WebGL 1, and pay attention to how it works:
Here is the demo from that article that uses the techniques I've been talking about. All of the filters in this demo are doing neighborhood computations. And note you can apply multiple filters. There is no texture copy here, this tutorial renders directly to a texture in one pass, and then samples that texture in the next pass and so on. The iterations or feedback that you're looking for happen by combining render-to-texture with drawing the viewport polygon multiple times.
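The structure is roughly this (a sketch, not the tutorial's exact code; `targets` are two render-to-texture pairs like the createTarget() example above, and drawQuad() / setKernelUniforms() are stand-ins for the fullscreen-quad draw and the uniform upload for each filter):

    // Start from the original image texture.
    gl.bindTexture(gl.TEXTURE_2D, originalImageTexture);

    for (let i = 0; i < kernels.length; i++) {
      // Render into one of the two offscreen targets, alternating each pass.
      const target = targets[i % 2];
      gl.bindFramebuffer(gl.FRAMEBUFFER, target.fbo);
      gl.viewport(0, 0, width, height);

      setKernelUniforms(kernels[i]);   // upload this filter's weights (stand-in helper)
      drawQuad();                      // fragment shader samples whatever texture is bound

      // The texture we just rendered becomes the input for the next pass.
      gl.bindTexture(gl.TEXTURE_2D, target.tex);
    }

    // Final pass: draw the last result to the visible canvas.
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    gl.viewport(0, 0, gl.drawingBufferWidth, gl.drawingBufferHeight);
    drawQuad();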
>If you're doing image processing in WebGL, then to write many pixels to a framebuffer all at once, you draw a single polygon that covers the entire viewport. Your shader is applied in parallel to all pixels drawn. That is how ShaderToy works, it renders a single quad to the viewport and applies whatever shader you give it, the GPU runs that shader on all pixels rendered.
>Whoa, hang on. Hey I only mean this to be helpful not insulting, but it sounds to me like you may have some misconceptions about the way WebGL works.
You draw a simple quad, and the shader then processes all the pixels in parallel, returning their frag_color as the output color. It can read textures and other source information, but it does not have access to what is currently being processed for other pixels, because it's parallel. You have to wait until it has rendered, and then pass the result in again. I am unsure what I am misunderstanding.
This was a cool insight into using framebuffers for multiple passes, thank you.
>Here is the demo from that article that uses the techniques I've been talking about. All of the filters in this demo are doing neighborhood computations. And note you can apply multiple filters. There is no texture copy here, this tutorial renders directly to a texture in one pass, and then samples that texture in the next pass and so on. The iterations or feedback that you're looking for happen by combining render-to-texture with drawing the viewport polygon multiple times.
From what I'm seeing, this is not just a couple of passes; it's a pass per filter. None of the filters individually relies upon multiple passes; each one's output is calculated purely from the image that was filtered before it, meaning the filter itself doesn't need to write pixels and then read them again. You can see that in the for loop that calls setFramebuffer() and drawKernel(). I understand that you can apply a shader and put its output back in, but I still fail to see how you're avoiding doing that at the very least hundreds of thousands of times. Error-diffusion dithering is classically sequential from top to bottom, or bottom to top depending on whether it's serpentine or not; I don't think you can just process a ton of pixels at the same time and still get anything that looks like a sequentially computed Floyd-Steinberg.
Okay, this is good, you're almost there. BTW, I'm doing a bad job of explaining, and I'm sorry. I realize I'm complicating a few things and conflating a few things, so the best advice I can give is to actually go through that tutorial on image processing and write the code and understand the whole pipeline.
You're right; this demo is 1 pass per filter type. None of them require multipass, but the entire demo is multipass. There could be a filter that needed more than 1 pass, but in this case there isn't.
1 pass means: render all pixels in the viewport, and run the shader on all pixels rendered. You've got that part. The trick is you get to access a texture, and the texture is the result of the previous pass. Furthermore, inside the shader, you can address and access any pixels from the previous pass, and you can access multiple pixels from the previous pass to render one destination pixel.
I think the millions of reads you're looking for are the texture() calls happening inside the shader. The [render / render-to-texture / copy pixels / copy texture image] calls process all pixels in the viewport in a single call. The shader applies to all pixels, but a shader only gets to touch any given destination pixel once per render. But the shader can read as many source pixels as it wants.
Because the shaders aren't limited on their reads, but they are limited on their writes, you have to re-organize your algorithm accordingly. You keep reiterating that error diffusion spreads out from top to bottom, and I keep reiterating that it has to work differently in WebGL; we've been talking past each other a little bit here.
You're right; you can't spread things out (scatter) using WebGL during a single pass, and you can't do the classic Floyd-Steinberg implementation the same way you do on the CPU. So it's important to understand that there is another way to do it, and it doesn't look like what you're used to. It doesn't spread things out by pushing error around inside the loop. It spreads things out by letting each destination pixel pull the error from its neighbors before computing its own error, rather than pushing its own error to its neighbors after computing. This is known as a gather, as opposed to scatter. It is mathematically the same thing, but the order of operations is turned around.
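To make the gather concrete, here's a rough sketch of the idea, not a finished Floyd-Steinberg: it pulls one step of error per pass (mirroring the usual push weights), assumes a particular scan orientation, and would need to be repeated over multiple passes with the error written out each time; matching the serial result exactly takes more care than this. u_src is the image, u_err is the error texture from the previous pass:

    const gatherErrorFragSrc = `
      precision mediump float;
      uniform sampler2D u_src;       // the image being quantized
      uniform sampler2D u_err;       // error written by the previous pass
      uniform vec2 u_texelSize;
      varying vec2 v_texCoord;

      void main() {
        // Pull error from the neighbors that Floyd-Steinberg would have
        // pushed onto this pixel (mirror of the usual push weights).
        vec3 pulled =
            texture2D(u_err, v_texCoord + vec2(-1.0,  0.0) * u_texelSize).rgb * (7.0 / 16.0)
          + texture2D(u_err, v_texCoord + vec2( 1.0, -1.0) * u_texelSize).rgb * (3.0 / 16.0)
          + texture2D(u_err, v_texCoord + vec2( 0.0, -1.0) * u_texelSize).rgb * (5.0 / 16.0)
          + texture2D(u_err, v_texCoord + vec2(-1.0, -1.0) * u_texelSize).rgb * (1.0 / 16.0);

        vec3 adjusted  = texture2D(u_src, v_texCoord).rgb + pulled;
        vec3 quantized = floor(adjusted * 3.0 + 0.5) / 3.0;   // e.g. 2 bits per channel
        // Write the new error out so the next pass can pull from it; the
        // quantized color itself would be written by a companion pass
        // (plain WebGL 1 can't write two attachments in one draw).
        gl_FragColor = vec4(adjusted - quantized, 1.0);
      }
    `;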
Here's a diffusion demo; it's reaction diffusion, not error diffusion, but ultimately exactly the same diffusion process. Each pass diffuses the previous pass by 1 step.
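If it helps connect the two, the per-pass step in that kind of demo boils down to something like this (a generic sketch of one diffusion step, not that demo's actual shader): each pixel pulls from its neighbors in the previous pass, and repeating the pass spreads values outward one step at a time.

    const diffuseFragSrc = `
      precision mediump float;
      uniform sampler2D u_prev;      // state from the previous pass
      uniform vec2 u_texelSize;
      varying vec2 v_texCoord;

      void main() {
        vec4 c = texture2D(u_prev, v_texCoord);
        // Sum of the 4-neighborhood from the previous pass.
        vec4 n = texture2D(u_prev, v_texCoord + vec2( 1.0,  0.0) * u_texelSize)
               + texture2D(u_prev, v_texCoord + vec2(-1.0,  0.0) * u_texelSize)
               + texture2D(u_prev, v_texCoord + vec2( 0.0,  1.0) * u_texelSize)
               + texture2D(u_prev, v_texCoord + vec2( 0.0, -1.0) * u_texelSize);
        // Move a little toward the neighborhood average each pass.
        gl_FragColor = mix(c, n / 4.0, 0.2);
      }
    `;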
>You're right; you can't spread things out (scatter) using WebGL during a single pass, and you can't do the classic Floyd-Steinberg implementation the same way you do on the CPU. So it's important to understand that there is another way to do it, and it doesn't look like what you're used to. It doesn't spread things out by pushing error around inside the loop. It spreads things out by letting each destination pixel pull the error from its neighbors before computing its own error, rather than pushing its own error to its neighbors after computing. This is known as a gather, as opposed to scatter. It is mathematically the same thing, but the order of operations is turned around.
That makes significantly more sense now. I thought you were saying I could do normal Floyd-Steinberg sequentially somehow.
I am unsure of the mathematical implementation, but I will keep looking at this. I would think there would be a paper on this method somewhere.
I really appreciate the time you've taken to respond. Not many WebGL people out there can actually point out how it all works and what's possible, and I definitely learned something.