I think this approach isn't ideal because you're representing pixel coordinates as 150x150 unique bins. With only 71k fonts, it's likely that many of those bins are never used, especially near the corners. Since you're quantizing anyway, you might as well use a convnet and then trace the output, which would better take advantage of the 2D nature of the pixel data.
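For a sense of scale, here's a back-of-the-envelope sketch of the joint 150x150 quantization (my own illustration; the normalization to [0, 1) is an assumption, not a detail from the post):

```python
GRID = 150  # 150x150 bins, as described in the post

def joint_token(x: float, y: float) -> int:
    """Map a point with normalized coordinates in [0, 1) to one joint bin index."""
    xi = min(int(x * GRID), GRID - 1)
    yi = min(int(y * GRID), GRID - 1)
    return yi * GRID + xi

print(GRID * GRID)              # 22,500 possible joint tokens
print(joint_token(0.97, 0.02))  # a near-corner bin that may never occur in 71k fonts
```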
This kind of reminds me of DALL-E 1, where the image is represented as 256 image tokens and then generated one token at a time. That approach is the most direct way to adapt a causal-LM architecture, but it clearly didn't make a lot of sense because images don't have a natural top-to-bottom, left-to-right order.
For vector graphics, the closest analogous concept to pixel-wise convolution would be the Minkowski sum. I wonder if a Minkowski sum-based diffusion model would work for SVG images.
Thank you for the suggestion. A couple of ML engineers I've spoken with since publishing the blog post also suggested that I should try representing the x and y coordinates as separate tokens.
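A minimal sketch of that split encoding (my own illustration; the normalization and the token offset are assumptions, not details from the post):

```python
GRID = 150  # same 150x150 grid as in the post

def split_tokens(x: float, y: float) -> tuple[int, int]:
    """Map a normalized (x, y) in [0, 1)^2 to an (x_token, y_token) pair.

    The vocabulary shrinks from 150 * 150 = 22,500 joint bins to
    150 + 150 = 300 tokens, at the cost of a sequence twice as long.
    """
    xi = min(int(x * GRID), GRID - 1)
    yi = min(int(y * GRID), GRID - 1)
    return xi, GRID + yi  # offset the y tokens so the two axes don't collide

print(split_tokens(0.5, 0.5))  # (75, 225)
```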
In pixel space a convnet applies pixel-wise convolutions with a pixel kernel. If you represent a vector image as a polygon, the direct equivalent of a convolution would be the Minkowski sum of the vector image with a polygon kernel (rough sketch below).
You could start off with a random polygon and the reverse diffusion process would slowly turn it into a text glyph.
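Here's a rough sketch of that analogy for the convex case (my own illustration, not from the post: for convex shapes the Minkowski sum equals the convex hull of all pairwise vertex sums, and the choice of scipy is mine):

```python
import numpy as np
from scipy.spatial import ConvexHull

def minkowski_sum_convex(P: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Minkowski sum of two convex polygons given as (n, 2) vertex arrays.

    For convex shapes the sum equals the convex hull of all pairwise vertex
    sums -- the polygon analogue of sliding a kernel over a pixel grid.
    """
    sums = (P[:, None, :] + Q[None, :, :]).reshape(-1, 2)  # every p + q pair
    hull = ConvexHull(sums)
    return sums[hull.vertices]  # hull vertices in counter-clockwise order

# Example: a unit-square "glyph" dilated by a small triangular "kernel"
glyph  = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
kernel = np.array([[0, 0], [0.1, 0], [0, 0.1]], dtype=float)
print(minkowski_sum_convex(glyph, kernel))
```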