Important note from the paper - the resolution is limited to 384x384 currently.

just-ok · 2025-01-27T18:01:58 1738000918

Seems like a massive buried lede in an “outperforms the previous SoTA” paper.

franktankbank · 2025-01-27T18:07:37 1738001257

Great for generating favicons!

jimmyl02 · 2025-01-27T18:22:17 1738002137

Don't most architectures resolve this via super-resolution / some upscaling pipeline afterwards that adds the details?

IIRC Stable Diffusion XL uses a "refiner" after the initial generation.

dragonwriter · 2025-01-27T19:18:13 1738005493

The SDXL refiner is not an upscaler; it's a separate model with the same architecture, used at the same resolution as the base model, that is focussed more on detail and less on large-scale generation. (You can actually use any SDXL-derived model as a refiner, or none; most community SDXL derivatives use a single model with no refiner and beat the Stability SDXL base/SDXL refiner combination in quality.)

vunderba · 2025-01-27T18:04:38 1738001078

Ouch, that's even smaller than the now-ancient SD 1.5 which is mostly 512x512.
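
For a sense of scale, here is a rough sketch of the pixel arithmetic behind that comparison. The helper name `pixel_count` is made up for illustration, and 512x512 is just the typical SD 1.5 size mentioned above, not a hard limit.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in a width x height image."""
    return width * height


# Resolution quoted from the paper vs. the usual SD 1.5 resolution.
small = pixel_count(384, 384)
sd15 = pixel_count(512, 512)

# Fraction of SD 1.5's pixel budget that a 384x384 image covers.
ratio = small / sd15

print(f"384x384: {small} pixels")
print(f"512x512: {sd15} pixels")
print(f"ratio:   {ratio:.4f}")
```

So a 384x384 image has a bit over half the pixels of a 512x512 one, since (384/512)^2 = 9/16.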

ilaksh · 2025-01-27T18:22:21 1738002141

The obvious point of a model that works like this is to see if you can get better prompt understanding. Increasing the resolution in a small model would decrease the capacity for prompt adherence.