100%. Multimodal images surpass ComfyUI and inpainting (for now). It's a step-function improvement in image generation.

I'm hoping we see an open weights or open source model with these capabilities soon, because good tools need open models.

As has happened in the past, once an open implementation of DALL-E or whatever comes out, the open source community pushes the capabilities much further by writing lots of training code, extensions, and pipelines. The results end up looking significantly better than what closed SaaS models produce.