Thanks. Yeah I played with your app early on and just fired it up again to see the progress. Frankly I find the interface pretty intimidating but it is cool that you can easily stitch generations together.
Unsolicited UX recs:
- strongly recommend a default model. The list you give is crazy long. It kind of recommends SD 1.5 in the UI text below the picker but has the last one selected by default. Many of them are called the same thing (ironically the name is "Generic" lol).
- have the panel on the left closed by default or show a simplified view that I can expand to an "advanced" view. Consider sorting the left panel controls by how often I would want to edit them (personally I'm not going to touch the model but it is the first thing).
You are doing great work but I wouldn't underestimate the value of simplifying the interface for a first-time user. It seems to have a ton of features but I don't know what I should actually be paying attention to / adjusting.
Is there a business model attached to this or do you have a hypothesis for what one might look like?
Agreed on the UX feedback. It accumulated a lot of cruft moving from the old technologies to the new. This just echoes my early feedback that co-iterating the UI and the technology is difficult; you'd better pick the side you want to be on, and there is only one correct side (and unfortunately, the current app is trying hard to be on both sides).
I am. I thought this was obvious. My statement is objective. I would go as far as to say it is the only app that works on 8GiB macOS devices with SDXL-family models.
"You should check out this thing" has a very different implied context than "You should check out this thing I made". The first sounds like a recommendation from an enthusiastic user, not from the the author. Because of this, discovering that you are the author makes your recommendation feel deceptive.
I am sorry if you feel that way. I joined HN when it was a small, tight-knit community without much of a marketing presence. The "obvious" comment is more of a "people know other people" kind of thing. I didn't try to deceive anyone into using the app (and why would I?).
If you feel this is unannounced self-promotion, yes, it is, and can be done better.
---
Also, for the "objective" comment: it was meant to say "the original comment is still objective", not that you can be objective only by being the developer. Being the developer can obviously bias your opinion.
Should be in a few days. I asked Stability to clarify whether I can deliver weights through my Cloudflare bucket, and whether qualifying as non-commercial depends on who runs the model, not who delivers it.
2008, when we both joined, was 15 years ago. In the interim, the userbase has grown. Most people aren't recognizable as the author of an app under discussion, so a simple "Developer here" is appreciated as it was not obvious to me.
Whoa, well let me just say thanks for the awesome app!! It's pretty entertaining to spin this up in situations where I don't have internet (airplane, subway, etc.).
I was also surprised by how well it ran on my iPhone 11 before I replaced it with a 15 Pro.
(Let me know if you're looking for some Product Design help/advice, totally happy to contribute pro bono. No worries if not of course!)
Nice app - but for future reference it is very much not obvious to any native English speaker. "You should check out X" sounds like a random recommendation.
I've been generating stuff non-stop in Draw Things for a few days, it's very good. Agree with the comments elsewhere about the rather overwhelming UI, and I have only one feature request: let us input the number of images we want to generate - the 100 limit means I keep having to check if it's finished to restart it.
If you want an absolute beast, especially for this stuff, you probably want Intel + Nvidia. Apple Silicon is a beast in power efficiency, but a top-of-the-line M3 does not come close to a top-of-the-line Intel + Nvidia combo.
If it's just for hobby/interest work, then just a heads-up that even the 1st generation Apple Silicon will turn over about one image a second with SDXL Turbo. The M3s of course are quite a bit faster.
The performance gains in recent models and PyTorch are currently outpacing hardware advances by a significant margin, and there are still large amounts of low-hanging fruit in this regard.
I like ComfyUI the most now, but it's probably not the most beginner-friendly. It has great features, though, is extensible, and you can build workflows that work for you and save them, so you don't have to click a million times like in Auto1111.
I just installed InvokeAI and wish I hadn't. It installs -so much- outside of its target directory. A1111 and ComfyUI are fairly self-contained wherever you put them.
It's all isolated in a single directory, though, right? I set it up ages ago, but my recollection is that it installs itself in ~/invoke on Linux and stays contained there.
There are already a number of local inference options that are (crucially) open source, with more robust feature sets.
And if the defense here is "but Auto1111 and Comfy don't have as user-friendly a UI", that's also already covered. https://github.com/invoke-ai/InvokeAI
I switched to InvokeAI and won't go back to the basic A1111 webui. I like how everything is laid out, there are workflow features, you can easily recall all properties (prompt, model, LoRA, etc.) used to generate an image, things can be organized into boards, and all of the boards/images/metadata are stored in a very well-designed SQLite database that can be tapped into via DataGrip.
AUTOMATIC1111: great for the fast implementation of the most recent generative features.
ComfyUI: excellent for workflows and recalling the workflows, as they're saved into the resulting image metadata (i.e. sharing images shares the image generation pipeline).
InvokeAI: great UX and community; arguably they were a bit behind in features while focused on making the UI work well, and are now at the stage of bringing in the best features of competitors. Like you, I can easily recommend it above all other options.
> recalling the workflows, as they're saved into the resulting image metadata (i.e. sharing images shares the image generation pipeline)
Doesn't A1111 already do this? There's a PNG Info tab where you can drag and drop a PNG and it will pull the prompt, negative prompt, model, etc., and then there's a button to send it to the main generation tab. It doesn't automatically load the model, but that may be intentional because of how long it takes to change loaded models.
Not in a way that provides the same thing, no, largely because of fundamental design differences.
> There's a PNG Info tab where you can drag and drop a PNG and it will pull the prompt, negative prompt, model, etc., and then there's a button to send it to the main generation tab.
A1111, by nature, has a bunch of disconnected operations in separate tabs and scripts. Even if the PNG captures all of a generation operation that would be executed by a single launch-button click, it's not really equivalent to capturing a whole ComfyUI workflow, which can be the equivalent of a process spanning numerous different tasks in A1111, with data manually shuttled between tabs and scripts.
A1111 has a bunch of manual "send to X" buttons for the output of runs, so that they can become the input of another task, whereas in Comfy those operations are part of one workflow, with a pipeline connecting the output of one to the input of another. And when saving generation data, those manual shuttle points in A1111 are barriers to what counts as a single generation that can be saved.
Can you actually use those workflows through some sort of API to automate them from, let's say, a Python script? Played around with Comfy. Really nice, but I would like to automate it within my own environment.
It's just missing too many features for me still, even though I like what it has better. I use things like Segment Anything and custom upscalers, and I prefer how inpainting is controlled in A1111, where you can say whether you want the whole image or the masked area only, etc.
I've personally been using SD.Next, which is a fork of A1111 with support for the diffusers backend, a cleaned-up UI, and sometimes support for newer things before A1111, though not always. It's plugin-compatible with A1111.
No idea whether or not the UI is user-friendly, but the installation steps alone for InvokeAI are already a barrier for 99.9% of the world. Not to say Noiselith couldn't be open-source, but it's clearly offering something different from InvokeAI.
I can't even figure out how one would install Noiselith. It has some text that says "Download for free on your PC", but it's not a button or a link. Maybe they're doing some weirdly locked-down user-agent sniffing and refuse to allow me to even attempt to download any version on Linux?
InvokeAI is installed via a script, sure, but it's also just a few clicks: download, extract, double-click on a specific file, enjoy.
There are a bajillion local SD pipelines, but this one is, by far, the one with the highest-quality output out of the box, with short prompts. It's remarkable.
And that's because it integrates a bajillion SDXL augmentations that other UIs don't implement or enable by default. I've been using Stable Diffusion since 1.5 came out, and even having followed the space extensively, setting up an equivalent pipeline in ComfyUI (much less diffusers) would be a pain. It's like a "greatest hits and best defaults" for SDXL.
I was afraid of the Python setup (even though I'm a Python developer), but yep: make the virtualenv, install the dependencies, done. This is amazing; the images it generates are immediately beautiful.
It does look bad that it bundles GTM, though, as a sibling commenter says.
> Be sure to try the styles as well. That's actually a separate input from the prompt for SDXL.
No, it's not.
There are two text encoders, but they aren't really “prompt” and “style” inputs.
> and most other UIs don't implement the style prompting.
Most UIs' default mode of operation sends the same input to both text encoders, but Comfy at least has nodes that support sending separate text to them. OTOH, while there may be some cases where sending different text to the two encoders helps in a predictable way, AFAIK most of the testing people have done has shown that optimal prompt adherence usually comes from sending the same text to both.
I'm not sure of the origin, but using ViT-L (the encoder shared with SD 1.x) for what you might call the main prompt and ViT-G (the new SDXL encoder, and also a successor to the encoder used as the single encoder in SD 2.x) for a style prompt was a common idea shortly after SDXL launched, so it's understandable.
OpenCLIP ViT-G and CLIP ViT-L. The latter is the same encoder used in SD 1.x; OpenCLIP ViT-H was used as the encoder in SD 2.x, and ViT-G is, as I understand it, a successor to and improvement on ViT-H.
Yeah, it is; you just need to set an env var of GRADIO_ANALYTICS_ENABLED=False to stop it. That should probably be added into launch.py along with the other env vars being set at launch.
Just spent about 10 minutes building it on a MacBook Pro M1. I come with significant bias against Python projects, but getting Fooocus to run was very, very easy.
Did you get Fooocus to run on an Apple Silicon Mac with MPS support? I can't get MPS support running for the love of god. Any help would be much appreciated by me (as well as the 20 or so people currently looking for a solution on GitHub) to achieve normal generation speed, compared to the current 15 minutes per image :) Thank you!
Mine takes about 3 min per image; I didn't do anything special. Left everything at default settings after my initial install (about 5 days ago). Not speedy, but certainly not 15 min. Running on an M1 with 32GB RAM.
Yeah, Fooocus is much better if you are going for the best locally generated result. Lvmin puts all his energy into making beautiful pictures. Also, it is GPL-licensed, which is a + in my book.
Eh, I messed around with it for a while - it's okay and good for beginners, but without much more effort you can get better results out of A1111 or ComfyUI.
Not really. There is a very fast LCM model preset now, but it's still going to be painful.
SDXL in particular isn't one of those "compute-light, bandwidth-bound" models like llama (or Fooocus's own mini prompt-expansion LLM, which in fact runs on the CPU).
I use the same. Any ideas where to find current (not outdated) guides on how to create your own "model" out of the most similar pictures of my dream model? I want to keep using it to generate the same face.
Looks like a complete contraption to set up, and very unpleasant to use at first glance when compared against Noiselith.
The hundreds of Python scripts and having the user touch the terminal show why something like Noiselith should exist for normal users rather than developers or programmers.
I would rather take a packaged solution that just works over a bunch of scripts requiring a terminal.
Okay. I'll need to install it? What package might that be in, hmm. Moving on, I already know it's Python.
> /usr not writeable
Guess I'll use sudo...
= = =
Obviously I know better than to do this, but very few people would. This is not 'dead simple'! It's only simple for Python programmers who are already familiar with the ecosystem.
Now, fortunately the actual documentation does say to use venv. That's still not 'dead simple'; you still need to understand the commands involved. There's definitely space for a prepackaged binary.
The people that make software that does useful things, and the people that understand system security live on different planets. One day they'll meet each other and have a religious war.
This said, it's nice when developers attempt to detect the executable they need and warn what package is missing.
There are projects that set up "fat" Python executables or portable installs, but the problem in PyTorch ML projects is that the download would be many gigabytes.
Additionally, some package choices depend on hardware.
In the end, a lot of the more popular projects have "one-click" scripts for auto-installs, and there are some for Fooocus specifically, but the issue there is that they're not as visible as the main repo, and not necessarily something the dev wants to endorse.
You have to make trade-offs in software development. Fooocus trades for the best picture and simplicity of use rather than the most beautiful interface. I think it is a good trade-off given that the technology is improving at breakneck speed.
Look, DiffusionBee is still maintained but still has no SDXL support.
Anyone who bet that the technology is done and it is time to focus on the UI is making the wrong bet.
This project is really cool and I like the stated philosophy on the README. I think it's making the right trade-off in terms of setting useful defaults and not showing you 100 arcane settings. However, the installation is too hard. It's a student project and free so I'm not criticizing the author at all but I think it's a pretty fair and useful criticism of the software and likely a significant bottleneck to adoption.
Huh? It has a really simple interface, much much much simpler than anything else that uses SD/SDXL locally. Installation is also simple for Windows/Linux, don't know about macOS.
I realize it may be good marketing, but it's odd to have the fact that it's on device and offline be the primary differentiator when that's probably how most people use Stable Diffusion already.
I'd probably focus more on it being easy to install and use, as that's something that isn't done much. For me, if it doesn't have ControlNet, upscaling, some kind of face detailer, and preferably regional prompting, I'm out.
I also kind of wish all of these people that want to make their own SD generators would instead work on one of the open source ones that already exist.
While an app store might be a good idea, in a world with Auto1111 and all of its extensions I think it's going to go over poorly with the Stable Diffusion community, for what it's worth.
You hit the nail on the head when you said it's good marketing, but go all the way. The thing you find odd tells you who they want to use their product: you're not their target audience. They are trying to convert people from using online-only services like DALL-E, not people who already use SD.
I think there's probably a bunch of people who don't use things like A1111 because of the complexities of the download-this-which-downloads-this-which-downloads-this-then-you-manually-download-this-and-this setup model.
I can see how something simpler might appeal to new users, even if it doesn't appeal to existing users.
I've found oddly many cloud wrappers around Stable Diffusion, so I like the upfront on-device/offline description.
It was weird, when I was first playing with SD, how many packages phoned home heavily or ran in VMs or whatever instead of just downloading a bunch of stuff and running it.
Sales prompt: "Young woman with blonde curls in front of a fantasy world background, come hither eyes, sitting with her legs spread, wearing a white shirt and jeans hot pants."
If the prompt wasn't somewhat sexual, divisive, or offensive it would be wide open to the chorus of "still not as good as midjourney/dall-e/imagen". Freedom from restriction is one of the main selling points.
I'm genuinely curious how many people in the open source community are pouring their sweat and blood into these projects that are, at the end of the day, enabling guys to transform their macbooks into insta-porn-books.
After installation, it wouldn't run on my Windows machine unless I granted public and private network access. Kinda tripped me up since it says "offline".
On the first run it downloads about 30GB of data. I don't know if it would work offline on subsequent runs because for me it never ran again without crashing!
Also, upon uninstallation it left behind all its data (not user data, mind you, but the executable itself, its Python venv, its updater, and all the models; uninstall basically just removed the shortcut in the Start menu).
Definitely exciting to see more local clients come out. As mentioned in other comments, there are some great ones out already. I've used Automatic1111, which is quick and doesn't require a ton of tuning. But it still has lots of knobs and options, which makes it difficult initially. Fooocus is super quick but of course offers less customization.
Then there's ComfyUI, the holy grail of complicated, but with that complication comes the ability to do so much. It is a node-based app that allows you to create custom workflows. Once your image is generated, you can pipe that "node" somewhere else and modify it, eg: upscale the image or do other things.
I'd like to see if Noiselith or some others offer support for SDXL Turbo -- it came out only a few days ago but in my opinion is a complete game-changer. It can generate 512x512 images in roughly half a second on consumer GPUs. The images aren't crazy quality, but the ability to make a prompt like "fox in the woods", see it instantly, and then add "wearing a hat" and see it instantly generate again is so valuable. Prior to that, I'd wait 12 seconds for an image. Sounds like not a big deal, but the value of being able to iterate so quickly makes local image gen so much more fun.
I'm being kind of tongue in cheek because I understand that this is for just making things really easy and ComfyUI is a node based editor that most people would have trouble with. But the best UI for local SD generation that the community is using is https://github.com/comfyanonymous/ComfyUI
If you are a programmer at heart, ComfyUI will feel very comfortable (pun intended). It's basically a visual programming environment optimized for the type of compositional programming that machine learning models desire. The next thing this space needs is someone to build an API hosting every imaginable model on a vast farm of GPUs in the cloud. Use ComfyUI and other apps to orchestrate the models locally, but send data to the cloud and benefit from sharing GPU resources far more efficiently.
If anyone has a spare thousand hours to kill, I would build that and connect it up with the various front-ends, including ComfyUI, A1111, etc. Not a small amount of effort, but it would be rewarding.
Agreed. It's worth the learning curve for the sheer power it adds to your workflows. I've always wanted to toy around with node-based architectures, and this seemed quite easy after using A1111 extensively. The community providing ready-to-go workflows has made it quite enjoyable too.
I can't seem to get myself to switch over to Comfy because it looks rather intimidating. I've only used A1111 a dozen times, and only for funny work images…
Haven't gotten to test it, but given I use CoreML on Comfy, I wonder if we'll see more optimization and performance work on the back end of these platforms as more useful frontends come out. The 1-4 it/s on a 512 image is just sad, and so is the 2-3 s/it on 1024 in this modern day; hell, the ANE can't even run SD at 1024x1024 on a MacBook Pro M3 :S
As others have stated, local AI (completely offline after the model/weight download) is the way to go. If I have the hardware, why shouldn't I be able to run all this fancy software on my own machine?
There are many great suggestions and links to other similar/better packages, so follow the comments for more info, thanks :-)
What's the privacy and licensing like for this? I'm honestly too ignorant to know if someone's allowed to use this for commercial purposes, or even if it's sending the generated images/prompts somewhere even if it's rendering locally.
The 16GB (base model) M2 Pro Mini, despite its overall awesomeness (running DiffusionBee.app, etc.)... does not meet the minimum system requirements (Apple Silicon requires 32GB RAM).
So now I have to contemplate shopping for a new mac TWICE in one year (never happened before).
Good lord. I can get a 2048x2048 upscaled output from a very complex ComfyUI workflow on a 4090 in 15 seconds. This includes three IPAdapter nodes, a sampling stage, a three-stage iterative latent upscaler, and multiple ControlNets. Macs are not close to competitive for inference yet.
I mean, a 4090 would appear to cost $2000, and came out a year ago; it has about 70bn transistors. The M1 could be had for $700 in a desktop, $1000 as part of a laptop, came out three years ago, and has 16bn transistors, some of which are spent on the CPU.
An M3 Ultra might be a more reasonable comparison for the 4090.
When choosing a machine with non-expandable RAM, you went with the minimum configuration? That's a choice, I suppose, but the outcome wasn't exactly hard to foresee.
At the time, it seemed that ANY upgrade made the M2 Pro mini cost-ineffective.
One of my cooler Q1 2023 ChatGPT experiences was having it help me "reason through" which machine was most "upgrade-proof," dollar-for-dollar.
Now, in Q4 2023, I would definitely have decided to purchase an M2 Studio (base model) instead; those additional upgrades (vs. a similar M2 Pro mini config) were much more cost-effective. Overall, I'm extremely satisfied with my M2 Pro base model.
The main issue with AMD is that to get reasonable performance you need to use ROCm, and ROCm is only available on Linux. They have started porting parts of ROCm to Windows, but it's not enough to be usable yet; that might be different in a few months.
I think it is a matter of why AMD does not support these projects. NVIDIA is involved everywhere; they could easily do the same. At least from what I have observed on the internetz.
Pros:
- seems pretty self contained
- built in model installer works really well and helps you download anything from CivitAI (I installed https://civitai.com/models/183354/sdxl-ms-paint-portraits)
- image generation is high quality and stable
- shows intermediate steps during generation
Cons:
- downloads a 6.94GB SDXL model file somewhere without asking or showing the location/size. Just figured out you can find/modify the location in the settings.
- very slow on first generation as it loads the model; no record of how long generations take, but I'd guess a couple of minutes (M1 Max MacBook, 64GB)
- multiple user feedback modules (a very intrusive chat thing in the bottom left that I'll never use, plus a call for beta feedback in the top right)
- not open source like competitors
- runs 7 processes, idling at ~1GB RAM usage
- non-native UX on macOS, missing the hotkeys and Help menu you'd expect. Electron app?
Overall 4/5 stars, would open again :)