
Hey HN, one of the team members here!

I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted.[1]

As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity fascinating. On one hand, I know that mathematically the AI can only hallucinate images that fit within the distribution of things it has seen. But from the artist's perspective, the model's ability to blend two existing styles into something so distinctly new is incredible (not to mention commercially useful!)

Anyways, happy to answer any questions, thoughts, or concerns!

---

[1] https://news.ycombinator.com/item?id=20511459



Naïvely I thought Waifu generator was just “some guy having a laugh” fine-tuning a model off of Hugging Face, but reading through the comments here, it is obviously a much, much bigger enterprise.

Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.


Right now it's a small team of 6 people, and we have a bit of funding + compute credits to train models. There's a bit of revenue from some past projects and AI consulting, but we're mostly betting big on our new AI-powered mobile title Arrowmancer[1].

> I think the effort required for such an undertaking is vastly underestimated by readers.

Haha, for sure. Hosting a real-time ML model for people to do sub-1-second inferences at HN-scale load is definitely nontrivial.

[1] https://arrowmancer.com


That seems like a nice project you're working on, definitely higher-effort than some other attempts at generated art (procedurally or otherwise).

I can't help but feel that this would have been a better fit for the NFT fad as well, as opposed to the ugly monkeys and other asset flips that were pretty obvious cash grabs.

Either way, good luck!


I'm guessing you're referring to the "one of a kind" part of NFTs, which made me wonder: how would they validate that the created Waifu hasn't been created before?

The question is whether people would buy, since the supply is near endless, and a huge difference is that these don't have a story attached to them...


> I'm guessing you're referring to the "one of a kind" part of NFTs, which made me wonder: how would they validate that the created Waifu hasn't been created before?

Oh, not even that. While some data ends up on the blockchain, I find the whole ownership concept a bit nebulous at best, much like the people mocking it with the whole "oh, just right-click the image to save it" thing.

What I was actually referring to was more along the lines of the collector economy itself: people willing to exchange their money for something they enjoy. Having it be out of a sense of endearment and aesthetic enjoyment (waifus and husbandos) rather than a desire to flip those things (as with a number of other NFTs or collectibles) seems less morally questionable to me!

While I'm not really familiar with the whole NFT space, it seems to me that projects like CryptoKitties are more agreeable and thus more viable in this space, as AI-generated images of cute characters would be! Actually, that seems like a really good fit to me, regardless of how the actual "uniqueness" or "ownership" aspect would play out.


Not to get too deep into NFTs, as I'm not an expert either, but from what I understand the images used in NFTs are hosted on a server, and the owner of the NFT doesn't own the server; so do they really own the image?

I too like to collect things; my latest obsession was lingerie. But I always have the image of extreme hoarders in the back of my mind, which quite often scares me out of collecting too much.

I'm not scared of having too much stuff, I'm scared of not being able to notice it myself.

NFTs could ease that problem. But sadly I'm one of those people that prefer analog radio over digital. Wired over wireless.


The smart contract can identify the image in a content-addressable way, with e.g. IPFS. Then, while in practice there may be a server hosting it for people, anyone who has the image saved can (if they choose to) continue to act as a host for it if the original server stops being around.

Not, uh, that NFTs for images like this aren’t silly and largely pointless, but at least for some of them, the “there’s a server hosting the image” part isn’t that much of an issue (at least provided that the person who “owns it” keeps the file locally and backed up).
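
To make "content addressable" concrete, here's a toy Python sketch of the principle (a bare SHA-256 digest; real IPFS wraps this in a multihash/CID encoding, and the byte string below is just a stand-in for a real file):

    import hashlib

    def content_address(image_bytes: bytes) -> str:
        # The identifier depends only on the content, not on any server;
        # anyone holding the same bytes can recompute it to verify or re-host.
        # (Real IPFS wraps the digest in a multihash/CID encoding.)
        return hashlib.sha256(image_bytes).hexdigest()

    image_bytes = b"...the PNG bytes of the image..."  # stand-in for a real file
    print(content_address(image_bytes))  # same bytes anywhere -> same address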


For it to work, you would probably have to limit them and have a set of hand-selected images. Seemingly, people will buy anything if you can hype it enough.


I guess you could limit it with a colour palette, and that way give yourself the option of bringing in special colour-palette images in the future, for special events or occasions.


This sounds a lot like what games such as Counter-Strike: Global Offensive do with their cosmetic item skins, where each has a rarity, and there are also specific themed packages and whatnot.

Now that's a fun business idea (ethically questionable gambling-related aspects aside; I guess it's about how one markets it and how honest they are).


Waifulabs is certainly a good marketing instrument for Arrowmancer. And regardless of whether the game gains traction, it will be a great showcase in case you ever want to offer this as an API to other mobile game creators.


> Naïvely I thought Waifu generator was just “some guy having a laugh”

Same here. What's naive about it?

Not to badmouth the undertaking, but wtf is this doing on HN?


Apparently it takes a business of 6 people to run a waifu generator. That's pretty far from one person doing it as a joke.


Well, apparently...

I sincerely applaud the creativity of establishing such a business.


Firstly, amazing work.

My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?

My (super limited) understanding of GANs is that they slowly discover these features over time simply from observing the data set, not from any labels.

So how could you make, e.g., a slider for head position, style, pose, etc.? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?

You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"

And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?


Yes, your understanding is correct!

You can think of it like coordinates in a high-dimensional vector space.

We craft the functions that illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.

And yes, when the model is retrained, we have to discover them again!
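
If it helps to see it in (very simplified) code, here's a toy numpy sketch of the idea. It's not our actual pipeline: `generate` stands in for the trained generator, and a real direction would be discovered by probing rather than drawn at random:

    import numpy as np

    rng = np.random.default_rng(0)
    latent_dim = 512

    def generate(z):
        # Placeholder for a trained generator network (latent vector -> image).
        return np.zeros((256, 256, 3))

    z = rng.standard_normal(latent_dim)         # one point = one character
    pose_dir = rng.standard_normal(latent_dim)  # in practice: a direction found by probing
    pose_dir /= np.linalg.norm(pose_dir)

    # "Same character, different pose": slide along one direction only,
    # leaving the other coordinates (identity, style, ...) untouched.
    images = [generate(z + s * pose_dir) for s in (-2.0, 0.0, 2.0)]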


Can you share any resources for reading on this particular topic?


Not affiliated with this project, but there are a gazillion different variations of GANs. Most just change the adversarial loss to improve the learning rate / quality, but others focus on architectural changes, such as StarGAN, Pix2pix (conditional GAN), CycleGAN, MUNIT, etc. It's really a fascinating field.


Roughly speaking, how much money did you invest in making this? Just curious whether this is something an indie hacker can hope to do one day, or whether you need deep pockets to make a site like this.


Fascinating... Thanks for sharing

A couple questions:

1) I didn't really understand how you went about identifying which vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling them one by one to see what they did to the outcome?

2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?


1) It was mostly manual, though AIs were useful in certain filtering tasks.

2) Depends on the quality you're seeking. If you only want one run of a similar, off-the-shelf model, somewhere in the low thousands of dollars is enough. But at the number of iterations you have to run to build your own and improve the results, you probably need about $100k.

To tackle this problem, we built our own supercomputer out of parts we bought on eBay, though I can't say I recommend that route, because it now lives in our living room.


I think that requires one more blog post.


Very curious about the computer, what are the internals?


Not the OP, but my guess would be threadrippers (or similar w/lots of PCIe lanes), each with a great number of GPUs. That's usually what you'd do for training AI in a home lab.

Server processors get you more bang for the buck... iff you're planning to run the hardware flat-out for literally years. You save on power, but the up-front cost eats up most of that, so for a system that's mostly idle you wouldn't use them. On the other hand, any CPU with fewer PCIe lanes than a TR won't be able to run multiple GPUs optimally, and TRs are cheap enough to make the reduction in PSUs/chassis worth it.

Not to mention that there are some approaches to training you can only use if you have multiple GPUs on the same motherboard, i.e. sharding a single model across GPUs without communication overhead killing any benefit of doing so.
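
For example, here's a minimal PyTorch sketch of that kind of single-host sharding, assuming two CUDA devices (layer sizes are made up; the point is that activations hop between cards over PCIe rather than over a network):

    import torch
    import torch.nn as nn

    class ShardedNet(nn.Module):
        # Half the model lives on each GPU; autograd moves gradients
        # back across devices automatically during backward().
        def __init__(self):
            super().__init__()
            self.front = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU()).to("cuda:0")
            self.back = nn.Linear(4096, 1024).to("cuda:1")

        def forward(self, x):
            x = self.front(x.to("cuda:0"))
            return self.back(x.to("cuda:1"))

    model = ShardedNet()
    out = model(torch.randn(8, 1024))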


Will probably blog about it in a week or two.

But the tl;dr is: a lot of scavenged EPYCs and NVIDIA GPUs, all in a large sound-proofed rack.


You mention it took two weeks to get to the point that we see in the article.

Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or did you train the model for two weeks? That point just wasn't clear to me.


2 weeks to train the model!

Development took on-and-off roughly 2 years to achieve the quality you see today.


Cool! Might want to clarify that. This is crazy impressive!


Got it! Thank you! :D


What are the terms of use for the images generated through your website? I'm guessing any commercial use is forbidden? It would be nice if you could formally spell it out on the website.


I don't think there's any way powerful enough to stop people from generating one and then tracing over it to create their own linework, customizing things like the colouring and shading. The more broadly AI is able to create, the more niche and obfuscated the directions human co-creators could take its products in.


Not to mention that the copyrightability of AI output isn't legally well tested, and chances are it'll fall in the direction of being copyrighted by the user (i.e. the person clicking in the UI to make a character, which is where the creative input happens), not by the AI's creator, who has no creative input into the process; they merely made a tool, like any other piece of software (the same reason documents printed from Microsoft Word aren't copyrighted by Microsoft).

I'm not entirely sure how much legal weight a ToS on the website would have on what the users do with the output. As I understand it, you could e.g. forbid explicitly using the service/generator for commercial purposes (e.g. during game development), but if someone generates a cool character playing around with no particular commercial objective and then decides post facto to build a media megafranchise out of that character, absent any copyright claim over the image, I don't think there's anything stopping them. They wouldn't even need to trace over it, though if they want new artwork in different poses, they couldn't keep using the AI for that with explicit commercial intent; they'd have to get humans to re-draw it.

Alternatively, a pessimistic view of the interaction between copyright and AI would be that the model is a derivative work of all the training input, and its output also is, and then good luck building a non copyright infringing AI.

IANAL and all that, but it would definitely be legally risky to assume that as the provider of an AI generator you have any control over what users do with the output.


Generally speaking, rights protect humans' interests; weak AI has no interests yet, so rights are not applicable to it and do nothing even if applied. An AI with interests would be quite a new kind of creature.


I purchased a waifu from your vending machine (loved the blog post!) at Gen Con in 2019, but can't see the saved model in my account. Is there a way for me to get a v2 generation?


Welcome back!

We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!

As for a V2 generation: sorry, because the models are different, you'll have to discover a similar image again if you want a V2 version!


Ah that's alright then, thanks for the quick response. I was so happy to see your guys' booth there and own an obscure piece of internet history! https://storage.cloud.google.com/waifus-images/6b94c2ea-51be...


Can't you use projection with the original image as input? Not for an exact copy, of course, but for a similar V2 rendition?


I LOVE that "horror". Reminds me of some of the art I've seen on album/single covers. Any chance of letting people access that kind of intermediate step? (Though I know it's a niche as hell use case).


Ah yes, the fine line between charming anime character and Lovecraftian horror.

There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!


For anyone looking to do this, here are some I made:

https://i.imgur.com/1V1wPMC.jpg

I rolled around 40 times on the first stage and chose a horror. I skipped the second stage and didn't roll on the third stage, just chose one of the presented details. I skipped the fourth stage because I rolled maybe ~200 times and only saw around 1-2 comparable horrors.

https://i.imgur.com/1vBeg1j.jpg

I rolled around 100 times on the first stage and saw about 3-4 horrors before choosing a horror. I rolled about 70 times on the second stage, didn't find anything interesting, just chose a normal color palette. I rolled maybe 80 times on the third stage and chose a horror, though the results were pretty consistently horrors. I rolled around 60 times on the fourth stage and saw about 1-2 other horrors before choosing a horror. Also, here's what it looked like after the third stage and before finishing the fourth stage:

https://i.imgur.com/ditm8nF.jpg

It's possible the third and fourth stages can produce horrors from normal faces, I didn't check.


With the game you're building, are the character portraits generated once and that's it, or do you plan on making them dynamic or frequently updated?

I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.


I wonder what an AI trained to spot deepfake waifus would detect.

In humans, things like the pupils can be the giveaway.

https://www.newscientist.com/article/2289815-ai-can-detect-a...


This is a super interesting question, given that the generator model is trained to fool the discriminator, which is also an AI.
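
For anyone unfamiliar, the two networks are literally trained against each other. A toy PyTorch sketch of the objective (1-D stand-in "data", made-up layer sizes):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(1000):
        real = torch.randn(64, 1) * 2 + 3  # stand-in "real data" distribution
        fake = G(torch.randn(64, 16))

        # D learns to label real as 1 and fake as 0...
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # ...while G is optimized purely to make D say "real".
        g_loss = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

So a dedicated detector would be playing the same game the discriminator plays during training, and the generator was optimized specifically to win it.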


Highlight the pixels with high sharpness values. Should be doable.
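
E.g., a quick scipy sketch (Laplacian magnitude as a crude stand-in for "sharpness"; the 99th-percentile threshold is an arbitrary choice):

    import numpy as np
    from scipy.ndimage import laplace

    def sharp_pixel_mask(gray, pct=99.0):
        # The Laplacian responds to abrupt intensity changes; flag pixels
        # whose response lands in the top (100 - pct) percent.
        response = np.abs(laplace(gray.astype(np.float64)))
        return response > np.percentile(response, pct)

    gray = np.random.default_rng(0).random((256, 256))  # stand-in for a real image
    mask = sharp_pixel_mask(gray)
    highlighted = np.where(mask, 1.0, gray * 0.3)  # brighten the flagged pixels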


Why does stuff like this never come down from the web? I'd pay for a program I could download and use with my own image files.


They tend to require specific hardware, like an NVIDIA GPU, as well as an ever-evolving large model file that the developers will want to update frequently. Some tools certainly have had offline versions, but I guess not many people are interested in setting it all up and are happy with an instant web UI.


While our model is not public, there are good resources online for playing with your own images!

Like this one by fast.ai!

https://docs.fast.ai/vision.gan.html
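
From memory, the WGAN example on that page boils down to roughly the following; swap the LSUN bedrooms download for a folder of your own images:

    from fastai.vision.all import *
    from fastai.vision.gan import *

    path = untar_data(URLs.LSUN_BEDROOMS)  # or point at your own image folder
    dblock = DataBlock(
        blocks=(TransformBlock, ImageBlock),
        get_x=generate_noise,              # latent noise in, image out
        get_items=get_image_files,
        splitter=IndexSplitter([]),
        item_tfms=Resize(64, method=ResizeMethod.Crop),
        batch_tfms=Normalize.from_stats(torch.tensor([0.5, 0.5, 0.5]),
                                        torch.tensor([0.5, 0.5, 0.5])))
    dls = dblock.dataloaders(path, path=path, bs=128)

    generator = basic_generator(64, n_channels=3, n_extra_layers=1)
    critic = basic_critic(64, n_channels=3, n_extra_layers=1,
                          act_cls=partial(nn.LeakyReLU, negative_slope=0.2))
    learn = GANLearner.wgan(dls, generator, critic, opt_func=RMSProp)
    learn.fit(1, 2e-4, wd=0.)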


[flagged]


Except that then anybody could literally just download it and start a competing service, saving 2 years of development and hundreds of thousands of dollars in compute costs over that time?


No, they couldn’t. And I guarantee you they spent nowhere near that sum to train the model, and that it wouldn’t take two years to clone it.


Same reason the Coca-Cola recipe is not published nor made freely available by the Coca-Cola corporation.


You just need to code up your own model architecture and then train it on your data using some established ML framework. The first step is where well-chosen priors can make a real difference wrt. your end results.
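
For example, a bare-bones DCGAN-style generator in PyTorch; the kernel sizes, channel counts, and upsampling scheme are exactly the kind of priors I mean (all numbers illustrative):

    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        # A 1x1 latent "image" upsampled step by step to a 32x32 RGB output.
        def __init__(self, latent_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),  # -> 4x4
                nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),         # -> 8x8
                nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),           # -> 16x16
                nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                                 # -> 32x32 RGB
            )

        def forward(self, z):  # z: (batch, latent_dim, 1, 1)
            return self.net(z)

    img = Generator()(torch.randn(1, 128, 1, 1))  # one generated 32x32 image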


So neat! Where are you based? Boston, I assume?

Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)


San Francisco! Just sent over a ping!


Would you try to create a new style? Train the discriminator on the score tags of the Danbooru dataset, then use it to rate the generator's style; this way it should be able to create a new style.


Do you plan to provide an API to generate waifus?

I think I could use this for a project.


In the future, perhaps! This is a popular request, so we are thinking about ways we can do this.


Hello and thank you for answering questions. The following is a quote from your article:

>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.

Can you explain what you mean by "mental" representation? Does your system have a mind?

Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?


Not OP, but I wonder if the process is in some way comparable to rigging a 3D model. There, you usually have some high-level input parameters, which influence joints on a predefined skeleton, which in turn determine the positions of individual vertices in the 3D body. Finally, the 3D shape is used to render the actual pixels.

On each step, high-level parameters are combined with predefined weights to produce a more low-level output.

It seems a similar transformation is going on here, except that the weights and the structure are somehow learned on their own.
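
In code, the analogy would be something like this toy numpy sketch (all sizes made up); a GAN effectively learns the W matrices from data instead of having them designed by hand:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((30, 5))          # "skeleton": 5 sliders -> 30 joint values
    W2 = rng.standard_normal((3000, 30))       # "skinning": joints -> 3000 vertex coords
    W3 = rng.standard_normal((64 * 64, 3000))  # "render": vertices -> 4096 pixels

    params = rng.standard_normal(5)            # the high-level input parameters
    pixels = W3 @ np.tanh(W2 @ np.tanh(W1 @ params))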


Something I was wondering but couldn't find on the site: what is the license for works generated through the project?


Who would someone speak with about licensing things made using waifu? My email contact is in my profile...


Is the code or any of the models available to the public? I'd love to mess with this on a local GPU cluster.


Not at the moment! A similar project that I really admire is public, though!

https://www.thiswaifudoesnotexist.net/


The quality and style are mind-blowing! What data did you train on?


The first iteration of our model was built off of this amazing public dataset:

https://www.gwern.net/Danbooru2020

Though now we have made our own :)


As a rough guess, I think it might be trained on the Danbooru archive dataset, since it's the largest anime picture dataset we can get today.

https://www.gwern.net/Danbooru2020


The more interesting question: what about the sources the AI trains on? How are those artists paid? Do we need to pay them? Or, if it's used by an AI as training data, do we just say it's like a human learning?


There is nothing that suggests it should be treated any differently from human learning, legally. As long as the output is significantly unique, it shouldn't have any copyright issues.


Does it matter if the artist themselves rejects the idea though?

People were discussing having a "no machine learning clause" back when Copilot was being heavily scrutinized. I wouldn't be surprised if some artists allow republishing but not machine learning use. Plenty of artists already have a clause that prohibits any kind of republishing, and Danbooru is known to rehost some of those artists' content anyway until the artist notices and requests that it be taken down, if ever (for a time, they even allowed paid rewards from Patreon and other subscription services to be republished).

The original dataset from Danbooru probably contained some percentage of content that would not have been there if the original artist had noticed in time.


> Does it matter if the artist themselves rejects the idea though?

Why would it, if there are no copyright issues? No one is obligated to accept a license unless they require permission to do something which copyright would restrict. Of course redistributing the original image as part of a public dataset may be problematic, but simply using it to train an AI model—essentially the equivalent of studying it while teaching yourself to draw—is arguably not among the things covered by copyright, so you don't need a license for that and any clauses in such a license would be irrelevant.

This is also basically educational in nature, even if it's a machine rather than a human being "educated", and educational use is often exempt from copyright in some degree or another to begin with. If the dataset is restricted to non-commercial research and educational use in the right jurisdictions then even redistribution may not be an issue.


A human artist can purchase material or access freely available material, look at it, learn from it, and then draw their own.

An AI could do the same.


Exactly: human artists browse through a vast number of resources, then call it "inspiration".




