Welcome to Waifu Labs v2: How Do AIs Create? (waifulabs.com)
455 points by Cixelyn on Jan 12, 2022 | 226 comments


Hey HN, one of the team members here!

I hope you all enjoy playing with the new and improved generator! We've been hard at work improving the model quality since the last time the site was posted [1].

As both a professional fantasy illustrator & software engineer, I find the concept of AI creativity so fascinating. On one hand, I know that mathematically AI can only hallucinate images that fit within the distribution of things it's seen. But from the artist's perspective, the model's ability to blend two existing styles into something so distinctly new is incredible (not to mention commercially useful!)

Anyways, happy to answer any questions, thoughts, or concerns!

---

[1] https://news.ycombinator.com/item?id=20511459


Naïvely I thought the Waifu generator was just “some guy having a laugh” fine-tuning a model off of Hugging Face, but reading through the comments here, it is obviously a much, much bigger enterprise.

Can you talk a little about team size, work process, funding and revenue stream? I think the effort required for such an undertaking is vastly underestimated by readers.


Right now it's a small team of 6 people, and we have a bit of funding + compute credits to train models. There's a bit of revenue from some past projects and AI consulting, but we're mostly betting big on our new AI-powered mobile title Arrowmancer[1].

> I think the effort required for such an undertaking is vastly underestimated by readers.

Haha for sure. Hosting a real-time ML model for people to do sub 1-second inferences at HN-load scale is definitely nontrivial.

[1] https://arrowmancer.com


That seems like a nice project you're working on, definitely higher effort than some other attempts at generated art (procedural or otherwise).

I can't help but feel that this would be a better fit for the NFT fad as well, as opposed to ugly monkeys or other asset flips that were pretty obvious cash grabs.

Either way, good luck!


I'm guessing you're referring to the "one of a kind" part of NFTs, which made me wonder: how would they validate that a created waifu hasn't been created before?

The question is, would people buy them, since the supply is near endless, and a huge difference is that these don't have a story attached to them...


> I'm guessing you're referring to the "one of a kind" part of NFTs, which made me wonder: how would they validate that a created waifu hasn't been created before?

Oh, not even that. While some data ends up on the blockchain, I find the whole ownership concept a bit nebulous at best, much like how people mock the whole "oh, just right click the image to save it" thing.

What I was actually referring to was more along the lines of the collector economy itself - people willing to exchange their monetary resources for something they enjoy. Having it be out of a sense of endearment and aesthetic enjoyment (waifus and husbandos) rather than the desire to flip those things (as with a number of other NFTs or collectibles) seems less morally questionable to me!

While I'm not really familiar with the whole NFT space, it seems to me that projects like CryptoKitties are more agreeable and thus viable in this space, as would be AI-generated images of cute characters! Actually, that seems like a really good fit to me, regardless of how the actual "uniqueness" or "ownership" aspect would play out.


Not to get too deep into NFTs, as I'm no expert either, but from what I understand the images used in NFTs are hosted on a server, and the owner of the NFT doesn't own the server - so do they really own the image?

I too like to collect things; my latest obsession was lingerie. But I always have the image of extreme hoarders in the back of my mind, which quite often scares me out of collecting too much.

I'm not scared of having too much stuff, I'm scared of not being able to notice it myself.

NFTs could ease that problem. But sadly I'm one of those people that prefer analog radio over digital. Wired over wireless.


The smart contract can identify the image in a content-addressable way, with e.g. IPFS, and then, while in practice there may be a server hosting it for people, anyone who has the image saved can (if they choose to) continue to act as a host for the image if the original server stops being around.

Not, uh, that NFTs for images like this aren't silly and largely pointless, but at least for some of them, the "there's a server hosting the image" problem isn't that much of an issue (at least provided that the person who "owns it" keeps the file locally and backed up).
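
To make "content addressable" concrete, here's a minimal sketch of the idea (my own illustration, not a real IPFS CID computation, which layers multihash/multibase encoding on top of this):

    import hashlib

    # Minimal sketch: the image is identified by a hash of its bytes, so
    # anyone holding identical bytes can re-host it, and anyone can verify
    # that what they fetched matches the ID recorded in the smart contract.
    def content_id(image_bytes: bytes) -> str:
        return hashlib.sha256(image_bytes).hexdigest()

    # with open("waifu.png", "rb") as f:
    #     print(content_id(f.read()))  # same bytes anywhere -> same ID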


Probably for it to work you would have to limit them and have a set of hand-selected images. Seemingly people will buy anything if you can hype it enough.


I guess you could limit it with a colour palette, and that way give yourself the option to bring in special colour-palette images in the future, for special events or occasions.


This sounds a lot like what games like Counter-Strike: Global Offensive do with their cosmetic item skins, where each has a rarity, plus specific themed packages and whatnot.

Now that's a fun business idea (ethically questionable gambling related aspects aside, guess it's about how one markets that and how honest they are).


Waifulabs is certainly a good marketing instrument for Arrowmancer. And regardless of whether the game gains traction it will be a great showcase in case you ever want to offer this as an API for other mobile game creators.


> Naïvely I thought Waifu generator was just “some guy having a laugh”

same here. what's naive about it?

not to badmouth the undertaking, but wtf is this doing on HN?


Apparently it takes a business of 6 people to run a waifu generator. That's pretty far from one person doing it as a joke.


well, apparently...

i sincerely applaud the creativity of establishing such a business


Firstly, amazing work.

My question is, how do you figure out how to parameterize "Same character, different pose" / "Same character, different eyes" / "Same character, different gender" / etc?

My (super limited) understanding of GANs is that they slowly discover these features over time simply from observation of the data set, and not from any labels.

So how could you make, e.g., a slider for head position, style, pose, etc.? How do you look at the resulting model and figure out "these are the inputs we have to fiddle with to make it use a certain pose"?

You mention it a bit in this section, but I didn't fully understand: "By isolating the vectors that control certain features, we can create results like different pose, same character"

And I assume the same step needs to be done every time the model is retrained or fine-tuned, because possibly the vectors have shifted within the model since they are not fixed by design?


Yes, your understanding is correct!

You can think of it like coordinates on a many-dimensional vector grid.

We craft the functions that will illuminate sets of those points based on a combination of observation, what we know about our model architecture, and how our data is arranged.

And yes, when the model is retrained, we have to discover them again!
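
For the curious, a hedged sketch of what riding one of those feature directions looks like in practice (names like `G` and `smile_direction` are hypothetical, not our actual code):

    import torch

    # G: a pretrained generator mapping latent z -> image (hypothetical).
    # smile_direction: a unit vector in latent space found by observation,
    # e.g. mean(z of smiling samples) - mean(z of non-smiling samples).
    z = torch.randn(1, 512)                  # a random character
    base = G(z)

    alpha = 2.0                              # the "slider" value in the UI
    edited = G(z + alpha * smile_direction)  # same character, bigger smile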


Can you share any resources for reading on this particular topic?


Not affiliated with this project, but there are a gazillion different variations of GANs. Most just change the adversarial loss to improve the learning rate / quality, but others focus on architectural changes, such as StarGAN, Pix2pix (conditional GAN), CycleGAN, MUNIT, etc. It's really a fascinating field.


Roughly speaking, how much money did you invest into making this? Just curious whether this is something an indie hacker can hope to do one day, or do you need deep pockets to make a site like this?


Fascinating... Thanks for sharing

A couple questions:

1) I didn't really understand how you went about identifying which vectors of the latent space stand for various things, like pose or color. Did you train one of the AIs to that effect, or did you manually inspect a bunch of vectors, twiddling through them one by one and observing what they did to the outcome?

2) If one were to train an AI to the same level using commodity cloud services, what's the order of magnitude cost that you would pay for the training? More like $100, $1,000, $10,000 or $100,000?


1) It was mostly manual, though AIs were useful in certain filtering tasks.

2) Depends on the quality you are seeking. If you only want one run of a similar, off-the-shelf model, something in the $1,000s is enough. But at the number of iterations you have to run to build your own and improve results, you probably need about $100k.

To tackle this problem, we built our own supercomputer out of parts we bought off eBay, though I can't say I recommend that route, because it now lives in our living room.


I think that requires one more blog post.


Very curious about the computer, what are the internals?


Not the OP, but my guess would be Threadrippers (or similar CPUs with lots of PCIe lanes), each with a great number of GPUs. That's usually what you'd do for training AI in a home lab.

Server processors get you more bang for the buck... iff you're planning to run the hardware flat out for literally years. You save on power, but the up-front cost eats up most of that, so for a system that's mostly idle you wouldn't use them. On the other hand, any CPU with fewer PCIe lanes than a TR won't be able to run multiple GPUs optimally, and TRs are relatively cheap enough to make the reduction in PSUs/chassis worth it.

Not to mention that there are some approaches to training you can only use if you have multiple GPUs on the same motherboard, i.e. sharding a single model across GPUs without communication overhead killing any benefit of that. A toy sketch of what that looks like follows.
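
For anyone unfamiliar, a toy sketch of model sharding in PyTorch (assumes two GPUs on one board; activations hop the PCIe bus between the halves):

    import torch
    import torch.nn as nn

    class ShardedNet(nn.Module):
        """Toy model parallelism: half the layers live on each GPU."""
        def __init__(self):
            super().__init__()
            self.part1 = nn.Sequential(nn.Linear(512, 2048), nn.ReLU()).to("cuda:0")
            self.part2 = nn.Linear(2048, 512).to("cuda:1")

        def forward(self, x):
            h = self.part1(x.to("cuda:0"))
            return self.part2(h.to("cuda:1"))  # activation crosses GPUs here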


Will probably blog about it in a week or two.

But the tl;dr is: a lot of scavenged EPYCs and NVIDIA GPUs, all in a large sound-proofed rack.


You mention it took two weeks to get to the point we see in the article.

Does this mean two weeks of development, or two weeks to generate the images we're seeing? Or maybe you trained the model for two weeks? That point just wasn't entirely clear to me.


2 weeks to train the model!

Development took on-and-off roughly 2 years to achieve the quality you see today.


Cool! Might want to clarify that. This is crazy impressive!


Got it! Thank you! :D


What are the terms of use for the images generated through your website? I'm guessing any commercial use is forbidden? It would be nice if you could formally spell it out on the website.


I don't think there's any sufficiently powerful way to stop people from generating one and then tracing over it to create their own linework, customizing things like the colouring and shading. The more broadly AI is able to create, the more niche and obfuscated the directions human co-creators could take its products in.


Not to mention the copyrightability of AI output isn't legally well tested, and chances are it'll fall in the direction of being copyrighted by the user (i.e. the person clicking in the UI to make a character, which is where creative input happens), not by the AI creator (who has no creative input into the process; they merely made a tool, like any other piece of software - same reason documents printed from Microsoft Word aren't copyrighted by Microsoft).

I'm not entirely sure how much legal weight a ToS on the website would have on what the users do with the output. As I understand it, you could e.g. forbid explicitly using the service/generator for commercial purposes (e.g. during game development), but if someone generates a cool character playing around with no particular commercial objective and then decides post facto to build a media megafranchise out of that character, absent any copyright claim over the image, I don't think there's anything stopping them. They wouldn't even need to trace over it, though if they want new artwork in different poses, they couldn't keep using the AI for that with explicit commercial intent; they'd have to get humans to re-draw it.

Alternatively, a pessimistic view of the interaction between copyright and AI would be that the model is a derivative work of all the training input, and its output also is, and then good luck building a non copyright infringing AI.

IANAL and all that, but it would definitely be legally risky to assume that as the provider of an AI generator you have any control over what users do with the output.


Generally speaking, rights protect humans' interests; weak AI has no interests yet, so rights are not applicable to it, and do nothing even if applied. And an AI with interests will be quite a new kind of creature.


I purchased a waifu from your vending machine (loved the blog post!) at Gen Con in 2019, but can't see the saved model in my account. Is there a way for me to get a v2 generation?


Welcome back!

We're currently working on the data migration from V1! As long as you are using the same email as you did in 2019, you'll be able to see the image again!

As for a V2 generation, sorry, because the models are different, you'll have to discover a similar image again, if you want a V2 version!


Ah that's alright then, thanks for the quick response. I was so happy to see your guys' booth there and own an obscure piece of internet history! https://storage.cloud.google.com/waifus-images/6b94c2ea-51be...


Can't you use projection with the original image as input? Not for an exact copy, of course, but for a similar V2 rendition?
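
(For reference, "projection" here means GAN inversion: freeze the generator and optimize a latent vector until its output matches the target image. A rough sketch, where `G` is the hypothetical frozen v2 generator and `load_image` a hypothetical loader:)

    import torch
    import torch.nn.functional as F

    target = load_image("waifu_v1.png")      # hypothetical loader -> tensor
    z = torch.randn(1, 512, requires_grad=True)
    opt = torch.optim.Adam([z], lr=0.05)

    for step in range(500):
        opt.zero_grad()
        loss = F.mse_loss(G(z), target)      # real pipelines add a perceptual loss
        loss.backward()
        opt.step()

    v2_rendition = G(z.detach())             # closest v2 look-alike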


I LOVE that "horror". Reminds me of some of the art I've seen on album/single covers. Any chance of letting people access that kind of intermediate step? (Though I know it's a niche as hell use case).


Ah yes, the fine line between charming anime character and Lovecraftian horror.

There was such popular demand for these "horror" images that we made them part of the generation in V2! If you refresh enough on the webpage, you can find some horrors!


For anyone looking to do this, here's some I made:

https://i.imgur.com/1V1wPMC.jpg

I rolled around 40 times on the first stage and chose a horror. I skipped the second stage and didn't roll on the third stage, just chose one of the presented details. I skipped the fourth stage because I rolled maybe ~200 times and only saw around 1-2 comparable horrors.

https://i.imgur.com/1vBeg1j.jpg

I rolled around 100 times on the first stage and saw about 3-4 horrors before choosing a horror. I rolled about 70 times on the second stage, didn't find anything interesting, just chose a normal color palette. I rolled maybe 80 times on the third stage and chose a horror, though the results were pretty consistently horrors. I rolled around 60 times on the fourth stage and saw about 1-2 other horrors before choosing a horror. Also, here's what it looked like after the third stage and before finishing the fourth stage:

https://i.imgur.com/ditm8nF.jpg

It's possible the third and fourth stages can produce horrors from normal faces, I didn't check.


With the game you're building, are the character portraits generated once and that's it, or do you plan on making them dynamic or frequently updated?

I've seen a number of mobile games that just get flooded with characters; this tool looks like it could be used to automate that process. It could be combined with AI-generated character profiles as well, creating an 'infinite' character roster in video games.


I wonder what an AI trained to spot deepfake waifus will detect.

In humans, things like the pupils can be the giveaway.

https://www.newscientist.com/article/2289815-ai-can-detect-a...


This is a super interesting question, given that the generator model is trained to fool the discriminator, which is also an AI.


Highlight the pixels with high sharpness values. Should be doable.
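
Something along these lines, presumably (a crude sketch of my own using a Laplacian high-pass, one common way to expose suspiciously sharp regions):

    import numpy as np
    from scipy.ndimage import laplace

    def sharpness_map(img_gray: np.ndarray) -> np.ndarray:
        """Return |Laplacian| of a grayscale image, normalized to [0, 1]."""
        response = np.abs(laplace(img_gray.astype(np.float64)))
        return response / (response.max() + 1e-8)

    # mask = sharpness_map(img) > 0.5  # flag pixels with high sharpness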


Why does stuff like this never come down from the web? I'd pay for a program I could download and use with my own image files.


They tend to require specific hardware, like an NVIDIA GPU, as well as having an ever-evolving large model file which the developers will want to frequently update. Some tools certainly have had offline versions, but I guess not many people are interested in setting it all up and are happy with an instant web UI.


While our model is not public, there are good resources online for playing with your own images!

Like this one by fast.ai!

https://docs.fast.ai/vision.gan.html
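
Roughly what that looks like with fastai's GAN module (adapted from the tutorial at that link; APIs may differ across fastai versions, and you'd swap in your own image folder):

    from fastai.vision.all import *
    from fastai.vision.gan import *

    path = untar_data(URLs.LSUN_BEDROOMS)       # or a folder of your own images
    dblock = DataBlock(
        blocks=(TransformBlock, ImageBlock),
        get_x=generate_noise,                   # inputs are random noise vectors
        get_items=get_image_files,
        splitter=IndexSplitter([]),
        item_tfms=Resize(64, method=ResizeMethod.Crop))
    dls = dblock.dataloaders(path, path=path, bs=128)

    generator = basic_generator(64, n_channels=3, n_extra_layers=1)
    critic = basic_critic(64, n_channels=3, n_extra_layers=1,
                          act_cls=partial(nn.LeakyReLU, negative_slope=0.2))

    learn = GANLearner.wgan(dls, generator, critic, opt_func=RMSProp)
    learn.fit(30, 2e-4, wd=0.)                  # GANs usually train without weight decay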


[flagged]


Except that then anybody could literally just download it and start a competing service, saving 2 years of development and hundreds of thousands of dollars in compute costs over that time?


No, they couldn’t. And I guarantee you they spent nowhere near that sum to train the model, and that it wouldn’t take two years to clone it.


Same reason the Coca-Cola recipe is not published nor made freely available by the Coca-Cola corporation.


You just need to code up your own model architecture and then train it on your data using some established ML framework. The first step is where well-chosen priors can make a real difference wrt. your end results.
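
To make that concrete, a minimal DCGAN-style generator in PyTorch - just the skeleton of "code up your own architecture"; real anime models (StyleGAN-family) are far more elaborate:

    import torch.nn as nn

    def up_block(c_in, c_out):
        # Each block doubles spatial resolution.
        return nn.Sequential(
            nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True))

    generator = nn.Sequential(
        nn.ConvTranspose2d(100, 512, 4, 1, 0, bias=False),  # noise -> 4x4
        nn.BatchNorm2d(512), nn.ReLU(inplace=True),
        up_block(512, 256),                                  # 4x4 -> 8x8
        up_block(256, 128),                                  # 8x8 -> 16x16
        up_block(128, 64),                                   # 16x16 -> 32x32
        nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh())       # 32x32 -> 64x64 RGB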


So neat! Where are you based? Boston, I assume?

Is there an email to reach out to you or someone in the team? ($HNusername @ gmail)


San Francisco! Just sent over a ping!


Would you try to create a new style? Train the discriminator on the score tags of the Danbooru dataset, then use it to rate the generator's style; that way it should be able to create a new style. A sketch of what I mean follows.
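
One hedged reading of that idea (`scorer`, `G`, and the 0.1 weighting are all hypothetical): train a separate network to predict Danbooru score tags, then fold its rating into the generator's loss so it drifts toward highly-rated looks rather than any one existing style.

    import torch

    # scorer: hypothetical network trained to predict Danbooru score tags.
    # G: hypothetical generator. adv_loss: the usual adversarial loss term.
    def generator_loss(z, adv_loss):
        fake = G(z)
        predicted_score = scorer(fake).mean()    # how highly would users rate this?
        return adv_loss - 0.1 * predicted_score  # reward high predicted scores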


Do you plan to provide an API to generate waifus?

I think I could use this for a project.


In the future, perhaps! This is a popular request, so we are thinking about ways we can do this.


Hello and thank you for answering questions. The following is a quote from your article:

>> It is interesting to note that from this process, the AI is not merely learning to copy the works it has seen, but forming high-level (shapes) and low-level (texture) features for constructing original pictures in its own mental representation.

Can you explain what you mean by "mental" representation? Does your system have a mind?

Also, why are you calling it "an AI"? Is it because you think it is an artificial intelligence, say like the robots in science fiction movies? Is it capable of anything else than generating images?


Not OP, but I wonder if the process would be in some way comparable to rigging a 3D model. There as well, you usually have some high-level input parameters, which influence joints on a predefined skeleton, which in turn determines the position of individual vertices in the 3D body. Finally, the 3D shape is used to render the actual pixels.

On each step, high-level parameters are combined with predefined weights to produce a more low-level output.

It seems a similar transformation is going on here, except that the weights and the structure are somehow learned on their own. A toy numeric version of the analogy follows.
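
Each stage here is a (random for illustration; in a GAN, learned) linear map taking a few high-level parameters to progressively lower-level detail:

    import numpy as np

    rng = np.random.default_rng(0)
    W_pose   = rng.standard_normal((16, 4))      # 4 high-level params -> 16 "joints"
    W_verts  = rng.standard_normal((256, 16))    # 16 joints -> 256 "vertices"
    W_pixels = rng.standard_normal((1024, 256))  # 256 vertices -> 1024 "pixels"

    params = np.array([0.2, -1.0, 0.5, 0.0])     # e.g. pose, smile, hair, tilt
    pixels = W_pixels @ np.tanh(W_verts @ np.tanh(W_pose @ params))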


Something I was wondering but couldn't find on the site: what is the license for works generated through the project?


Who would someone speak with about licensing things made using Waifu Labs? My email contact is in my profile...


Is the code or any of the models available to the public? I'd love to mess with this on a local GPU cluster.


Not at the moment! A similar project that I really admire is public, though!

https://www.thiswaifudoesnotexist.net/


The quality and style are mind-blowing! What data did you train on?


The first iteration of our model was built off of this amazing public dataset:

https://www.gwern.net/Danbooru2020

Though now we have made our own :)


As a rough guess, I think it might be trained on the Danbooru archive dataset, since it's the largest anime picture dataset available today.

https://www.gwern.net/Danbooru2020


The more interesting question: what about the sources the AI trains on? How are those artists paid? Do we need to pay them? Or if it's used by an AI as training data, do we just say it's like a human learning?


There is nothing that suggests it should be legally any different from human learning. As long as the output is significantly unique, it shouldn't have any copyright issues.


Does it matter if the artist themselves rejects the idea though?

People were discussing having a "no machine learning clause" back when Copilot was being heavily scrutinized. I wouldn't be surprised if some artists allow republishing but not machine learning use. Plenty of artists already have a clause that prohibits any kind of republishing, and Danbooru is known to rehost some of those artists' content anyway until the artist notices and requests that it be taken down, if ever (for a time, they even allowed paid rewards from Patreon and other subscription services to be republished).

The original dataset from Danbooru probably contained some percentage of content that would not have been there if the original artist had noticed in time.


> Does it matter if the artist themselves rejects the idea though?

Why would it, if there are no copyright issues? No one is obligated to accept a license unless they require permission to do something which copyright would restrict. Of course redistributing the original image as part of a public dataset may be problematic, but simply using it to train an AI model—essentially the equivalent of studying it while teaching yourself to draw—is arguably not among the things covered by copyright, so you don't need a license for that and any clauses in such a license would be irrelevant.

This is also basically educational in nature, even if it's a machine rather than a human being "educated", and educational use is often exempt from copyright in some degree or another to begin with. If the dataset is restricted to non-commercial research and educational use in the right jurisdictions then even redistribution may not be an issue.


A human artist can purchase material or access freely available material, look at it, learn from it and then draw his own.

An AI could do the same.


Exactly - human artists browse through a vast number of resources, then call it "inspiration".


Links to related projects in anime art generation for those interested:

Waifu Labs v2, referenced in this post (generate amazing custom anime face images): https://waifulabs.com (write-up is the above link: https://waifulabs.com/blog/ai-creativity)

This Anime Does Not Exist (AI-generated anime-style artwork): https://thisanimedoesnotexist.ai (write-up https://www.gwern.net/Faces#extended-stylegan2-danbooru2019-... and https://nearcyan.com/this-anime-does-not-exist)

This Waifu Does Not Exist (AI-generated anime-style faces): https://thiswaifudoesnotexist.net (write-up: https://www.gwern.net/Faces#twdne)

There's also a lot of literature on e.g. automatic manga colorization, auto-translation, image super-resolution, anime frame interpolation, and much more. Worth checking out places like https://old.reddit.com/r/AnimeResearch/ if you're interested!



Note that TADNE, being based on newer models, generates full images rather than only faces.


This is extremely impressive. It’s the first GAN I have seen which lets you tweak the result in a meaningful way rather than being just random.

I think the speed with which GANs have come into the world has really shaken people up, and it's hard to process what this all means and what it will result in. Especially the ones which generate based on real people.

But the feeling this gives me is: what happens to the future of art? Sure, this example is nowhere even close to replacing real artists, but it's already generating images better than I can draw after a year of practice. It does give me a feeling of "what is the point". Which might be an irrational feeling, but I'm sure others feel the same.


Haha, this is something that I constantly ponder while I am developing this product.

Though the conclusion I've come to is that hand-drawn art will always be meaningful for humans, because it is born of the human experience.

An interesting example is the invention of photography, which at the time was very good at doing the thing artists were doing back then (capturing likenesses).

But photography didn't replace art: instead, artists now use photographs to be more expressive and convincing, and to make better art. In tandem, the widespread adoption of photography meant that more average folks could get their likenesses taken!

Personally, my skills as an artist have improved quite a bit since launching this product, purely because observing it offers some fascinating insights into how anime is created!

I hope that as an industry, we'll find better ways to create, and what we know to be the "best" art today will be even better in the future!


One thing to consider is the number of artists employed by businesses. I don't have actual numbers, but I know video game companies employ many. Businesses have different values than an individual human, and won't care if the art is made by a computer if it's cheaper and passable. There's also a huge market for NSFW art. Who's going to know who or what made the image?

Comparing photography to hand drawn art is silly. They are two different mediums.

Your company could be the first to capture the market. I guess if you can sleep with the consequences of your work, who cares? I'm not judging, because if it's not you, it will be someone else.

Personally I think we as a society need to step back, press pause, and really consider the consequences of this technology, and even existing technologies.

If you become rich, could you set up a charity for all the future starving artists, if that future comes to pass? I don't want to live in a world where there's no room for human creativity.

Not an artist, just a concerned human.


It's no longer art if it's for commercial purposes.

Those who have the courage necessary to become artists, and renounce the vulgarity of the world, will continue to do so.

Those who delude themselves into thinking they're creating anything while being employed in commerce, will be managed out.

The deep crevice where the two meet and manage to find compromise, will continue to be filled by wealthy, independent patrons.

Asking others to think and do as we wish is silly.

Ironically, if it's that important to you, why don't you start giving monetary support directly to artists? Changing one's own actions is more impactful than trying to change those of the many (and the former is more likely to lead to the latter than if one were to focus solely on the latter).


I think your definition of art is just a partial one. As an example, the characters drawn for an RPG are works of art. They are receiving monetary value in return for that art. So it's commercial/entertainment art, and a place where many artists aspire to be.

Your definition is what I would call "culture-defining art", which is art that some part of the culture identifies with (or more specifically, a person's way of communicating what they identify with). The currency here is tribalism, i.e. it creates a way for two or more people to bond through what they feel and think.

>Those who have the courage necessary to become artists, and renounce the vulgarity of the world, will continue to do so.

Courage is trumped by needs. If they need to pay rent, buy food, support a family, pay for a car, etc., then no matter what, they are sacrificing some part of their time in order to obtain those things. Thus any artist who can make money off their work would have more time for their work, and possibly grow faster.

>Those who delude themselves into thinking they're creating anything while being employed in commerce, will be managed out.

Seems like you're too attached to the idea of what an artist is and isn't?

>Ironically, if it's that important to you, why don't you start giving monetary support directly to artists?

Because I'm not building something that takes away from their dreams (e.g. living off their work/passion).

<side thoughts> I wonder if people are aware of the consequences of automating creativity? IMO humans need human input in order to stay human. The less we come into contact with humanity, the less human we'll become... and at the very end of that long path is a bag of chemical reactions that's forgotten the meaning of "how are you?" [1]

Which made me think: perhaps it's the inefficiencies of life that make us human.

[1] This is because some companies will realize (or have realized) that tuning the machine to become most efficient at creating what the masses want is the most profitable path.


That's not my definition of art. That's your take; and you're projecting your (mid-brow) sensibilities onto me.

I don't define art. It's a sense, not a logical box you can put things in.

Receiving money for your art is one thing; going out of your way to use it as a means of living is another. The work immediately becomes tainted, and is no longer art.

It could be an amazing piece, but if your line of work is receiving money for what you create, you're an artisan, not an artist.

A character drawn for an RPG is not art. It is not a work of art. It is a graphic designed for utility. That is all it will ever be.

The sublime nature of art is there because it transcends everyday vulgarity. One transcends mere personage and becomes an artist by being in the world, but not of it.

The more money an artisan makes, the more his craft suffers. He almost always improves his technical ability through this process (otherwise, he would not make money), but loses his soul, and will never be an artist. He does not have the fiber in his heart that allows one to suffer through all manner of anguish, and material poverty, to dedicate oneself towards something above oneself; so he settles for being an artisan.

I can understand not being educated on these matters. But the amount of misplaced confidence you carry, writing on things you know nothing about, is detestable.

If your inquiries into the nature of humanity and what it means to be are genuine (and not mis-attributed self-importance), my recommendation is to read and listen more, and talk less.

Matthew B. Crawford's works are a decent bridge into all that, for the modern middle crust who feels something stirring in his soul, and needs a direction.

If you feel like your assessment of your own abilities is honest, then I would completely skip anything modern, and begin with Burke's A philosophical enquiry into the origin of our ideas of the sublime and beautiful. I will even buy you an unabridged copy and have it shipped to you, if you're a starving artist that cannot afford it (and my respect for you would increase, all the same).


If you're interested in another project that offers good tweaking, Artbreeder (artbreeder.com) has been around for a while too. You have different models to play with (faces, anime, general, buildings, paintings, etc.). Not all of them have the same level of quality and customization, but for example if you go for the face one, you get a really high level of customization: "cross-breeding" of 1 to 8 faces and then tweaking of many attributes (gender, color theme, age, ethnicity, realism, and various other details).


Art isn't about statistical copying or drawing well. It's about human expression.


These generated pieces look extremely similar to real human art, so they look very expressive to me. And you can even add your own input by hitting refresh until you get something similar to what you wanted.


> And you can even add your own input by hitting refresh until you get something similar to what you wanted.

that is the exact opposite of creation.

I'd put it this way: this is the IKEA of drawing.


I hear what you're saying, but at the same time, there's a ton of mass-produced, low-creativity art out there, where the objective is not to express oneself but to churn out products with the intent to make money. This is not just in artwork, but in other design based things like e.g. website designs as well.


There is definitely a difference between art and mass-produced art. In the beginning, we only had art as an expression of human creativity, and it's not until later that we got mass-produced art, whose only goal is to provide profits for the author(s). It'd be useful to distinguish the two in conversations like this.


It doesn't seem far fetched that in the near future an "artist" doesn't need to be able to draw at all. If GANs or the like become sufficiently tunable, I can express myself by telling the GAN what I would like to have drawn.


That's certainly a possibility, but is that intelligent or are paintbrushes already intelligent? :) In other words, in that case, it would simply be a tool directed by an intelligent being. We have that in many senses with generative and other computer-based art.


Well, digital art already allows you to copy/paste a layer and stuff that traditional art never allowed for. Imagine copy pasting a statue.

This is sort of the same thing on steroids. You can copy/remix previous art by feeding them into a ML model in training mode, and it will be massively utilized the same way ctrl-c ctrl-v is used, but it's a part of the toolset of art creation, not replacing it.


> You can copy/remix previous art by feeding them into a ML model in training mode, and it will be massively utilized the same way ctrl-c ctrl-v is used,

Then you need attributions for said previous art, all of it, at least going by the text of the law.


No you don’t. Otherwise human artists would have to list out hundreds of attributions for every bit of art they saw that influenced their style.


Ctrl-C Ctrl-V is a career killer for human artists. Why isn't it for ML?

e: the argument for human artists is that human inspirations are generalized and artists' own interpretations are always added, and for ML it's that the network weights are highly generalized and likewise not a copy; but IMO a more solid line has to be drawn, and a distinction has to be made between C-c C-v and generalization at some point.


> Ctrl-C Ctrl-V is a career killer for human artists. Why isn't it for ML?

Human artists use copy/paste as a tool all the time, and it doesn't kill their careers. The element you're missing here is that jychang wasn't talking about copying pieces verbatim from others' work without permission & attribution, but rather using it as a tool to manipulate your own art ("to copy/paste a layer"). Imagine being able to do the same but with an AI model working behind the scenes to adjust the object to fit the scene it's being pasted into.


> But the feeling this gives me is: what happens to the future of art? Sure, this example is nowhere even close to replacing real artists, but it's already generating images better than I can draw after a year of practice. It does give me a feeling of "what is the point". Which might be an irrational feeling, but I'm sure others feel the same.

There's a similar situation ongoing with fiction writing, by way of NovelAI. (And some competitors, but NovelAI is head and shoulders ahead of the pack. Thankfully; they seem to be the nicest of the lot.)

I'm a fairly prolific (fan-)fiction writer, and also AI enthusiast, so of course I jumped on that bandwagon as soon as I could. What I've found is...

- AI cannot write stories on its own. It just can't, full stop. Some people try, including me, but the results are nonsensical without significant tweaking. I expect that to change eventually, but not without a conceptual breakthrough or two.

- AI is immensely useful as a prosthetic imagination.

What I use it for isn't to write the story for me. It's to, in case I ever get stuck at some point, offer me suggestions for how the story can continue -- suggestions that I can accept or deny. Even if I deny it, it's useful as a way of illuminating my own ideas for the story. There's got to be a reason I don't like that continuation, and that is often enough to think of something I do like.

In other words, it's mostly eliminated writer's block.

It's also handy for expanding my vocabulary. English is my third language, and while I like to think I'm good enough for daily life -- I've lived in Ireland for over a decade, after all -- there's a big difference between 'good enough for daily life' and 'good enough to write good fiction'. Prior to using NovelAI, my writing was... dry. Conceptually heavy SF doesn't necessarily require high-end wordcrafting, but it helps.

The AI, especially when told to emulate Sheridan Le Fanu or any of the other great authors, is better than me at this. And since I can ask it to jump in at any point, it's become the most attentive, capable cowriter I've ever had. Perhaps noticing this, NovelAI now calls their default AI tuning 'Co-Writer'.

It's still likely to write something I can't immediately use, but that just means I need to absorb its ideas and make them my own. Repeat a hundred times per day, and I end up learning much, much faster than I ever did when I was writing on my own.

To summarize, I don't use AI to write my stories for me. I use it to get better at writing.

I think it should be possible to do the same for other forms of art.


We have been here before, and we certainly will be again.

It was not so long ago that computers beat humans at chess, yet people still play.


I've been thinking about this a bit lately, incomplete thoughts ahead...

Yes, people still play, but they no longer create.

With the exception of Adversarial attacks on particular algorithms, no human is creating new Chess theory, discovering new openings, for example.

As a game, challenge, competition, social activity, chess is alive and well.

As a creative endeavour, or vehicle for discovery, Chess is solved. It is no longer an art of its own.

We're part way through this transition now with Go as well. New opening theory, new joseki, new strategies are being played by robots, and at the highest professional levels we are playing catch-up to understand.


> no human is creating new Chess theory, discovering new openings, for example.

https://en.wikipedia.org/wiki/Bongcloud_Attack


But isn’t that inevitably the final form of any highly complex tactics game (for lack of a better word) humans create? I guess we’ve reached the ceiling of what we can fathom on our own, so the logical next step is to hand over to robots that can grasp a wider range of possibilities, and learn from them.

I once heard someone say people will never be able to understand God’s plan, just as a dog will never be able to really understand why its owners do things the way they do. I feel like this is a similar threshold; a human’s mind is incredible, but not perfectly suited to thinking far in advance, or calculating probabilities.

So I guess my point is, while this certainly feels a little scary, I think it’s just a consequence of the game, and probably okay.


> With the exception of Adversarial attacks on particular algorithms, no human is creating new Chess theory, discovering new openings, for example.

That's not true though? Human chess players are constantly exploring new lines, and engines are a useful tool but by no means the bottom line - indeed, lines that were neglected because engines gave them low evaluations have been a fruitful source of ideas lately, because humans play differently from engines and may not be able to find the refutation.


Yeah, I totally get that this is the same thing that has happened so many times before. But at least for chess, it was never a practical thing; it was a competition. Like how the existence of aimbots does not ruin a game as long as you are playing against other humans.

For art it feels a bit different, since it's not competitive and more of a practical thing. Perhaps art will shift from placing individual strokes on an image to giving creative direction for AI to resolve into an image, or to enabling more people to create labor-intensive works like animation.


Or with the internet, Linux, and open source: suddenly you could download thousands of apps, most much better than anything I could ever have made myself.


hahaha. Now everyone will get a piece of my existential ennui. Welcome to the club!


This isn't really creativity, is it? I like to call stuff like this statistical copying, and indeed, the linked Wikipedia article on GANs says:

> Given a training set, this technique learns to generate new data with the same statistics as the training set.

There isn't a creative process here nor any creative introspection going on. While the technical results are impressive, this article does not address creativity even superficially, and just slaps the label on. There isn't any AI either. It's machine learning, i.e., statistical models and algorithms.


What makes you think human creativity is anything more than statistical modelling? What else can it be, based as it is in a physical substrate? Objections like yours are usually a sneaky way of implying that human consciousness breaks free of the physical world, cartesian dualism hidden within people's arguments.

"It cannot be creative because it's only bits/cogs/linear algebra/etc/etc." Well describe to me the way it is different to the processes of the human brain? "There is some magic sauce in the human brain we do not yet understand!", well then how do you know that this magic sauce does not exist within the statistical models inside the computer?

I find it very irritating that such shallow reasoning prevails amongst intelligent people.


>> What makes you think human creativity is anything more than statistical modelling?

For example, humans don't need to see millions of examples of waifu before they can draw their own.

Also, humans can draw in different styles, including novel styles that look nothing like styles they have seen before. Statistical models like GANs can only draw in styles similar to the ones in their training sets.

Statistical modelling can only represent the data in a training dataset and is incapable of novelty. Humans are capable of novelty.

>> Well describe to me the way it is different to the processes of the human brain?

We haven't created the human brain and it's very unlikely it uses a technology we understand, like linear algebra.


> For example, humans don't need to see millions of examples of waifu before they can draw their own.

Humans have the advantage of being trained for years on a far larger and more generalized dataset before they're asked to draw anything.

> Statistical models like GANs can only draw in styles similar to the ones in their training sets.

The intermediate states in the article's example seemed to contain a number of novel styles to me. The final results are filtered for conformance to a specific style due to the nature of the project, but that doesn't mean the GAN is incapable of drawing in other styles. (Of course, most of these alternate styles are not particularly appealing to humans, but the same is true for many of the experimental styles invented by human artists.)

> Statistical modelling can only represent the data in a training dataset and is incapable of novelty.

These AI models are generating images which never existed before, and which were not in their datasets. How is that not novelty?


>> Humans have the advantage of being trained for years on a far larger and more generalized dataset before they're asked to draw anything.

That's a big assumption wrapped up in an over-wrought analogy. Humans don't "train" in the sense that statistical models, or neural nets, are trained. We don't have any clear supervision, for example - no ground truth. And we don't need examples of exactly the things we learn in order to learn them. For instance, nobody ever saw an example of a manga character before the first manga character was drawn. And yet, someone drew it.

>> These AI models are generating images which never existed before, and which were not in their datasets. How is that not novelty?

How I like to think of it, which is a bit of a fudge, is that neural nets learn to convert each of their input images into a connect-the-dot puzzle (the "dots" are the data points in a very high-dimensional space that encompasses the pixels of all their training images; like I say, it's a bit of a fudge). Every new training image gets its own connect-the-dot puzzle superimposed on those of all previous images. Once training is done, you can ask the trained model to generate new images and it basically puts its pen down on a dot, and starts drawing a line. What dot comes next depends on timey-wimey model-probabilities. Obviously, in that way, it can't draw a line to a dot outside the big network of superimposed connect-the-dot puzzles it has put together. Such outside-context dots don't exist for the model, in any real sense. So it can only create images that exist within that puzzle.

In truth, the puzzle, i.e. the trained model, is a dense region of cartesian space (a manifold). What comes out of the model must already exist in that manifold, so it must be a variation, or combination, of the training images used to construct the manifold.

Which means, it can't innovate. So for instance, you can't expect to train it on images of manga characters and find that it now draws you in the style of Michelangelo. That's what I mean. Of course you'll see images that are not exactly the images you put in, but you won't see images that are very different from the ones you put in. It is, in a very concrete sense, a very limited ability to generate new images.


> Humans don't "train" in the sene that statistical models, or neural nets, are trained. We don't have any clear supervision for example, no ground truth.

GANs are a form of unsupervised learning. They don't have "ground truth" either, just lots of existing images which they learn to imitate and to distinguish from other kinds of images not present in the training set. Similarly, humans learn to distinguish natural images from unnatural ones starting from birth, and use that learned feedback to filter the images produced by our imaginations: a natural example of a GAN. Our input is less… focused, and includes non-visual elements, and there are of course other aspects to general intelligence besides visual processing and imagination, but in this area at least we operate on the same basic principles.

> So for instance, you can't expect to train it on images of manga characters and find that it now draws you in the style of Michelangelo. That's what I mean.

Are we talking about GANs here, or humans? A human trained exclusively on manga wouldn't suddenly develop the ability to imitate Michelangelo either. On the other hand, a GAN trained on manga may sometimes produce images which are not recognizably part of the manga style—which could be seen as an entirely new style. (It would help the process along if you included non-manga images in the training set, as a human would have access to those as well. Then different styles of the same scene just become one more dimension in your "manifold" of all possible images.)

Inventing and learning to draw in a new style isn't something that comes spontaneously to humans. It takes a lot of practice both learning what makes the style distinctive and learning to create art in the new style. A GAN has most of the basic elements required to do the same, but we generally don't use it that way. An interesting experiment might be to permute the discriminator to favor specific elements which were not common in the training set and then train the generator to satisfy the altered discriminator.

> Which means, it can't innovate.

What exactly do you mean by "innovate"? To me the word implies intent, which is clearly out of scope for a mere GAN. Intentional behavior would put it in the domain of an artificial general intelligence or AGI. However, generating images which aren't in the training set is just a matter of choosing a point on the "manifold" which doesn't correspond to any of the input images. Though expecting the GAN to spontaneously invent a distinctive and consistent new style which appeals to humans, without being one itself or otherwise being trained in what humans might find appealing, is a bit much IMHO.

The biggest difference remains the fact that this GAN only has manga for its input, which limits its ability to produce anything outside that context. Its whole life is manga and nothing else. Humans have the same issue with creating things completely unrelated to any prior experience, but they have a much larger and more varied pool of experiences to draw from. (And even then humans can easily get stuck in one particular style and find it difficult to change.)


I'm sorry for the confusion I caused with my inexact terminology. What I mean about "ground truth" in the context of this conversation is the images that GANs are trained to reproduce. Supervision doesn't need to come in the form of labels. GANs are weakly supervised but they are given examples of exactly what they need to model. They are trained to reproduce those examples and like you say, they can't be expected to learn to do anything else.

This is a general rule about neural networks, as we have them today: they learn to reproduce their training set. Nothing more, and nothing less.

Humans, now, don't need to see examples of a thing before we can make one. If that were the case, we would never have created all the technology we have, of which there was no previous example. For instance, at some point in our history someone figured out how to carve a hand axe for the first time, ever. That person didn't have any examples to go by. There were no such objects in nature, before that time. Certainly that person had some idea of concepts such as "sharp" or "pointy" or who knows what else, but they had no blueprint for a hand axe. This is what I mean by "innovation".

"Inventing and learning to draw in a new style" is absolutely something that comes spontaneously to humans! That's the entire history of human art: people inventing new ways to express themselves through various art forms. Art would be way too dull if nobody could come up with new things.

But I certainly agree that it's unfair to expect the same kind of innovation from GANs or from other neural nets. However, I think that's the case because neural nets are nothing like humans. But if I understand correctly, you're claiming that how the human mind works and how neural networks work are very similar, so I'm confused a bit, because in that case you should expect them to have the same abilities as humans do. Sorry if I misunderstand you, but could you clarify? If human creativity is statistical modelling and GANs do statistical modelling (they kiiind of do) then we should expect GANs to be able to do everything that humans can do, no?


> What else can it be, based as it is in a physical substrate?

This is no less shallow reasoning. The question of whether the academic field of statistical modelling already contains the necessary ideas to produce strong AI is not decided, and won't be unless/until somebody makes a strong AI. People have different intuitions about what the answer will be and until it can be determined empirically I suggest treating them as what they are: intuitions.


I didn't mean to imply it was a settled question. I had in my mind the notion that all this is rather unproven at the moment. I quite strongly suspect that there is a small but significant amount of whatever "intelligence" might be within GPT3 et al.

But I completely agree, it is just a suspicion.


Sure, but I see no reason to not treat things like GPT-3 and TWDNE as evidence. If it looks like a creativity, and quacks like a creativity, then we should at least entertain the notion that we have a small artistic process on our hands.


Of course we can entertain the notion. We just shouldn't pretend the question is settled one way or the other like the two comments above and pointlessly argue about it on hn.

Personally I'm inclined to see things like GPT-3 as qualitatively different to human creativity, but I don't claim to know this for a fact.


Well, but also, if you think that GPT or TWDNE are different from creativity, when they look similar to creativity on the face of it, I do think it's not enough to say "well I think it's a different thing", you should actually put out there where you think that human creativity materially diverges from the output of an ANN.

I just think there's a lot of people arguing "well this can't be creativity" out of some sort of anti-AI signalling sentiment, where they've become so used to disagreeing with unwarranted AI optimism that they undervalue AI progress routinely. In other words, "this can't be creativity, because we don't understand creativity" or "this can't be creativity because AI is very far from human-level" or such.

(Personally, I believe GPT-3 and CLIP are already superhuman; that is, superior to the circuitry in the brain doing the equivalent job. They only look weak because we're trying to make them do all of the job in one step, when we're comparing them to the brain's highly heterogeneous self-correcting approach.)


You say I'm required to "put out there where [I] think that human creativity materially diverges from the output of an ANN" but you state your opinion with as little support as I did. Both are just opinions.

My views on the matter have nothing to do with being "anti-AI" or "signalling" to anyone, and the latter suggestion is mildly offensive. Like many here, while being no sort of expert I've thought quite a lot about these questions and have fairly detailed views about them. I suspect strong AI is possible and will not require new fundamental physics or anything like that. However, I suspect that if it is achieved it will be of a very different nature to GPT-3 etc and to ML in general. I don't think the behaviour demonstrated by these systems so far resembles human creativity, understanding or reasoning at all, although they are capable of impressive and fascinating things. I'd go so far as to say the differences seem, to me at least, rather obvious, if difficult to define. However, I'm open to alternative views and to the possibility I may be wrong.

I greatly enjoy discussing these sorts of things, but adversarial exchanges on hn filled with ad-hominems aren't the way I like to do it.


Exactly. There's some definite substrate-ism (like sex-ism) going on here!


> What makes you think human creativity is anything more than statistical modelling?

Our ability to make the decision between following the rules and breaking the rules, when suitable. A computer could also break the rules, but in most cases it wouldn't make sense or look good, while a human can make a judgement about when to break a rule. Sure, we all learn by copying, but after a while, we start getting a feel for when to break the rules, and that's when unique art appears. Computers seem to not have learned this yet (or rather, haven't been taught it yet).

Using the tool that this submission offers, all the results will look similar and can be traced back to the training set you give it. Do something similar with a human (over a similar amount of time, in human terms, to what the machine got) and eventually the results will look way different from the training set, as we see with artists in real life.


You imply a bunch of things I didn't say. We don't have a definition or model of creativity or intelligence, even within humans and other animals. Most humans struggle to view other things as intelligent even when, on close inspection, they are. We have some mild understanding at a sort of system level, but it's usually just enough to bin things into intelligent or not.

For something like the original post, we do know what these things are. They're statistical models. Full stop. They show no indication of what we see in creative and intelligent behavior, that is, the ability to self-adapt to both internal and external initiatives. This GAN has no ability to step outside the statistics of its training set unless the model is updated to prod it to do so. The model can be changed, but it is a forceful change. If you show me tens of thousands of images, I am not - at an emergent, top-level, system level, etc. - bound to the statistics of that image set. Is this GAN asked something or given a goal aside from an implicit "draw something like what we've given you"? Even if I do draw something like, or akin to, the given image set, I have full creative control over the image (assuming some drawing skill).

If the human brain (and really body) can be modeled via a statistical model (which is not yet known but is surmised as you imply), that doesn't necessarily explain high-level behaviors. More is different. You call it magic sauce, but others call it emergence. Our understanding of emergent behavior and complex systems at large is still in work.

In my view, metaphorical thinking, of which analogical thinking is a subset, is a likely kernel of human intelligence. While these statistical models are copying, which is similar in a way to analogy building, it's not quite there. The reason the things it generates look like other things is that it searched a parameter space for matching statistics. However, it cannot even explain that that's why it generated what it did; we explain for it. These things are no more artificially intelligent than something like thermodynamics is naturally intelligent.

Lastly, as I pointed out in my original comment, if this is indeed creative as someone like you implies, the article fails to make a convincing argument and bounces around a lot of buzzwords.

> I find it very irritating that such shallow reasoning prevails amongst intelligent people.

I was offended, but I suppose I agree. ;)


"create" doesn't necessarily imply "creativity" so I'm not sure why this is brought up. Of course generated images right now don't have any creativity attached to them and that's not the purpose of this article. All the images shown seem to mimic the generic "anime" art style and seems close enough to what a human draws which seems to have been the goal


One of the things being an anime fan has made me realize is how the industry goes through phases in everything: gimmicks, art styles, tropes, etc. This can at best replicate what exists, but it cannot create new styles, which honestly come from the creative process of hundreds to thousands of artists creating and influencing one another.

People keep forgetting that you can really only fit to the data you have. Extrapolation exists as a concept, but the intuitive knowledge required to create something new that can succeed is hard to pin down even for humans at this point (consider how much of the business press is gimmicky blog posts about how to be successful, full of contradictory anecdata, opinions, and advice). I don't even know how AI researchers would go about tackling that. I am not an AI person, but as a plain old scientist I know extrapolation without intuition is almost always a fraught effort.


> but it cannot create new styles

Sure it can, but then it's not considered anime anymore. I think this sentiment is confusing genre with training set constraints. A new model is not required for some artist to do something different from anime or even an anime artist to do something different. Humans can self-adjust all with approximately the same base model (whatever that is).


> There isn't any AI either. It's machine learning, i.e., statistical models and algorithms.

I always thought those were synonyms.


Both the A and the I in AI are debatable terms.

The very definition of intelligence has been under heavy reconsideration in recent decades, with the emergence of a better understanding of animal cognition, for example.


Why is the A[rtificial] under debate? It's not natural (biological) intelligence, unless one goes the route "humans are a part of nature therefore the things we make like plastic and skyscrapers and hamburgers are natural".


One has to wonder whether drawing a face or winning a game of Go may or may not be real intelligence, only a very narrow kind.

We like to think that AI is only statistical models, which may be true (is true?), but we have no idea whether what happens in brains is anything other than a highly optimized, deconstructed version of such a model.

Current datasets need tens of thousands of pictures of a cat while a kid only needs to see a cat twice? Good for the kid, but there's no hint of a fundamental difference in process.


AI has been co-opted as a synonym for machine learning, if only for marketing purposes. Machine learning existed for decades before it was "renamed" as AI.

I just don't see anything intelligent yet. People have somehow confused the success of machine learning with intelligence. We have a lot of statistical models of things that are very successful, but those aren't considered intelligent. For some reason, machine learning telling you something about data has suddenly been treated as AI. Machine learning can do some impressive things, but I think it's short-sighted to equate AI and machine learning.

Intelligence is really a tough thing. Even a video of a single-celled organism displays a sort of intelligence and behavior far beyond anything I've seen from machine learning. So why is it intelligent? Or is it intelligent? I'm not entirely sure, but my point is that machine learning is orders of magnitude away from describing (i.e., modeling) even the simplest self-directed and self-adapting behavior we see in the real world.


There isn't any I either. It's biological learning, i.e. neuronal models and feedback loops.

/s?


Purists seem to define AI as anything that is currently too hard for computers. Wasn't long ago that chess engines with no neural networks or machine learning were being called AIs.


I find it odd that no one is tracing portions of the generated art back to publicly posted sources. The internet horde of bored people used to be good at finding where unattributed pieces come from. Are these really generalized and stored in highly abstract forms? I doubt it...


> This isn't really creativity is it? I like to call stuff like this statistical copying

Or, through decades of AI research, we're just now starting to better understand what actual creativity really "is"?


read the bottom! The part about creativity is on the bottom :D


I made a waifu, then I pressed save. It wanted me to sign up, so I pressed back so I could just right-click save-as, but... I lost my waifu forever now :(


This is like the condensed story of humanity


This is why we need local self-hosted AI. Keep the waifus safe.


I mourn for your lost perfect waifu, but if it helps, your comment likely saved other waifus because we clicked [Download] instead of [Save].


Waifu is kept in memory, normies.


It's okay, Step 43636 will come to console you in your dreams.


"Keep precious things inside you, or you will lose them"


Could have been an NFT


comment of the year, and it's not even friday


> Step 43636: During this phase, the training gets unstable at times, so we have snapshots of occasional horrors like this.

Ah, make that three things the public shouldn't see being made: sausage, legislation, and waifus.


I wish I had the stones to call my company Waifu Labs


I didn't know what you meant so I googled it. The technology is cool, but the content is ... problematic.

From urban dictionary:

   "Waifu" is used to refer to a fictional girl or woman (usually in Anime, Manga, or video-games) that you have sexual attraction to, and you would even marry.
Huh.


A big part of it you're missing is that it's a joke. Anime fans definitely know they're weird, and are very passionate about the things they like (and are conscious of that), and thus a lot of humor in the community is self-deprecating and ironic (for example, calling your favorite fictional character your "waifu" or "husbando"). The fact that outside observers might think it's "problematic" is kind of the whole point.


You'd be surprised at the waifu market size.


I can’t wait for this technology to come to video.

Imagine a future where people can compile written scripts into Hollywood quality movies.



I wonder what will happen when somebody combines a GAN with a feature recognizing network like the Tesla cars use, so it can use its own extrapolated map of the surroundings to stabilize its output as the camera moves around.


I think it will have its limits, but the possibilities for editing together, supplementing and modifying products from smaller AI modules should stretch out what you can do on a small budget.


The explainer video about GANs is top-notch! Excited for Arrowmancer!


(link for the lazy: https://youtu.be/Pab8pG5WbXQ)

Thanks so much! It's done by our fantastic animator[1]!

GANs are quite interesting and we didn't see many approachable explainer videos targeted at lay people, so we decided to make one ourselves!

[1] https://twitter.com/bumblingbeebo


Is there a gameplay video? Or at least screenshots of Arrowmancer?


Here's our current game trailer! https://www.youtube.com/watch?v=8WvRgb6kh4s


For the author: there's a small typo "Discrimniator" instead of "Discriminator" in the video at 1:11

One thing I was confused by: the video says the discriminator "AI" is trained to detect true vs. generated results, with the hope that the generator becomes good enough to fool the discriminator. But why is the discriminator useful, then? Couldn't you just tell the generator "AI" whether the result it produced was true or not?

I think the answer is: you don't want just a perfect recreation of the training data you gave the generator; you want it to produce variations of that training data, so there's a "how would you know if it's 'a true result' / good enough?" problem. The discriminator is useful because it's not a direct comparison, but rather a "this looks approximately good enough" comparison of true vs. generated results.

This all makes me wonder: what sort of data set needs to be fed to the discriminator to train it? Is it some sort of "true image" plus "true image with bad alterations (e.g. lines, scratches, etc.)" data set?
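For what it's worth, my understanding is that the discriminator's training data is nothing more exotic than the real images (labeled "real") and the generator's own outputs (labeled "fake"); no hand-corrupted images are needed. A minimal sketch of the standard loop, assuming a PyTorch-style setup (the tiny linear networks here are illustrative stand-ins for the real conv nets):

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(64, 784), nn.Tanh())  # latent -> "image"
    D = nn.Sequential(nn.Linear(784, 1))              # image -> realness logit

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real_images):                      # (batch, 784) tensor
        batch = real_images.size(0)
        z = torch.randn(batch, 64)

        # Discriminator: real images get label 1, generator samples label 0.
        d_loss = bce(D(real_images), torch.ones(batch, 1)) + \
                 bce(D(G(z).detach()), torch.zeros(batch, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator: try to make D say "real" about its fakes.
        g_loss = bce(D(G(z)), torch.ones(batch, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

This also suggests why you can't "just tell the generator whether its result was true or not": a bare yes/no carries almost no signal, whereas the discriminator converts that judgement into a differentiable score the generator can follow downhill.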


If you think about how humans draw things, there's a repeated cycle of "create content", "inspect for issues", "create corrections", "inspect for issues", and so on. It seems that generation and criticism are two rather distinct skills, which is why GANs, generative-adversarial networks that contain both a "generator" and a "critic" network, end up making fewer mistakes and learning distant correlations better.


Thanks for catching the typo!

Indeed, it contributes to the variations problem.

Also: if the discriminator starts off perfect, then the generator can't learn to be better.

Sort of like a human learning to play chess: If you start off with top-tier opponents that crush you, then you don't have a gradient to learn from. Instead, you need players at your own level to grow your skills.


The game is an interesting mix of mobile fun with the concept of El-Fish. https://www.wired.com/1993/02/maxis/


This is mind-blowingly good. You keep pushing the state of the art further to the point of broad applicability. It won't be long until everyone can be an artist without putting in the ten thousand hours of drudgery of training their muscles, hand-eye coordination, structure of shape and perspective, etc. I can't wait!

Do you have a team page? How many of you are there? Do you work with gwern and nearcyan? Are you going to raise for this? (You should totally scale this!)

Great work, and keep it up!


No team page yet! Friends with near, but haven't had a chance to meet gwern yet (maybe one day??)

Arrowmancer is really our first attempt at scaling it up; hoping to do even cooler generative-AI-related work in that production.


All the best! Super excited!


It's funny how people complained about GitHub Copilot 'stealing' people's code, but nobody here complains about this AI 'stealing' artwork.

Don't get me wrong, I have nothing against this, but I think we should start discussing the morality of AI-generated content, even if it doesn't train on existing artworks/code.


The image quality is good, but now I realize I'm experiencing "uncanny waifu". Authentic character designs have two things in common:

1. Simplification of reality (the actual artist training method is traditional studies from life and photo reference, followed by gradual reduction and symbolization into a style).

2. Symbolic meaning. Things like the style of eyes, clothing, etc. are all meant to signal personality. This is stuff that current AI techniques don't really touch in any direct sense.

Since the ML method is built on interpolating between final results, it's going to lack these qualities and produce something that is consistently an "average impression". It's akin to asking the algorithm to generate mythical heroes by mashing up the various stories: you get a hero that is somehow the average of Icarus, Heracles, and Achilles, which would be less of a character than any of the originals.
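To make the "interpolation" point concrete, here's roughly what sampling between two characters looks like in latent space (a toy numpy sketch; `generator` is a hypothetical trained model):

    import numpy as np

    rng = np.random.default_rng(0)
    z_a = rng.standard_normal(512)   # latent code of character A
    z_b = rng.standard_normal(512)   # latent code of character B

    for t in np.linspace(0.0, 1.0, 5):
        z = (1 - t) * z_a + t * z_b  # every step lies *between* A and B
        # image = generator(z)       # hypothetical call to the trained model

Every intermediate face is a blend of the endpoints, which is exactly the "average of Icarus, Heracles and Achilles" effect described above.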


Could it work backwards? E.g., take a hero like Heracles and determine that he's, say, 40% shared with Gilgamesh. Then we might see that even the originals aren't very unique.

Just a thought, I don't really know anything about ML.


I found the most interesting part was the evocative comment about the 'vast and parched' nature of the latent space.

I wonder if the OP's intuition regarding the sparseness of the latent space, and the relatively small area occupied by the 'useful' manifold (?) embedded within it, provides any clues as to what symbol grounding might look like for some neuro-symbolic infrastructure sitting atop that latent space.

I.e. how should we be trying to represent concepts like 'male' and 'female' within that space?

Is it important to have these concepts represented as a low dimensional manifold?

Is it important that this manifold be easily described by some simple geometric form like a convex polytope?

Is it important that nuances and variations on the concept be separable within the bounds of the concept-specific manifold?

What other properties might be important?
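One concrete way to poke at questions like these (a sketch under my own assumptions, not anything the authors describe): label some generated samples for a concept, fit a linear probe on their latent codes, and treat the probe's normal vector as the concept direction. High held-out accuracy suggests the concept is roughly a linear half-space in latent space:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data: latent codes plus hand- or classifier-assigned
    # labels for some concept (say 0 = "masculine-styled", 1 = "feminine-styled").
    Z = np.load("latents.npy")   # (n_samples, latent_dim); illustrative file
    y = np.load("labels.npy")    # (n_samples,)

    clf = LogisticRegression(max_iter=1000).fit(Z, y)
    direction = clf.coef_[0] / np.linalg.norm(clf.coef_[0])

    # Walking along the direction should morph the attribute while leaving
    # other features mostly intact -- if the concept really is this simple.
    z_edit = Z[0] + 3.0 * direction  # feed to the generator and inspect

If the probe fails, the concept presumably lives on a more tangled manifold, which bears on the convex-polytope question above.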


What methods are there to estimate how many unique characters a model can generate? The answer is not infinitely many, but determining when two images are of different 'characters' is fuzzy.


It's hard to say, but I think a useful measure would be to look at mode-dropping compared to the training data. Whatever the 'number of unique characters' is, it clearly ought to be at least as large as the number of characters you see in the original training data, right?

For TADNE, Arfafax ran Danbooru2019 and a few million TADNE samples through CLIP to get the image embeddings, and clustered them; when the two sets of clusters were graphed using t-SNE, you could see that the TADNE StyleGAN2-ext did a lot of mode-dropping, in that many smaller outlying clusters of characters/franchises/topics simply did not appear in the TADNE samples. TADNE looked like a big galaxy, while Danbooru2019 looked more like it was surrounded by archipelagos. TADNE was extensively trained on them and was a very large model, but the GAN dynamics & StyleGAN architecture meant it didn't do a good job absorbing the rarer/more idiosyncratic Danbooru2019 image clusters.

I expect newer generative models which avoid GAN losses and which use more flexible (but expensive!) architectures, like DALL-E, would perform much better in terms of mode-dropping, so you'd see a lot more unique characters/images out of them. (I'm very excited about them. As good as TADNE or Waifu Labs v2 may be, I think they are still far behind what could be done with just existing data/arch/compute.)
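For anyone wanting to try a crude version of that analysis themselves, a sketch assuming OpenAI's `clip` package and scikit-learn (directories and cluster count are illustrative):

    import glob
    import clip
    import numpy as np
    import torch
    from PIL import Image
    from sklearn.cluster import KMeans

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def embed(paths):
        feats = []
        with torch.no_grad():
            for p in paths:
                img = preprocess(Image.open(p)).unsqueeze(0).to(device)
                feats.append(model.encode_image(img).float().cpu().numpy()[0])
        return np.stack(feats)

    real = embed(glob.glob("danbooru_subset/*.jpg"))  # training-set images
    fake = embed(glob.glob("gan_samples/*.png"))      # model samples

    # Cluster the real embeddings, then see which clusters the samples
    # reach; clusters with no samples are candidate dropped modes.
    km = KMeans(n_clusters=200, n_init=10).fit(real)
    dropped = set(range(200)) - set(km.predict(fake))
    print(f"{len(dropped)} of 200 real-data clusters have no GAN samples")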


I hope these generators expand into non-waifu / pretty-boy anime depictions. There are a lot of anime gaijin faces out there to explore.


I would like to see how it generates late-80s/early-90s style features. The current pool of anime art styles is very generic (aside from stylistic outliers) and I'd love to see Cowboy Bebop/Akira/Bubblegum Crisis type character designs.


Miyazaki's style of character design and fantasy would be amazing.


Agreed.

In many ways Miyazaki's style is "nonstandard" for anime, possibly because he was partly inspired by European artists (think Moebius, whose influences can be seen in Nausicaa for example).


I'd be curious whether such a GAN could actually beat Oda at generating new One Piece characters. 20 years in, he just doesn't seem to stop creating hilarious characters.


This one has some rough-looking ones! They're a bit rarer than most.


So is this putting some anime digital illustrators out of work? Or will many simply use it and pretend they did the work themselves?

Obviously there will be plenty of illustrators doing custom work that these can't (yet) replicate.

Also good for those countless anime avatar'd Twitter users.


Amazing work and progress. The previous version appears almost toyish in comparison.

FYI, uBlock Origin complains about the registration link because it is on "Peter Lowe's Ad and tracking server list".


That's because it bounces the registration link through a tracker.

If you're OK with being tracked, you can permanently allow that domain.


I would be interested in finding one I like and using it for a desktop companion project. Not an original idea, but I am not an artist, ha.


Is there any way to approximate embeddings for a novel image?

novel meaning user provided, not generated by the model or in the training set.


Not at the moment in our tool, though this is an area of great curiosity and research for us!
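For context, the textbook approach here is usually called "GAN inversion": freeze the generator and gradient-descend a latent vector until its output matches the target image. A minimal sketch, assuming a differentiable PyTorch generator (not necessarily how the Waifu Labs team would do it):

    import torch
    import torch.nn.functional as F

    def invert(generator, target, latent_dim=512, steps=500, lr=0.05):
        # Optimize a latent code so generator(z) reconstructs `target`.
        z = torch.randn(1, latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            loss = F.mse_loss(generator(z), target)
            opt.zero_grad(); loss.backward(); opt.step()
        return z.detach()  # approximate embedding of the novel image

In practice people add a perceptual loss (e.g. LPIPS), optimize in StyleGAN's w/w+ space, or train an encoder to predict the latent directly; plain pixel MSE tends to get stuck.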


Does the discriminator model translate the images into an embedding space of its own? Could such a space itself be used to generate images?


Have you considered applying similar models to VR avatar creation? That's a market in itself


This tool has many applications, and the ones that will make you rich aren't about anime.


Oh, I see what you are saying....

More things for, like, adults?


It needs to make money?


step 1: generate random waifu

step 2: NFT all the things

step 3: profit

step 4: GOTO step 1

step 5: automate steps 1 to 4


any good info on manipulating the "control vectors" in the latent space?


Cool, it's like an AI-generated picrew


project name is kinda incel cringe :(


[flagged]


Painting with a brush stroke set to galaxy levels.

I'm not saying there isn't creepy anime, or even creepy motifs in excellent anime. But that stuff sells over there.


I'm not assessing whether it is creepy or not, but how can 'it sells' have any place in that argument?


Because if you are a mangaka/anime artist, you'll be pressured to add otaku stuff to boost sales.


And then -- pedophile content is ok for you? Wow.


You say pedophile, they say legal age of consent. I think it's a different set of mores. Although it's complicated: the age of consent depends on prefecture, age, and circumstances, and it's around 13 years old.

And anime throws in wrenches by having people's apparent age differ from their actual age.

Take for example Re:Zero, where the female heroine is a 118-year-old elf who looks like an average 17-18 year old but has the same knowledge as a 12-year-old human. This of course squicks out the main protagonist, who's about 17.

Even though by applying our social mores, he'd be in the right.

Different mores, different opinions. I honestly don't care too much about them. They are most often just a stupid thing for otakus to obsess over. And you are very fixated on pedo content, when otaku content is much, much broader (and just as degenerate): loli yuri, incest, BDSM, any combination of the previous, etc.

And these are just tips of the shitberg.


Let me summarize. For you, child rape being bad is merely an arbitrary social custom ('I think it's different set of mores.'), and you see no difference ('just as degenerate') between two 30 year old adults doing BDSM -- and child rape. Finally, for you it's just the 'story'; you find it normal to watch a 6 year-old being raped as long as the story is 'oh, she's actually 107 and from a different planet', and you think we all agree with that ('Even though by applying our social mores, he'd be in the right.')?

Did I get that right?


I mean I agree child sex is bad. I also agree incest is bad.

But there were societies that promoted both and kept existing. Just because I have a set of social norms, shaped by my upbringing, doesn't mean I get to judge people who had a different upbringing.

---

Let me clear some things up. Pedo content, 99 times out of 100, isn't literal child rape. Hell, there usually isn't any sex at all (actual sex scenes get you an 18+ rating, so anime and manga avoid explicit depictions since they narrow the audience). If there are scenes where an older person has involuntary sex with a younger person, they are depicted as vile and heinous acts.

It's usually just minors* having sex with other minors, with both parties agreeing. It's usually not depicted as an actual non-consensual sexual act.

So where is the child rape? Well, in the details, i.e. consent. See my comment about apparent age. It gets into squicky territory quickly.

* - One or both of the minors might have an apparent age different from their actual age.


Japanese people just naturally look underage to Western eyes. The opposite is also true, btw. I think I've seen some women mention having to wear "Chinese-eyed Pocahontas" makeup to be treated as a human in Western countries, or else everyone looks away.

The "500-year-old loli girl" argument might be pure BS, but there's no way that age guesstimation of anime girls by someone who isn't super familiar with the local culture works either.

I mean, take a look at a prehistoric clay figure and try guessing its age. I can't.


I am sure all of this is true, but it's not the discussion in this thread. Instead, the first post (now flagged) initiated the discussion about pedophiles (so there's really no room for any interpretation about how old someone looks) and Ygg essentially argues that that's just a question of point of view. I fundamentally disagree with that.


I think he failed to clarify what would pivot depending on the point of view. I've seen enough examples of Japanese individuals past the age of consent, even close to the end of fertility or past it, yet assumed underage, that the premise of a "pedophile culture" seems to me to be based on an inaccurate understanding.


No one has brought up a 'pedophile culture' in this discussion up to now, so there is no inaccurate understanding as this premise does not exist in this discussion.

Instead, the discussion was about how (i) there being demand fully justifies delivering whatever pedophile customers desire, with Ygg arguing that (ii) child rape being bad is just an arbitrary societal convention.

Again, I fundamentally and vehemently disagree with both of these premises.


Amazing! Now I can create my own anime girl!


I don't want to go all SJW on you guys, amazing work, but can you try to make sure there's an inclusive array of starting faces please? Talking about things like skin tones, thanks!


Indeed, we spent 2 years working on this!!!

It's an extremely hard research problem, because darker skin tones account for only about 0.3% of all anime art produced in the world.

We have employed an absolutely exhaustive array of art and data science tricks to give the model the ability to draw darker skin tones, though they are underrepresented. The results that you see today are the culmination of many months of careful tuning!

It's definitely not perfect, but from a data science perspective, this situation can't be fully rectified until the art world makes a shift.

Personally, I hope that more art representing dark skin tones will be created in the world!


I don't quite understand the need to mandate racial caricature in every mode of communication. I would find it upsetting if every piece of text and every quote were prefaced with an "origin" indicator to aid in forming prejudices. Why have it in manga?


If you are part of a marginalized group, it is both nice to the individual and productive for societal-level integration if you find yourself represented in media, especially if it's a generator with a promise of "generate anyone".

No one here is mandating shit. GP asked a friendly question in a most respectful manner, which even prompted an informative answer from OP. If such an interaction triggers such an allergic reaction in you, the problem is not with the grandparent comment.


> societal-level integration

I really think this integration predominantly means systemic internalization of racism, with the side effect of affirmative action. When ethnicity is expressed with intent, a distinction is made, and distinction is synonymous with discrimination. Or classification, for that matter.


What do you want to tell an otaku about marginalized groups?


What mandate? It's a product designed to be used by living, breathing people, and people tend to have various preferences. Do people requesting darker-skinned models somehow upset you more than people who request red-haired models?


Okay, I'll backtrack a bit: GP didn't specify that they want adjustable ethnicity, only skin colors, and I made the assumption that exaggerated ethnic features were the intent.


I can’t understand why the GP comment is flagged. If you can look past all the “culture war” stuff, this is pointing out some of the limits of algorithmic creativity.

It does not do well generating instances with features that are not well represented in the training dataset.

Compare this to human creativity. I suspect that fulfilling GP's request would be almost trivial for a professional human artist.

To be clear this is an amazing achievement, a creative use of the technology, and a positive contribution to the world. Pointing out limitations (i.e. areas with potential for future innovation) does not diminish it.


That's because humans have also seen a lot of human beings with diverse skin tones. If we had only seen anime our whole lives, it would be much more difficult for us to conceive of as well. With more compute, we will eventually be able to build bigger models with more knowledge of the world that can overcome this.


I think that is correct, but also much, much easier said than done when it comes to implementation.


Thanks, happy to hear that you guys are on top of it.


Honestly asking: isn't this stuff usually Japanese characters? I dunno if there's even "data" they could use for other skin tones.


Anything irrelevant to plot or story building, such as nationality, is usually left unspecified. Readers often project their own identity onto characters, and some people seem to find it odd when they do not encounter traits that differ from their own.

Ethnicity is sometimes incorporated; that is, some distinctions would be necessary if there were, say, a documentary manga about an Olympic match between teams from multiple parts of the world. In that case, American players might be given smaller eyes or extra facial wrinkles, African players might be colored darker than the other characters, Chinese players could be drawn with slightly different chin shapes, etc.

But the default is unspecified: the most averaged, simplified shapes and forms that the author uses in their own cognition.


Well, there is (or rather was) "ganguro" at least, which occasionally appeared in manga and anime, too. Not sure how many people would get offended by that these days, though.


Ganguro is associated with a certain Japanese subculture and aesthetic, and not meant at all to represent darker-skinned races. There are works that feature a more diverse cast, but perhaps the proportion of representation accurately reflects what you would see in Japanese society.


> Ganguro is associated with a certain Japanese subculture and aesthetic, [...]

Yes. What's your point?

> [...] and not meant at all to represent darker-skinned races.

No one ever claimed that. The question posed to my post was simply where one could possibly get data for "darker skin tones". Ganguro seems like an option because... well, darker skin tones? Care to elaborate on why you bring up "representation of darker-skinned races" in this context?


In the context in which they were using skin tones (cf. "usually Japanese characters", "go all SJW on you guys"), it's easily inferred that the interest is in more diverse representation of races, not in excessively salon-tanned Japanese skin.


... because that was the request of the root comment? And also because ganguro does not, in fact, represent races with darker skin tones, so it is a falsely equivalent set of data?


It's funny how people get attached to this topic even when it's just about fictional drawn characters. But yeah, for the sake of data: just put a color filter on the images, tone the skin down, and put them back into the training set. Is that a "falsely equivalent set of data" to represent those groups? Who knows. Probably, if you really want to believe that. But before we go any deeper into this completely out-of-context racial-representation argument, keep in mind that the process I just described is the exact same thing actual artists of said fictional drawn characters use to achieve the results in question.
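For concreteness, the color-filter trick described above might look like this (a crude sketch; a real pipeline would mask skin regions instead of tinting the whole image, and the per-channel factors here are guesses):

    import numpy as np
    from PIL import Image

    def darken_tone(path, factor=0.75):
        # Scale channels down, keeping red strongest so the result
        # reads warmer rather than gray.
        img = np.array(Image.open(path).convert("RGB"), dtype=np.float32)
        img *= np.array([factor, factor * 0.92, factor * 0.85])
        return Image.fromarray(np.clip(img, 0, 255).astype(np.uint8))

    # darken_tone("sample.png").save("sample_dark.png")  # illustrative paths

Whether such synthetic recoloring actually teaches the model anything faithful about darker-skinned character designs is exactly the "falsely equivalent data" question being argued here.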


> process I just described is the exact same thing actual artists of said fictional drawn characters use to achieve the results in question.

If we understand the ancestor comments as asking for more diverse representation, then no, because ganguro and many other anime archetypes and their clothing and accessories are ultimately Japonicentric, in the same way that you might consider angels and demons contrasted against each other post-antiquity Eurocentric.

My opinion on the matter is that it's not necessarily a worthy goal, but there is more to representing people of different races, even in anime faces, than just skin tone.


In anime specifically dark skin can be more associated with demons, monster girls and dark elves than humans though :)


The Pokémon Jynx was banned and redesigned, so it's offensive enough to some people.


On the other hand, you get a full spectrum of eye and hair colour.


In Japanese anime, ethnically Japanese characters are regularly depicted with unusual hair and eye colors to help distinguish them from each other, including e.g. purple eyes or blue hair not naturally seen on humans.


Anime has a pretty severe "representation of blackness" problem (https://www.youtube.com/watch?v=hi2_S6kBgIg is one video discussing this). I'm afraid to imagine what a model trained on that source content would generate.



