Topic: So I trained an AI on e621...again

Posted under e621 Tools and Applications

Gentlemen BEHOLD!

Result
Training Timeline

I posted a few weeks back about a picture-drawing AI I made using pictures from e621. This is the second version, generating larger images, though it seems to be pretty unstable. It may well get quite a bit better if I let it train some more. There is still a lot of experimenting to do.

Follow my FA if this kind of abuse of science interests you.

Updated by savageorange

It's actually sort of nightmarish, because they almost have faces and bodies, but not quite. It's trippy. When I first see the images, my brain goes crazy because it thinks it sees humanoid forms and faces (our brains are so crazy about faces that we can see them in anything, a phenomenon called pareidolia), but then they slowly dissolve into the blobs that they are. Cool stuff.

Updated by anonymous

Is it just me, or are some of them supposed to be MLP images?

Updated by anonymous

Aeruginis said:
It's actually sort of nightmarish, because they almost have faces and bodies, but not quite. It's trippy. When I first see the images, my brain goes crazy because it thinks it sees humanoid forms and faces (our brains are so crazy about faces that we can see them in anything, a phenomenon called pareidolia), but then they slowly dissolve into the blobs that they are. Cool stuff.

Indeed, I have noticed the same. Given that these kinds of neural networks have a lot of similarities with the primate visual system, it seems plausible that pareidolia would arise.

GameManiac said:
Is it just me, or are some of them supposed to be MLP images?

I've noticed that too, especially in earlier epochs. My theory is that MLP images tend to have relatively simple, distinct, colorful shapes, making them easier to recognize. And the program is trained on the top 100k highest-rated images, which I think may contain a pretty high percentage of MLP images.

Updated by anonymous

Lance_Armstrong said:
Were you inspired by Google DeepDream?

I think they are using TensorFlow for the process.

deepfur said:
Gentlemen BEHOLD!

Result
Training Timeline

I posted a few weeks back about a picture-drawing AI I made using pictures from e621. This is the second version, generating larger images, though it seems to be pretty unstable. It may well get quite a bit better if I let it train some more. There is still a lot of experimenting to do.

Follow my FA if this kind of abuse of science interests you.

great work!

Updated by anonymous

Lance_Armstrong said:
Were you inspired by Google DeepDream?

That was definitely one of the biggest inspirations that got me into machine learning at first! Though the actual technique I use to generate these images is totally different from how DeepDream does it. I use the method described in this paper, with some modifications.

Updated by anonymous

TheKvltGoat said:
One of the pictures has eyes, so that's a start.

I would argue that those are the worst :P, because when you cross that line and are unlucky, you might start the horrific descent into the uncanny valley.

Updated by anonymous

Chessax said:
I would argue that those are the worst :P, because when you cross that line and are unlucky, you might start the horrific descent into the uncanny valley.

You have to journey through the uncanny valley before you can reach the peak of hotness.

Updated by anonymous

I for one would like to see this with a larger training set: such as everything! mwhahahaha

I attempted something like this, trying to make a tagger AI with image recognition using Neuroph. My training set was 150 select pictures with the erect tag. Unfortunately, I ran into the resolution problem and had major problems with my computer stalling during training.

Maybe you could consider including only simple_background in the next run to cut out some useless data.

Updated by anonymous

CuteCoughDeath said:
My training set was 150 select pictures with the erect tag.

You need to do more than that. You should also feed it pictures that don't have the erect tag and downsize all pictures to the same resolution at the least.

The full public image set is around 688 GB, which will take forever if training on 150 images was already taking too long.
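A minimal sketch of what I mean, in Python with PIL (the folder names are hypothetical stand-ins for wherever the downloaded images live):

```python
import os
import random
from PIL import Image

SIZE = (128, 128)          # downscale everything to one resolution
POS_DIR = "images/erect"   # hypothetical folder of tag-positive images
NEG_DIR = "images/other"   # hypothetical folder of tag-negative images

def load_resized(folder, label):
    examples = []
    for name in os.listdir(folder):
        img = Image.open(os.path.join(folder, name)).convert("RGB")
        examples.append((img.resize(SIZE, Image.LANCZOS), label))
    return examples

# balanced set: positives (erect) plus negatives (everything else)
dataset = load_resized(POS_DIR, 1) + load_resized(NEG_DIR, 0)
random.shuffle(dataset)    # mix the classes before training
```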

Updated by anonymous

mrox said:
You need to do more than that. You should also feed it pictures that don't have the erect tag and downsize all pictures to the same resolution at the least.

The images I was feeding it were 150 px square cropped sections of the penis. Then, for the comparison, I was going to split the test image up into overlapping sections and test each section. This is because of the resolution limit: too many pixels = too many neurons in the first layer. Also, for my initial test I didn't use the GPU, but I abandoned hope because Neuroph kept stalling.
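The overlapping-sections idea looks roughly like this (a Python sketch for illustration, not my actual Neuroph code; the filename is a placeholder):

```python
from PIL import Image

def overlapping_crops(image, tile=150, stride=75):
    """Split an image into tile x tile crops with 50% overlap."""
    w, h = image.size
    crops = []
    for y in range(0, max(h - tile, 0) + 1, stride):
        for x in range(0, max(w - tile, 0) + 1, stride):
            crops.append(image.crop((x, y, x + tile, y + tile)))
    return crops

# every 150 px section can then be classified independently
tiles = overlapping_crops(Image.open("test.png").convert("RGB"))
```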

mrox said:
The full public image set is around 688 GB, which will take forever if training on 150 images was already taking too long.

I mean deepfur. Deepfur did 100,000 images in 9 hours; that's the whole of e621 in about 3.5 days. However (methinks), it would make more sense to train it exclusively on simple images.

Updated by anonymous

This is trained on 128x128 thumbnails as opposed to the previous 64x64 thumbnails, right?

It's promising enough that I wonder about training it on just a specific artist's works, to reduce the scope of the problem and hopefully thereby improve the output quality. Do we have any artists prolific enough on e621 that that might be possible?

Also, I wonder about training on sketches, not just finished work. (Probably that would require a more carefully chosen representation of the input data, as small thumbnails would fail to represent a lot of the subtleties of lineart.)

(I'm thinking about this because I do a lot of sketches. This can't compare to the e621 corpus of 900,000 images; I could probably only dig up about 8,000-10,000 unique sketches from my archives.)

But it would be really interesting to see if an AI could suggest [vague in the manner of al.chemy] artworks or designs that fitted an artist's overall style, and how much training of the AI would be necessary to get to that point of stability. It might possibly be a helpful tool for an artist to highlight the merits and failings of how they're doing things.

(augh, now I'm thinking about how you could apply analysis filters like isophotes, Difference of Gaussians, segmentation, high-radius Gaussian blur, and wavelet decomposition to the input, so that you could get simplified outputs that each predict just one aspect of the input corpus: edge structure, value structure, color structure...)
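To make one of those concrete, a Difference of Gaussians pass is only a couple of lines, e.g. with scipy. A hypothetical sketch (placeholder filename):

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

# load as grayscale and subtract two blurs: a classic edge-ish filter
img = np.asarray(Image.open("input.png").convert("L"), dtype=float)
dog = gaussian_filter(img, sigma=1.0) - gaussian_filter(img, sigma=2.0)
# 'dog' now emphasizes edge structure, one of the simplified
# single-aspect views of the corpus described above
```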

EDIT: Why not add more wall of text:
Have you tried at all to achieve some level of image classification? I was just thinking about the structural differences between 3D renders, color paintings, grayscale, and linearts. Similar idea as before: "partition up the problem space so that you get N smaller but higher-quality (more consistent) data sets".

Updated by anonymous

The robopocalypse isn't going to be the end of humanity; it's just going to be really awkward for non-furries.

Updated by anonymous

So what would be the ultimate goal for this AI? To be able to generate original art?

Updated by anonymous

CuteCoughDeath said:
The images I was feeding it were 150 px square cropped sections of the penis. Then, for the comparison, I was going to split the test image up into overlapping sections and test each section. This is because of the resolution limit: too many pixels = too many neurons in the first layer. Also, for my initial test I didn't use the GPU, but I abandoned hope because Neuroph kept stalling.

I mean deepfur. Deepfur did 100,000 images in 9 hours; that's the whole of e621 in about 3.5 days. However (methinks), it would make more sense to train it exclusively on simple images.

As mentioned, you need a MUCH bigger training set to get anything resembling a good result. ImageNet (a famous AI image classification challenge), for example, includes over a million images, and I hypothesize image generation may need even more. The 9-hour figure was for the 64x64 px dataset; training this version took around 3 days. I need to update my FA later today.

Also, trust me, don't even bother using a CPU for this kind of work. A high-end GPU is a must for your sanity. I use a GTX 970 and I'm still severely limited by it. Many super interesting architectures just aren't feasible with my hardware, sadly. Most industry-standard AIs are trained on machines with 4 Titan X cards, or a dozen of them.

Only using simple_background is a good idea; I'm currently going through tags by hand to filter out ones I think will be more useful than others.
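The filtering step itself is simple. A hypothetical sketch of it (the dump format and the excluded tags are made up; simple_background is the only tag I actually mentioned):

```python
WANTED = {"simple_background"}   # the tag mentioned above
UNWANTED = {"comic", "text"}     # hypothetical tags to exclude

def keep(tags):
    # every wanted tag present, no unwanted tag present
    return WANTED <= tags and not (UNWANTED & tags)

kept = []
with open("posts.tsv") as f:     # hypothetical dump: "id<TAB>tag tag tag"
    for line in f:
        post_id, _, tags = line.rstrip("\n").partition("\t")
        if keep(set(tags.split())):
            kept.append(post_id)
print(len(kept), "posts selected for training")
```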

savageorange said:
This is trained on 128x128 thumbnails as opposed to the previous 64x64 thumbnails, right?

Yep.

savageorange said:
It's promising enough that I wonder about training it on just a specific artist's works, to reduce the scope of the problem and hopefully thereby improve the output quality. Do we have any artists prolific enough on e621 that that might be possible?

Also, I wonder about training on sketches, not just finished work. (Probably that would require a more carefully chosen representation of the input data, as small thumbnails would fail to represent a lot of the subtleties of lineart.)

(I'm thinking about this because I do a lot of sketches. This can't compare to the e621 corpus of 900,000 images; I could probably only dig up about 8,000-10,000 unique sketches from my archives.)

But it would be really interesting to see if an AI could suggest [vague in the manner of al.chemy] artworks or designs that fitted an artist's overall style, and how much training of the AI would be necessary to get to that point of stability. It might possibly be a helpful tool for an artist to highlight the merits and failings of how they're doing things.

The "complexity" of the images as we as humans perceive it isn't really the problem. The AI needs the same amount of computation to process a sketch as it does a renaissance master piece (although reproducing the latter is of course harder). If anything, more data is considered the one universal thing that always makes AIs better. But while one artist is unlikely to have enough images to build a whole AI, thanks to the way these AIs are constructed, they can "reuse" "tricks" they learn from images. So if it learns to find faces from one artist, it usually can apply that to other styles. For an amazing display of this, check out Neural Style Transfer and the even more fun Neural Doodle. While it would be possible to construct an AI that understands style, that is sorta beyond the computational resources I have access to at the moment. And honestly, I doubt how useful it'd be to artist, since art is inherently subjective and personal. But it's a neat idea.

savageorange said:
(augh, now I'm thinking about how you could apply analysis filters like isophotes, Difference of Gaussians, segmentation, high-radius Gaussian blur, and wavelet decomposition to the input, so that you could get simplified outputs that each predict just one aspect of the input corpus: edge structure, value structure, color structure...)

Funny story, actually: convolutional neural networks learn to apply filters like that themselves! Check out this paper for some cool visualizations (you can just look at the pictures and ignore the jargon). This is what makes them so good at visual tasks, and why you can take a network trained on photographs and, for example, apply Van Gogh's style to a picture of the Golden Gate Bridge (see Neural Style Transfer), even though it wasn't initially intended to do that at all!
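You can even peek at the learned first-layer filters yourself. A hypothetical sketch, using a pretrained torchvision model as a stand-in rather than my network:

```python
import matplotlib.pyplot as plt
from torchvision import models

model = models.resnet18(pretrained=True)
filters = model.conv1.weight.detach()        # shape: (64, 3, 7, 7)

# many of these resemble classic edge and blob detectors
fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    f = (f - f.min()) / (f.max() - f.min())  # rescale to [0, 1]
    ax.imshow(f.permute(1, 2, 0).numpy())    # channels last for imshow
    ax.axis("off")
plt.show()
```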

savageorange said:
Have you tried at all to achieve some level of image classification? I was just thinking about the structural differences between 3D renders, color paintings, grayscale, and linearts. Similar idea as before: "partition up the problem space so that you get N smaller but higher-quality (more consistent) data sets".

I haven't attempted to build such a network yet; my current technique is purely focused on generation and doesn't look at tags at all. I plan for my next network, though, to be a kind of hybrid classification and generation network.

Updated by anonymous

Ryuzaki_Izawa said:
So what would be the ultimate goal for this AI? To be able to generate original art?

The ultimate goal is to amuse me, really. This is just a fun hobby. But I guess, yeah, it'd be cool to make an AI that can generate original art on demand.

Updated by anonymous

Aeruginis said:
(our brains are so crazy about faces that we can see them in anything, a phenomenon called pareidolia)

kinda freaky how that happens irl sometimes. o_O but it's definitely true.

lol especially if say...you happen to see a face on the bathroom floor formed from the weird patterns of the...floor. ಠ_ಠ be careful with whatever you happen to be doing...the floor is watching you. >.> freaky and kinda creepy. sorry for any unwanted mental images there.

i hate my mind sometimes for this very reason. the world just conjures up some strange thoughts and images sometimes.

edit: come to think of it. that might be good to keep in mind if you want to really freak someone out. the walls, floor, and ceiling literally all have eyes to watch what you do...

Updated by anonymous

deepfur said:
The "complexity" of the images as we as humans perceive it isn't really the problem. The AI needs the same amount of computation to process a sketch as it does a renaissance master piece (although reproducing the latter is of course harder).

As a programmer, I understand well enough that it's exactly the same amount of data no matter what. The bolded point is related to what I meant: Constraining the problem so that you're solving a smaller problem to begin with.

While it would be possible to construct an AI that understands style, that is sorta beyond the computational resources I have access to at the moment. And honestly, I doubt how useful it'd be to artists, since art is inherently subjective and personal. But it's a neat idea.

If you look at al.chemy, it's a crude tool that includes no AI at all, but its popularity comes simply from its Rorschach-blot-like ability to easily create shapes that spark the imagination. My mind went there because your AI here already seems just a short distance away from automatically creating interesting shapes or even compositions.

(IOW: you seem to be thinking in terms of 'generating art', but I just mean 'generating visual ideas/compositions with a loose theme'. It could be pretty 'incompetent' and still be quite useful.)

I've already looked at Neural Style, but Neural Doodle is new to me and pretty cool!

That paper was neat, but AFAICS the images were not illustrative of the learning of filters. I can only vaguely infer that Gaussian functions like Gaussian blur, difference of Gaussians, and wavelet decomposition are a natural thing for a convolutional NN to learn, whereas other filters like segmentation are not.

(Power failure just before I posted this, but amazingly, Firefox preserved the reply data.)

EDIT: Another interesting idea is training an AI on procedurally generated output, e.g. http://img.uninhabitant.com/spritegen.html. It might seem a bit abstract in nature; the idea would be to see if the AI's generation is more interesting than the procgen stuff, and hopefully to understand its improvements and incorporate them back into the generator.

Updated by anonymous

savageorange said:

If you look at al.chemy, it's a crude tool that includes no AI at all, but its popularity comes simply from its Rorschach-blot-like ability to easily create shapes that spark the imagination. My mind went there because your AI here already seems just a short distance away from automatically creating interesting shapes or even compositions.

(IOW: you seem to be thinking in terms of 'generating art', but I just mean 'generating visual ideas/compositions with a loose theme'. It could be pretty 'incompetent' and still be quite useful.)

Ahh, I see what you mean now; I misunderstood you slightly. I thought you were talking about Alchemy.ai, which confused me. Something like that would be doable, though as a non-artist I'm not quite sure what would actually be useful. Definitely added that to the shelf of ideas; it might be a more practical use for this stuff.

savageorange said:
That paper was neat, but AFAICS the images were not illustrative of the learning of filters. I can only vaguely infer that Gaussian functions like Gaussian blur, difference of Gaussians, and wavelet decomposition are a natural thing for a convolutional NN to learn, whereas other filters like segmentation are not.

That may be, which also raises the question of whether the information such filters extract is useful. Maybe. IIRC this is called "feature engineering" in machine learning: applying "enhancements" to data before feeding it to a network. I have thought of trying stuff like that; there are dozens of possible filters and algorithms. It's a technique mostly used in classification tasks, so I haven't looked too far into it. In the limit, every possible such enhancement should be learnable, but with our fallible algorithms they can bring benefits if you have the domain knowledge to find good enhancements.
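The kind of "enhancement" in question would look roughly like this: stack a few hand-picked filter outputs alongside the raw image as extra input channels. A sketch with arbitrary example filters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def enhanced_input(gray):
    """gray: 2-D float array -> H x W x 4 stack of raw + filtered views."""
    channels = [
        gray,                                                      # raw values
        gaussian_filter(gray, sigma=2.0),                          # smoothed
        gaussian_filter(gray, 1.0) - gaussian_filter(gray, 2.0),   # DoG edges
        np.hypot(sobel(gray, axis=0), sobel(gray, axis=1)),        # gradient magnitude
    ]
    return np.stack(channels, axis=-1)
```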

savageorange said:
Another interesting idea is training an AI on procedurally generated output, e.g. http://img.uninhabitant.com/spritegen.html. It might seem a bit abstract in nature; the idea would be to see if the AI's generation is more interesting than the procgen stuff, and hopefully to understand its improvements and incorporate them back into the generator.

Interesting idea. In theory, it would over time approximate the function of the generator (which would be boring), but maybe imperfect training or injecting other data/noise could yield interesting results. I doubt it'd work on such low-res images though; I think it would overfit too easily on the limited space of possible images.

Updated by anonymous

deepfur said:
In theory, it would over time approximate the function of the generator (which would be boring)

I think that depends on whether the result is more interesting without the omitted elements. In procedural generation, over-constraint (excessive order) and under-constraint (excessive chaos) are both potential problems.

I doubt it'd work on such low-res images though; I think it would overfit too easily on the limited space of possible images.

You can increase the size of generated sprites, but the output starts to look noisy rather than coherent, yeah, so that's probably not a good input source.

Updated by anonymous
