Topic: This Fursona Does Not Exist

Posted under General

Interesting. It seems similar to this fursona generator I saw a while ago. The one linked seems to have been trained for much longer and on more source images. I wonder if the GitHub repo offers a way to generate images on my own.

Problem is, to quote something I've shared:

Its tendency towards lots of Sonic, Zootopia, Undertale, Digimon and other very recognizable characters indicates to me the net is not all that abstracted away: there are still people's individual creations embedded directly in it.

It would have been nice if they had actually asked artists for permission to use their images as sample data. There are ways to amplify a small set of sample data. See https://www.youtube.com/watch?v=nBcZGjxnpDY

I would not be surprised if automated image scrapers become forbidden in the ToS as a result of this.

Edit: I would not be surprised if the API endpoints and download links are rate limited as a result of this, though I’d hate to see the day I need to type a captcha to look through more than X images in a given timespan.

Further Edit: huh, said YouTube author just so happens to be the same HackerPoet Idem linked to.


bigotedsjw said:
Its tendency towards lots of Sonic, Zootopia, Undertale, Digimon and other very recognizable characters indicates to me the net is not all that abstracted away: there are still people's individual creations embedded directly in it.

That's actually two different well-known problems in AI: dataset imbalance and model overfitting. Since this doesn't seem to have been created with academic rigor in mind, it's not too surprising that neither was accounted for when developing the application. Don't be too harsh on the people who made it; those are not exactly trivial problems to deal with.

bigotedsjw said:
It would have been nice if they had actually asked artists for permission to use their images as sample data. There are ways to amplify a small set of sample data. See https://www.youtube.com/watch?v=nBcZGjxnpDY

Data augmentation can only really get you so far when generating a new dataset. I'll 100% agree with your stance that it would be nice to ask for permission, but at the same time it is really impractical to do so. As far as I saw, they're not sharing the original dataset used to train the model, so there isn't much really wrong with what they're doing. It is really not unheard of in academia to build a new dataset in this way if you're not planning on making it available.
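Just to illustrate what augmentation actually buys you, here's a rough sketch (assuming a local folder of PNG crops, purely illustrative): flips and small rotations multiply the sample count, but every variant is still derived from the same handful of originals.

```
# Rough augmentation sketch, assuming a local folder of PNG crops.
# Each variant is still derived from the same originals, so this multiplies
# the sample count without adding genuinely new information.
from pathlib import Path
from PIL import Image

def augment(src_dir: str, dst_dir: str) -> None:
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.png"):
        img = Image.open(path).convert("RGB")
        variants = {
            "orig": img,
            "flip": img.transpose(Image.FLIP_LEFT_RIGHT),
            "rot5": img.rotate(5),
            "rot-5": img.rotate(-5),
        }
        for tag, variant in variants.items():
            variant.save(out / f"{path.stem}_{tag}.png")

augment("faces", "faces_augmented")  # 4x the images, same underlying artwork
```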

bigotedsjw said:
I would not be surprised if automated image scrapers become forbidden in the ToS as a result of this.

Edit: I would not be surprised if the API endpoints and download links are rate limited as a result of this, though I’d hate to see the day I need to type a captcha to look through more than X images in a given timespan.

E621's API requests are already limited to 2 requests per second, with a cap of 1 request per second being strongly recommended.
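For anyone curious, staying under that limit is straightforward; a minimal sketch against the public posts.json endpoint (the User-Agent string below is a placeholder, use your own):

```
# Minimal sketch of polite paging against e621's posts.json endpoint.
# Assumes the current JSON API; the User-Agent value is a placeholder.
import time
import requests

BASE = "https://e621.net/posts.json"
HEADERS = {"User-Agent": "example-dataset-builder/0.1 (by your_username)"}
MIN_INTERVAL = 1.0  # ~1 request per second, under the 2 req/s hard limit

def fetch_pages(tags: str, pages: int = 3, limit: int = 100) -> list:
    posts = []
    for page in range(1, pages + 1):
        start = time.monotonic()
        resp = requests.get(
            BASE,
            params={"tags": tags, "limit": limit, "page": page},
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        posts.extend(resp.json().get("posts", []))
        # sleep off whatever remains of the one-second budget
        time.sleep(max(0.0, MIN_INTERVAL - (time.monotonic() - start)))
    return posts
```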

Overall, it's fine. There's nothing really wrong coming out of this, and there will likely be no major changes to the site as a result of it. And if it gets GANs (or more likely, NNs in general) a tiny bit more exposure and makes people interested in machine learning, I'm all for it.

bigotedsjw said:
Problem is, to quote something I've shared:

It would have been nice if they had actually asked artists for permission to use their images as sample data. There are ways to amplify a small set of sample data. See https://www.youtube.com/watch?v=nBcZGjxnpDY

I would not be surprised if automated image scrapers become forbidden in the ToS as a result of this.

Edit: I would not be surprised if the API endpoints and download links are rate limited as a result of this, though I’d hate to see the day I need to type a captcha to look through more than X images in a given timespan.

Further Edit: huh, said YouTube author just so happens to be the same HackerPoet Idem linked to.

Do furry artists ask companies for permission when they make fan art/porn of their IPs?

gradiusgadwin said:
Do furry artists ask companies for permission when they make fan art/porn of their IPs?

"B-but this is a completely different case! He's ripping off me personally, when I rip off big companies it's all good and acceptable!"

mabit said:
That's actually two different well-known problems in AI: dataset imbalance and model overfitting. Since this doesn't seem to have been created with academic rigor in mind, it's not too surprising that neither was accounted for when developing the application. Don't be too harsh on the people who made it; those are not exactly trivial problems to deal with.

Data augmentation can only really get you so far when generating a new dataset. I'll 100% agree with your stance that it would be nice to ask for permission, but at the same time it is really impractical to do so. As far as I saw, they're not sharing the original dataset used to train the model, so there isn't much really wrong with what they're doing. It is really not unheard of in academia to build a new dataset in this way if you're not planning on making it available.
...
Overall, it's fine. There's nothing really wrong coming out of this, and there will likely be no major changes to the site as a result of it. And if it gets GANs (or more likely, NNs in general) a tiny bit more exposure and makes people interested in machine learning, I'm all for it.

Yeah, similar to tracing cases on the site: unless someone comes out and shows that even one of those images looks almost identical to an image on this site, they are considered to be unique creations.
I guess it just feels wrong because of how many of those look almost proper. However, you also have to remember that people have been all in with tools like waifu2x for some time now, and that is essentially doing a similar thing: taking an input image and using what it has learned from a bunch of other people's artwork to guesstimate what the image would look like if it had originally been drawn at double the resolution.

Also, this is why e621's stance is that users should ask for permission when uploading content here, but with the scale and amount of content, it is impossible to enforce. Even if those 55k images worked out to 10 per artist, that would still mean asking 5,500 individual artists for permission to use their publicly available work in a dataset which is never shared publicly.

fenrick said:
Furries are, ostensibly, part of a sort of community, so when people do that kind of thing to "one of their own," there's something to be said about it being more personal.

That being said, I have no problem with, say, Nintendo filing a C&D over someone profiting off of that eeveelutions comic.

Yeah, when it's a single individual, it's much easier to have direct contact with them, whereas corporations are these big bullies that are all about making money.

I feel like this habit of assuming how things work just from observing what other people do is constantly causing all of this shit, because I see it on the site as well.
The Pokémon Company and Nintendo still own the rights to their IPs. The reason fan art is allowed to exist is that legal fights would be expensive and all you'd win is bad PR, and most commission money goes under the table. Meanwhile, opening up a Patreon to directly ask people for money is directly profiting from those same IPs, which of course makes it easier for companies to just tell you to stop. And in the case of the actual takedowns that happened, even if it wasn't Nintendo filing them, you'd most likely need to contact the actual rights holders to get the fake resolved, at which point you've made them aware of the situation and the real deal will tell you to stop.

But this is getting outside the point.

I'm the creator of this project. I've gotten a lot of flak for not asking artists for permission to use their art, but honestly it would have been impossible to try to coordinate contacting over 10k different artists to get permission, especially considering I had no social media presence before I started working on this. Neural networks need a lot of data in order to get good results, and if I'd been limited to only the images from artists who I was able to get into contact with and who explicitly gave permission, the results would likely have been much worse and no one would be talking about it in the first place. It also would have set the project back several months, and originally I only had access to the compute resources I used to train the neural network for a limited amount of time, so I wouldn't have been able to complete the project by the deadline if I had to spend a bunch of additional time trying to get in contact with artists.

Still, I do sympathize with the artists who have asked to be able to opt out of future machine learning projects. Several people have indicated that they don't want their art used for future projects, but it's difficult to keep track of each of these artists and what they want, especially when their Twitter names don't match the artist name in the e621 DB. I'm not really sure what the correct solution is. Ideally artists would be able to tag their posts with a license, in a way that would be easily queryable via the API. That would at least allow artists to opt out of having their work used for future datasets, assuming they add the licenses to their posts before the dataset is created. I'm not sure how feasible it would be to implement something like that in e621, though.
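Purely as a hypothetical: if posts carried a machine-readable opt-out tag (nothing like this exists on e621 today, and the tag name below is made up), a dataset builder could filter on it like this:

```
# Hypothetical sketch only: e621 has no machine-readable license/opt-out tag.
# The tag name below is invented purely for illustration.
OPT_OUT_TAG = "no_ml_training"

def allowed_for_training(post: dict) -> bool:
    # posts.json groups tags by category (artist, general, meta, ...);
    # flatten them and check for the hypothetical opt-out tag
    tags = [t for group in post.get("tags", {}).values() for t in group]
    return OPT_OUT_TAG not in tags

sample_post = {"id": 1, "tags": {"artist": ["someone"], "meta": ["no_ml_training"]}}
print(allowed_for_training(sample_post))  # False -> excluded from the dataset
```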

I'm eventually planning to progress to larger GAN models that can create full images, not just faces. Ideally, that would allow me to train the network on *all* images from e621 (not just the ones where I was able to successfully detect and crop a face), but I'd also expect a lot more backlash for using several orders of magnitude more art. But what's the alternative? I've already been contacted by around half a dozen people who said they've been working on similar projects, so it seems like the furry community is going to have to contend with AI-generated art sooner or later.

I personally think you are doing an incredible service to the community. When coming up with ideas for characters, I'm able to see a base AI-created character to serve as a jumping-off point. Being able to just sift through hundreds of pictures is amazing.

This is amazing. It shows off the power of GANs, celebrates the breadth of furry art, and it can provide inspiration for furry artists and fans. It's magical and holds more promise still. Thank you for your contribution!

As you expand the scope of this project, I think the critical things to keep in mind, for everyone, are the practical, technical and legal limitations. Please correct me if I get anything wrong. Practically speaking, you will not be able to keep track of Do Not Use My Art requests from hundreds of artists across different platforms. e621 has a pretty good database of artist aliases that you could maybe leverage to identify all art from artist A using only their Twitter name, for example, but that's still a losing battle. And technically speaking, you cannot simply retroactively remove an artist's work from the training set after the GAN has been trained; training the network takes a lot of processing, I imagine. So a solution like an any-time opt-in DNUMA list, like the one e621 uses, is not perfect.

Legally speaking, this is fascinating new ground. TFDNE is almost certainly fair use. This becomes less clear if you either seek a profit through it or somehow destroy the furry art market. (Please note that this won't prevent people from abusing the DMCA to have TFDNE removed from Google.) I don't think it counts as a derivative work, though. Philosophically speaking, this seems to approximate what a human might do: learn from thousands of images, pick up certain shapes and techniques here and there, and produce new art as a synthesis of what it's learned. But just as any art that looks too similar to an existing piece is decried, plenty of images from TFDNE will be attacked as copying or overfitting. You should probably just ignore that, though I admit that if they were posted on e6 I would itch to find the single most appropriate parent post.

Morally speaking, the nice thing to do would be to allow artists to excuse themselves from the training set. I doubt anyone would ever ask a human artist to selectively not learn from their art, but artists might feel justified in asking you to selectively not train your GAN on their art, even if that's a difficult task, as described above.

Sorry for the wall of text. And you probably already know all of that stuff. Here are some thoughts on your question more specifically:

One solution is, as mentioned, a centralized DNP list for TFDNE and your future projects. The onus would be on artists to supply all pseudonyms which they wish to make DNP, and you might have to verify that each request actually comes from that artist. This is probably the most straightforward solution, but it's still a lot of work. And it won't change the existing network unless you spend time/money retraining it after every DNP addition. And artists will need to put in requests with every new AI art project. It's not scalable, but this solution will placate most people for now.
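Mechanically, the DNP side of that is simple; the hard part is soliciting and verifying the requests. A sketch, assuming a plain-text file of opted-out artist tags maintained with the project:

```
# Sketch of applying a project-maintained DNP list before building a dataset.
# Assumes dnp_artists.txt holds one artist tag per line; verifying that each
# request really came from that artist is the hard, manual part not shown here.
def load_dnp(path: str = "dnp_artists.txt") -> set:
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def filter_posts(posts: list, dnp: set) -> list:
    kept = []
    for post in posts:
        artists = {a.lower() for a in post.get("tags", {}).get("artist", [])}
        if artists & dnp:
            continue  # at least one credited artist has opted out
        kept.append(post)
    return kept
```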

Another solution is supply-side rather than utilization-side. Artists could attach permission information to their art upon posting; some artists already do this. But there's no standard for you to interpret, different gallery websites have different capabilities anyway, and since you're accessing art through e621 that information may get lost or modified. I don't think this is a viable solution without far-reaching overhaul and standardization efforts.

On a related note, one concern that I, as a long-time fan of e6, would like to express to you is that because TFDNE uses data from e6, it may have the effect of increasing the DNP requests sent to e6. That would decrease the amount of art in this archive, and that makes me sad. How viable would it be for you to get images not from e6 but directly from each source? Until recently e6 allowed a maximum of 5 sources per post, so it should be feasible to determine where each piece was originally uploaded and scrape from there instead of e6. This would allow you to say that the art used by TFDNE comes from artists' galleries (you could even restrict it to the main sites like DA, FA and IB) and that the metadata comes from e621. While the distinction between this suggestion and the current situation is basically one of semantics from a technical viewpoint, from a PR viewpoint it would greatly decrease any fire directed at e6 as an archive of reuploaded art. It would also make it more obvious, I think, that TFDNE isn't so different from a human artist browsing other artists' galleries and learning from them. But this is a fundamentally self-serving request from someone who really likes e6. Thanks for hearing me out.
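A sketch of how that source-side selection could work, assuming each post's sources field lists the original upload URLs (the host whitelist is just an example):

```
# Sketch: prefer the artist's own gallery over e621's copy, using the post's
# source URLs and a short whitelist of main gallery hosts (example list only).
from urllib.parse import urlparse

GALLERY_HOSTS = ("furaffinity.net", "deviantart.com", "inkbunny.net")

def pick_gallery_source(post: dict):
    for url in post.get("sources", []):
        host = urlparse(url).netloc.lower()
        if any(host == h or host.endswith("." + h) for h in GALLERY_HOSTS):
            return url
    return None  # no recognized gallery source; skip the post or fall back to e6
```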


I'd just have the app say which posts were used. Maybe organized by artist and post number? And just have that accessible from the Git page. Plain text or an alphabetized spreadsheet would be good enough storage formats, I think. By having the list in a repository, folks can't accuse you of editing the logs, because doing so would show up as a change on the working branch.
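Something like this would do it, assuming the post metadata is already on hand; it writes a plain-text list grouped by artist that can be committed to the repo:

```
# Sketch: dump a plain-text attribution list (artist -> post IDs) suitable for
# committing to the repository alongside the dataset.
from collections import defaultdict

def write_attribution(posts: list, path: str = "ATTRIBUTION.txt") -> None:
    by_artist = defaultdict(list)
    for post in posts:
        for artist in post.get("tags", {}).get("artist", ["unknown_artist"]):
            by_artist[artist].append(post["id"])
    with open(path, "w", encoding="utf-8") as f:
        for artist in sorted(by_artist):
            ids = ", ".join(str(i) for i in sorted(by_artist[artist]))
            f.write(f"{artist}: {ids}\n")
```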

arfa said:
I'm the creator of this project. I've gotten a lot of flak for not asking artists for permission to use their art, but honestly it would have been impossible to try to coordinate contacting over 10k different artists to get permission, especially considering I had no social media presence before I started working on this. Neural networks need a lot of data in order to get good results, and if I'd been limited to only the images from artists who I was able to get into contact with and who explicitly gave permission, the results would likely have been much worse and no one would be talking about it in the first place. It also would have set the project back several months, and originally I only had access to the compute resources I used to train the neural network for a limited amount of time, so I wouldn't have been able to complete the project by the deadline if I had to spend a bunch of additional time trying to get in contact with artists.

Still, I do sympathize with the artists who have asked to be able to opt out of future machine learning projects. Several people have indicated that they don't want their art used for future projects, but it's difficult to keep track of each of these artists and what they want, especially when their Twitter names don't match the artist name in the e621 DB. I'm not really sure what the correct solution is. Ideally artists would be able to tag their posts with a license, in a way that would be easily queryable via the API. That would at least allow artists to opt out of having their work used for future datasets, assuming they add the licenses to their posts before the dataset is created. I'm not sure how feasible it would be to implement something like that in e621, though.

I'm eventually planning to progress to larger GAN models that can create full images, not just faces. Ideally, that would allow me to train the network on *all* images from e621 (not just the ones where I was able to successfully detect and crop a face), but I'd also expect a lot more backlash for using several orders of magnitude more art. But what's the alternative? I've already been contacted by around half a dozen people who said they've been working on similar projects, so it seems like the furry community is going to have to contend with AI-generated art sooner or later.

It would be great if you could make it more clear that you've done this without our direct contributions, and that you aren't affiliated with us. We've gotten quite a few complaints about your project despite having had no hand in it.

leomole said:
As you expand the scope of this project, I think the critical things to keep in mind, for everyone, are the practical, technical and legal limitations. Please correct me if I get anything wrong. Practically speaking, you will not be able to keep track of Do Not Use My Art requests from hundreds of artists across different platforms. e621 has a pretty good database of artist aliases that you could maybe leverage to identify all art from artist A using only their Twitter name, for example, but that's still a losing battle. And technically speaking, you cannot simply retroactively remove an artist's work from the training set after the GAN has been trained; training the network takes a lot of processing, I imagine. So a solution like an any-time opt-in DNUMA list, like the one e621 uses, is not perfect.

Correct. The problem is, if I create a new dataset today and start training a neural network, and someone comes along tomorrow and says they want to opt out, it's too late. I'd have to start over. And training the neural network from scratch each time costs on the order of $15,000 to $20,000 (of compute credits donated by Google, but still, there's only so much I can get away with). And once I've trained a model and released it, it's out there in the wild.

leomole said:
Legally speaking, this is fascinating new ground. TFDNE is almost certainly fair use. This becomes less clear if you either seek a profit through it or somehow destroy the furry art market. (Please note that this won't prevent people from abusing the DMCA to have TFDNE removed from Google.) I don't think it counts as a derivative work, though. Philosophically speaking, this seems to approximate what a human might do: learn from thousands of images, pick up certain shapes and techniques here and there, and produce new art as a synthesis of what it's learned. But just as any art that looks too similar to an existing piece is decried, plenty of images from TFDNE will be attacked as copying or overfitting. You should probably just ignore that, though I admit that if they were posted on e6 I would itch to find the single most appropriate parent post.

Morally speaking, the nice thing to do would be to allow artists to excuse themselves from the training set. I doubt anyone would ever ask a human artist to selectively not learn from their art, but artists might feel justified in asking you to selectively not train your GAN on their art, even if that's a difficult task, as described above.

Gizmodo did an interesting article on the legal aspects, if you haven't already seen it: https://gizmodo.com/the-internet-furry-drama-raising-big-questions-about-ar-1843412922
The article has a bit of forced neutrality going on, but I think the main takeaway is that I'm probably in the clear here unless Disney decides it's worth it to sue me, which is unlikely.

Also, the fact that they got a bunch of law professors to look at AI generated furry art and give their opinions about the IP implications is pretty hilarious.

leomole said:
One solution is, as mentioned, a centralized DNP list for TFDNE and your future projects. The onus would be on artists to supply all pseudonyms which they wish to make DNP, and you might have to verify that each request actually comes from that artist. This is probably the most straightforward solution, but it's still a lot of work. And it won't change the existing network unless you spend time/money retraining it after every DNP addition. And artists will need to put in requests with every new AI art project. It's not scalable, but this solution will placate most people for now.

The difficulty here is that I'm not the only person working on these types of projects. Like I said, I've already had about half a dozen people contact me saying they've been doing similar things. So if I were to do this, I'd possibly be kneecapping myself if a large number of artists opt out, and someone else would just come along and use the art anyway for their own project and say "Well, you only opted out for arfa. No one asked to opt out of my project."

leomole said:
Another solution is supply-side rather than utilization-side. Artists could attach permission information to their art upon posting; some artists already do this. But there's no standard for you to interpret, different gallery websites have different capabilities anyway, and since you're accessing art through e621 that information may get lost or modified. I don't think this is a viable solution without far-reaching overhaul and standardization efforts.

IMO this would be ideal, and I think it'll eventually be what ends up happening as these types of machine learning projects become more common. At the very least, artists are starting to become aware of the fact that this technology now exists, and I'd argue it's up to the community as a whole to build and maintain tools that allow artists to have a say in how their work is used. I'm certainly not trying to be a jerk here or disrespect anyone's wishes, but having to maintain and keep track of this stuff myself is more work than I can take on right now.

leomole said:
On a related note, one concern that I, as a long-time fan of e6, would like to express to you is that because TFDNE uses data from e6, it may have the effect of increasing the DNP requests sent to e6. That would decrease the amount of art in this archive, and that makes me sad. How viable would it be for you to get images not from e6 but directly from each source? Until recently e6 allowed a maximum of 5 sources per post, so it should be feasible to determine where each piece was originally uploaded and scrape from there instead of e6. This would allow you to say that the art used by TFDNE comes from artists' galleries (you could even restrict it to the main sites like DA, FA and IB) and that the metadata comes from e621. While the distinction between this suggestion and the current situation is basically one of semantics from a technical viewpoint, from a PR viewpoint it would greatly decrease any fire directed at e6 as an archive of reuploaded art. It would also make it more obvious, I think, that TFDNE isn't so different from a human artist browsing other artists' galleries and learning from them. But this is a fundamentally self-serving request from someone who really likes e6. Thanks for hearing me out.

Yeah, I had this concern as well. I'm sure there are a lot of artists that are fine with their art being on e621 (and in fact, benefit from it) but who don't want their art used for projects like mine. I'd hate to see them boycott e621 entirely because there is no "middle ground" option where they can specify a license. For future projects, I'll definitely keep this in mind.

celadonsissy said:
I'd just have the app say which posts were used. Maybe organized by artist and post number? And just have that accessible from the Git page. Plain text or an alphabetized spreadsheet would be good enough storage formats, I think. By having the list in a repository, folks can't accuse you of editing the logs, because doing so would show up as a change on the working branch.

I do list the artists in the dataset I've posted on Github: https://github.com/arfafax/E621-Face-Dataset/
Although the dataset used to train the final network is just a subset of that one.

notmenotyou said:
It would be great if you could make it more clear that you've done this without our direct contributions, and that you aren't affiliated with us. We've gotten quite a few complaints about your project despite having had no hand in it.

I'm planning on updating the site soon with an FAQ to clear up some misconceptions. I'll be sure to make that more clear. Hopefully it hasn't caused you guys too much trouble.

arfa said:
training the neural network from scratch each time costs on the order of $15,000 to $20,000 (of compute credits donated by Google

Wow okay, you would definitely need to know who wants to opt out before training takes place, not after.

arfa said:
https://gizmodo.com/the-internet-furry-drama-raising-big-questions-about-ar-1843412922

Thanks, it looks like they covered the topic quite well already. I didn't realize someone had already tried to DMCA it!

arfa said:

leomole said:
One solution is, as mentioned, a centralized DNP list for TFDNE and your future projects.

if I were to do this, I'd possibly be kneecapping myself if a large number of artists opt out, and someone else would just come along and use the art anyway for their own project... having to maintain and keep track of this stuff myself is more work than I can take on right now.

You are the first notable success in this particular domain, but clearly that honor comes with some drawbacks, like people not understanding the project or trying to opt out of your projects specifically, as though this were a one-off effort rather than the beginning of a wave. It seems to me that there are a few ways you could handle this. You could take a hardline stance and say that art anywhere on the internet is fair game for learning from, whether the learner is human or AI. You could explain to people that soliciting, verifying and implementing a DNUMA list is beyond your time and financial resources at this point. You could call for a community-created database that, for each artist, lists their preferences regarding the use of their art for learning, for creating memes, for recoloring, for archiving externally, for inspiration, etc. I'm skeptical that could ever work, tbh. But you could do all these things in tweets or the FAQ and be done with it.
