Topic: Have we tried to DO anything about the booru bots?

Posted under General

I understand none of us are exactly happy about other image boorus autoscraping this site. Hell, there's a persistent misconception I keep seeing that we quote "let them" when I know that isn't the case.

What I'm wondering is, have we ever tried to take any kind of action or countermeasures against them in the past? Or is that something that just... isn't possible?

I'm not usually that bothered by it happening to my work, like, I get some feedback there that I'd never get here, but when I recently saw that a birthday gift of a friend's fursona, not marked with any copyright tags, still got scraped by a site allegedly focused on fan art... I don't know, that in particular just made me really uncomfortable. (I probably don't have my priorities straight.)

How would you create a system that targets a certain user downloading images without also affecting every other user in the crossfire?

strikerman said:
How would you create a system that targets a certain user downloading images without also affecting every other user in the crossfire?

Point taken. I was mostly just wondering if anything's been attempted in the past. Not that I expected it to be effective. (I assume they don't exactly respect formal takedown requests, do they?)

I guess if I was willing to sacrifice usability for security I'd suggest a captcha or something, but I'm not suggesting that because those do still tend to really irritate users (including myself, I HATE those things). I don't know. It's a tricky problem.

CAPTCHAs don't work since the site has an API that most everyone will program against instead. It's how tools like PostyBirb post art on the site, and how many of the archival scripts download stuff to people's machines. The site probably prefers automation goes through the APIs since they transfer less data than the website itself and can be optimized for the unique use cases that would come from said software.

alphamule

Privileged

lendrimujina said:
I understand none of us are exactly happy about other image boorus autoscraping this site. Hell, there's a persistent misconception I keep seeing that we quote "let them" when I know that isn't the case.

What I'm wondering is, have we ever tried to take any kind of action or countermeasures against them in the past? Or is that something that just... isn't possible?

I'm not usually that bothered by it happening to my work, like, I get some feedback there that I'd never get here, but when I recently saw that a birthday gift of a friend's fursona, not marked with any copyright tags, still got scraped by a site allegedly focused on fan art... I don't know, that in particular just made me really uncomfortable. (I probably don't have my priorities straight.)

Ugh, the tagging on some of the Rule 34 sites (and Pixiv) comes to mind. And yes, if it's on a Rule 34 Booru, it should by definition not be OCs.

Hmm, I looked up the rules on them: XXX doesn't say, US has no rules I could find, Paheal has rule 18 (which forbids tagging OCs), and r34h doesn't have a working wiki to check. This is actually kind of abysmal. I stopped at 4 of them, and only 1 seems to outright say! :facepalm:

kyureki said:
CAPTCHAs don't work since the site has an API that most everyone will program against instead. It's how tools like PostyBirb post art on the site, and how many of the archival scripts download stuff to people's machines. The site probably prefers automation goes through the APIs since they transfer less data than the website itself and can be optimized for the unique use cases that would come from said software.

Yeah, this is kind of the whole point of why sites offer API access. Ironically, people started scraping again when the APIs went subscription-based on... some sites. You don't actually 'scrape' anything on e621 if you're doing it right. It's clean JSON, which has good parsing support in a lot of languages.

TBH, it would be counterproductive if all the Boorus (including e621) went all Sankaku Complex/DeviantArt/Imgur and tried to kneecap themselves.
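
To illustrate why "clean JSON" makes scraping unnecessary, here is a minimal sketch that parses a post-search response with nothing but Python's standard library. The sample is hypothetical and made up for this example; the field names ("posts", "id", "file", "url", "tags" with per-category lists) are an assumption modeled on e621's JSON format and should be checked against a real response.

```python
import json

# Hypothetical, made-up sample. The field layout is an ASSUMPTION modeled on
# e621's post JSON, not a verified copy of it.
SAMPLE_RESPONSE = """
{
  "posts": [
    {"id": 101, "file": {"url": "https://example.invalid/a.png"},
     "tags": {"general": ["solo", "smile"], "species": ["fox"]}},
    {"id": 102, "file": {"url": null},
     "tags": {"general": ["duo"], "species": ["wolf", "dragon"]}}
  ]
}
"""


def summarize_posts(raw: str) -> list[dict]:
    """Parse a JSON response and flatten each post's tag categories."""
    data = json.loads(raw)
    summaries = []
    for post in data.get("posts", []):
        # Merge every tag category (general, species, ...) into one sorted list.
        all_tags = sorted(
            tag for group in post.get("tags", {}).values() for tag in group
        )
        summaries.append({
            "id": post["id"],
            "url": post.get("file", {}).get("url"),  # JSON null becomes None
            "tags": all_tags,
        })
    return summaries


for s in summarize_posts(SAMPLE_RESPONSE):
    print(s["id"], s["url"], " ".join(s["tags"]))
# 101 https://example.invalid/a.png fox smile solo
# 102 None dragon duo wolf
```

No HTML is parsed at all: the structure, including the tags, arrives ready to use, which is also why a copying site can pull tags in bulk as easily as images.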

Wait, with the APIs, are you saying there's a grain of truth to the misconceptions that we specifically allow it? I was under the impression that we were in direct conflict with those sites.

EDIT: Yes, it was R34XXX I was referring to. I wasn't sure whether or not it was against the rules to refer to them by name, so I played it safe by being vague.
(And I'm sorry if I'm a bit snappy, I've barely gotten any sleep.)

EDIT 2: Changed wording to be less aggressive-sounding, which is not what I wanted.

There's no realistic way to prevent "scraping" (it's not really scraping) without majorly disrupting API usage, which many people rely on for legitimate purposes. And at the end of the day, the bots will always find a way around any system we implement, so the only people we'd be hurting are the legitimate developers building tools that are expected to exist.

Watsit

Privileged

lendrimujina said:
Wait, with the APIs, are you saying there's a grain of truth to the misconceptions that we specifically allow it? I was under the impression that we were in direct conflict with those sites.

The APIs are for allowing third-party applications to access posts and make requests to the site, separately from using the web interface. They also allow accessing the site through a user's account, with a private API key identifying the user (so things done through the API are treated as being done by that user). Stuff like PostyBirb and re621 use the APIs to post new images, get posts, do searches, and other such functionality for the user. It's not specifically for bots to scrape posts, just an alternate interface that doesn't rely on a web browser accessing individual web pages.
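
As a rough illustration of how a private key ties API requests to an account, here is a sketch that builds (but does not send) an authenticated search request using only Python's standard library. The endpoint URL, the `tags`/`limit` parameters, and HTTP Basic auth with `username:api_key` are assumptions modeled on e621's public API documentation; the user name, key, and User-Agent string are hypothetical placeholders.

```python
import base64
import urllib.parse
import urllib.request

# ASSUMPTION: endpoint, parameter names, and the Basic-auth scheme are modeled
# on e621's API docs; verify them against the current documentation.
BASE_URL = "https://e621.net/posts.json"


def build_search_request(username: str, api_key: str, tags: str,
                         limit: int = 20) -> urllib.request.Request:
    """Build (but do not send) an authenticated search request."""
    query = urllib.parse.urlencode({"tags": tags, "limit": limit})
    # Basic auth: base64 of "username:api_key". The server uses this to
    # attribute the request to that account.
    token = base64.b64encode(f"{username}:{api_key}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={
            "Authorization": f"Basic {token}",
            # Hypothetical tool name; a descriptive User-Agent identifies the client.
            "User-Agent": "ExampleArchiver/0.1 (by example_user)",
        },
    )


req = build_search_request("example_user", "not-a-real-key", "fox solo", 5)
print(req.full_url)  # https://e621.net/posts.json?tags=fox+solo&limit=5
```

Sending it would just be `urllib.request.urlopen(req)`; leaving that out keeps the sketch off the network. The point is that the server sees a named account and a named tool, not an anonymous browser.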

lendrimujina said:
Wait, with the APIs, are you saying there's a grain of truth to the misconceptions that we specifically allow it? I was under the impression that we were in direct conflict with those sites.

Eh, you can find a grain of truth in anything. It's equivalent to saying that selling kitchen knives is condoning stabbings.

snpthecat said:
Eh, you can find a grain of truth in anything. It's equivalent to saying that selling kitchen knives is condoning stabbings.

That's a good way to put it, I get it now.

alphamule

Privileged

lendrimujina said:
Wait, with the APIs, are you saying there's a grain of truth to the misconceptions that we specifically allow it? I was under the impression that we were in direct conflict with those sites.

EDIT: Yes, it was R34XXX I was referring to. I wasn't sure whether or not it was against the rules to refer to them by name, so I played it safe by being vague.
(And I'm sorry if I'm a bit snappy, I've barely gotten any sleep.)

EDIT 2: Changed wording to be less aggressive-sounding, which is not what I wanted.

Same, I wasn't going to directly link to them here. I already had a forum comment hidden for mentioning a link to a certain site, so I wasn't gonna push my luck, you know? ;)

At least you now have a list of the ones to issue takedowns on.

I didn't see the unedited reply, but it's cool!

snpthecat said:
Eh, you can find a grain of truth in anything. It's equivalent to saying that selling kitchen knives is condoning stabbings.

I wonder what the analogy to a rate limit is...
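
A rate limit is the usual middle ground here: instead of trying to tell bots from people, the server caps how many requests any single client can make per unit of time, which slows bulk pulls without breaking normal API use. Below is a minimal token-bucket sketch keyed per client (for example by API key or IP). The rate and burst values are arbitrary illustrations, not e621's actual limits, and the injectable clock is only there to make the example deterministic.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` requests per second on average, bursting up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so a new client can burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would typically answer HTTP 429 Too Many Requests


class PerClientLimiter:
    """Keep one independent bucket per client key."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client: str) -> bool:
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client] = bucket
        return bucket.allow()


fake_now = [0.0]  # fake clock for a deterministic demo
limiter = PerClientLimiter(rate=1.0, capacity=2, clock=lambda: fake_now[0])
print([limiter.allow("bot") for _ in range(3)])  # [True, True, False]
print(limiter.allow("person"))                   # True: other clients unaffected
fake_now[0] = 1.0
print(limiter.allow("bot"))                      # True: one token refilled after 1 s
```

This also shows the limit of the idea: a rate limit only slows a determined bot down (or pushes it to spread requests over many keys), so it caps the cost of bulk copying rather than preventing it.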

personally i think it's funny that other imageboards try to rip content submitted to e621 and then get a bunch of inferior versions that suck, or miss out on a lot of very important tag updates.

lafcadio said:
personally i think it's funny that other imageboards try to rip content submitted to e621 and then get a bunch of inferior versions that suck, or miss out on a lot of very important tag updates.

often imitated, never duplicated: e621

alphamule

Privileged

dba_afish said:
often imitated, never duplicated: e621

LOL, even better when the damn database with all the new tags is posted as a single CSV and archived on frigging Wayback at least every few months!

Truth be told, it was never about the image files. People come here for the tagging system/ease of search. Pray this site never goes the way of Google Search, or worse, Yahoo/Wix/Twitter.
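
Since the tag data is published as plain CSV, anyone can load it with a few lines of code. Here is a minimal sketch using Python's `csv` module. The column names (`id`, `name`, `category`, `post_count`) are an assumption about the tag export's layout, and the rows and counts are made up for illustration; check the header row of the real file before relying on it.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Hypothetical excerpt with made-up counts. The column names are an
# ASSUMPTION about the tag export's layout.
SAMPLE_CSV = """id,name,category,post_count
1,fox,5,250
2,solo,0,1900
3,smile,0,600
4,example_artist,1,12
"""


def top_tags(f: TextIO, n: int = 3) -> list[tuple[str, int]]:
    """Return the n tags with the highest post_count."""
    counts: Counter[str] = Counter()
    for row in csv.DictReader(f):
        counts[row["name"]] = int(row["post_count"])
    return counts.most_common(n)


print(top_tags(io.StringIO(SAMPLE_CSV)))
# [('solo', 1900), ('smile', 600), ('fox', 250)]
```

If the real file comes gzip-compressed, `gzip.open(path, "rt", encoding="utf-8", newline="")` returns a text stream that can be passed to `top_tags` the same way.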
