Topic: API behind Cloudflare

Posted under e621 Tools and Applications

Is there a way to use the API now that e621 is behind Cloudflare? Currently it just returns a Cloudflare bot check page.

Also having the same problem.

I tried adding the Cloudflare cookie from my browser to all requests, but it doesn't work when using a descriptive user agent as per the API documentation.

The Wolf's Stash has been getting around this by having users solve the captcha in its own little window, then returning to the app. I feel like that kind of implementation might be the only way to do it for now. Would love to be proven wrong, though.

If you change your browser's user agent to that of your app for a single page load, then its Cloudflare cookie will work in the app. The cookie is paired with the user agent string. There are a few other ways to get around it, but that seems the simplest.

This makes the API much more complicated to use. Could we get some guidance on working around this? It also seems not to have been communicated ahead of time?

mrox said:
If you change your browser's user agent to that of your app for a single page load, then its Cloudflare cookie will work in the app. The cookie is paired with the user agent string. There are a few other ways to get around it, but that seems the simplest.

Tried that, but it didn't seem to work for me. Then again, I'm using a Python script with the API, not that mobile app.

mrox said:
If you change your browser's user agent to that of your app for a single page load, then its Cloudflare cookie will work in the app. The cookie is paired with the user agent string. There are a few other ways to get around it, but that seems the simplest.

Thanks! It's working, for now...

Tip for anyone else: the browser you use to grab the cookie has to use the same IP address as the one you'll make API calls from. I had a little trouble with that too.

mrox said:
If you change your browser's user agent to that of your app for a single page load, then its Cloudflare cookie will work in the app. The cookie is paired with the user agent string. There are a few other ways to get around it, but that seems the simplest.

That is... dumb. I mean the part where the User Agent string matters. It should be tied to IP address range/ISP? XD

Worth mentioning: as with some other sites, CF blocks Verizon mobile users hard, even on hotspots. Meanwhile, even the VPS I'm currently on gets through after one refresh. I wish you could whitelist currently logged-in users. :(

Kinda related, but it seems RSS/Atom feeds are broken now too, very likely because of Cloudflare. My reader reports status 301 (i.e. redirect) and '503 Service Temporarily Unavailable'.


alphamule said:
That is... dumb. I mean the part where the User Agent string matters. It should be tied to IP address range/ISP? XD

The things that matter to CF are the cf_clearance and session cookies, your IP address, and your user-agent. If you copy them all, it will work. If any of them changes, you'll be rejected.
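A minimal sketch of replaying those values from a script, using only Python's standard library. The user-agent and cookie values are placeholders you would copy from your own browser, and the cookie name `session` is taken loosely from the post above; it is an assumption, so check the real name in your browser's cookie list. The IP part can't be set in code at all: the script simply has to run from the same connection the browser used. `build_request` and `fetch` are hypothetical helper names.

```python
import urllib.request

# Placeholders: copy these from the browser that passed the Cloudflare check.
BROWSER_USER_AGENT = "paste-your-browser-user-agent-here"
CF_CLEARANCE = "paste-cf_clearance-value-here"
SESSION = "paste-session-value-here"  # cookie name "session" is an assumption


def build_request(url: str, user_agent: str, cookies: dict[str, str]) -> urllib.request.Request:
    """Build a request that replays the browser's user-agent and cookies.

    cf_clearance is only honored for the user-agent (and IP) it was issued to,
    so user_agent must be exactly the browser's string.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Cookie": cookie_header},
    )


def fetch(req: urllib.request.Request) -> bytes:
    # The IP can't be set here: run this from the same connection the browser used.
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Usage would look like `fetch(build_request("https://e621.net/posts.json?limit=1", BROWSER_USER_AGENT, {"cf_clearance": CF_CLEARANCE, "session": SESSION}))`. If any of the four values drifts (new cookie, different user-agent, different IP), expect the challenge page again.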

Man, I'm really starting to despise ClownFlare. They must've done something horrid to their service lately, because this isn't the only site that's had APIs murdered by it within the last month.

ganbat said:
Man, I'm really starting to despise ClownFlare. They must've done something horrid to their service lately, because this isn't the only site that's had APIs murdered by it within the last month.

Not really CF's fault. It's not supposed to protect APIs (like, at all), because CF protects against bots, while APIs are, by definition, used by bots (software automating things so you don't have to).
It's just blanket misuse of CF protection settings for the whole website.

ayokeito said:
Not really CF's fault. It's not supposed to protect APIs (like, at all), because CF protects against bots, while APIs are, by definition, used by bots (software automating things so you don't have to).
It's just blanket misuse of CF protection settings for the whole website.

Hmm, I've been researching stuff like OAuth and wonder if bots would be better served by something like both ends having their own private keys, unlike typical SSL/TLS, where only one end has a certificate. I've used SSH without a password, and it sort of works that way: you need a key to access it, but the server only pins your public key and doesn't offer its own matching key. However, you can always authenticate AFTER starting an SSH session (just run a daemon/service behind some port you routed that you access with TLS, for example).

Anything new on the matter? Cloudflare is blocking my service too, even when I try to download the database export for local use :/ I guess forwarding the captcha to the user could be a workaround, but I must admit I don't really know how to proceed; I'm not sure how it all works. If someone has resources on the subject, feel free to share.

werewolf92 said:
Anything new on the matter? Cloudflare is blocking my service too, even when I try to download the database export for local use :/ I guess forwarding the captcha to the user could be a workaround, but I must admit I don't really know how to proceed; I'm not sure how it all works. If someone has resources on the subject, feel free to share.

It seems like a more recent problem. See this https://e621.net/forum_topics/38076

Can I just say I hate Cloudflare's monopolization of web security. Stupid security that works when it feels like it...

maria_kauffman said:
Can I just say I hate Cloudflare's monopolization of web security. Stupid security that works when it feels like it...

Imagine if Oracle ran it. *shudders*

RIP, my downloader tools don't appear to work anymore because of the change. Downloading images in a pool one-at-a-time isn't the most fun thing to do when I had a nice command-line downloader that could do it for me.

aorpheat said:
RIP, my downloader tools don't appear to work anymore because of the change. Downloading images in a pool one-at-a-time isn't the most fun thing to do when I had a nice command-line downloader that could do it for me.

It sucks, it really truly sucks, to have to give advice that contradicts the API documentation. You basically have to clone the user agent, cookies, et al. of the user's Chrome/Firefox/Safari session. This defeats the entire point of the API and the rules for using it! :(

I think CF has a setting to fix this, but I'm not sure how you'd whitelist clients (you'd still get most of the protection, because you can just block abused signatures). Fingerprinting is a real dog's lunch anyway, when the trend seems to be spoofing browser fingerprints as browsers evolve.

"command-line downloader" You'll love JS injection, then. Be the next to get your entire house blocked. ;)

Are there any solutions for API GET requests? I've tried everything in my application, and every time I get the response
"Attempt to decode JSON with unexpected mimetype: text/html; charset=utf-8"
All it did was collect art matching my favorite tags and send it to Telegram...
Won't fixing the headers help?
I get error 403; is there any way to get around this in Python?
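That error message appears to come from aiohttp: the client tried to parse JSON but received an HTML page, most likely Cloudflare's check. No client-side header fix changes that on its own, but a script can at least detect the case and fail with a clear message instead of a decode error. A minimal sketch using Python's standard urllib rather than aiohttp; `CloudflareBlocked`, `is_html` and `get_json` are hypothetical names, and treating any HTML response as "blocked" is an assumption that holds only because the API itself answers in JSON.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional


class CloudflareBlocked(Exception):
    """The server sent an HTML page where API JSON was expected."""


def is_html(content_type: Optional[str]) -> bool:
    # API responses are JSON; the Cloudflare check/block page is HTML.
    return content_type is not None and content_type.lower().startswith("text/html")


def get_json(req: urllib.request.Request) -> Any:
    try:
        with urllib.request.urlopen(req) as resp:
            status = resp.status
            content_type = resp.headers.get("Content-Type")
            body = resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, e.g. the 403 reported above.
        if is_html(err.headers.get("Content-Type")):
            raise CloudflareBlocked(f"HTTP {err.code}: got an HTML page instead of JSON") from err
        raise
    if is_html(content_type):
        raise CloudflareBlocked(f"HTTP {status}: got an HTML page instead of JSON")
    return json.loads(body)
```

This does not get past Cloudflare; it only makes the failure obvious so the bot can stop or alert you instead of crashing on a JSON error.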

furkamore said:
Are there any solutions for API GET requests? I've tried everything in my application, and every time I get the response
"Attempt to decode JSON with unexpected mimetype: text/html; charset=utf-8"
All it did was collect art matching my favorite tags and send it to Telegram...
Won't fixing the headers help?
I get error 403; is there any way to get around this in Python?

Not really; this is an e6 problem, it has nothing to do with the client or what programming language the client's making requests in. The best solution I think anyone could really give atm would be to abandon the API, mimic a browser well enough to get through, then scrape what you want the old-fashioned way.

auwnl said:
Not really; this is an e6 problem, it has nothing to do with the client or what programming language the client's making requests in. The best solution I think anyone could really give atm would be to abandon the API, mimic a browser well enough to get through, then scrape what you want the old-fashioned way.

Thanks for the answer!
Do I understand correctly that the site owners won't fix this problem and there's no point in waiting?
I'm not that good at programming; it took me a very long time to write my bot, and it would take even longer to understand how to mimic a browser...

auwnl said:
Not really; this is an e6 problem, it has nothing to do with the client or what programming language the client's making requests in. The best solution I think anyone could really give atm would be to abandon the API, mimic a browser well enough to get through, then scrape what you want the old-fashioned way.

>Using chrome on an android
>I am an idiot savant
>Explain how to fix this slowly like Scooter from Borderlands 2 please?

furkamore said:
Thanks for the answer!
Do I understand correctly that the site owners won't fix this problem and there's no point in waiting?
I'm not that good at programming; it took me a very long time to write my bot, and it would take even longer to understand how to mimic a browser...

very long reply

I don't know much about CloudFlare, nor e6's implementation of it, so I can't say for sure whether they *can* change it or not. Some other discussions have made it seem like it's possible, and I'd imagine the staff are working on it, but it may be more difficult and nuanced than one may think.

Describing the entirety of how to mimic a browser is rather complicated, and I personally wouldn't dare use Python for it. Python, despite a staggering number of people's attempts, is still a scripting language, and this is starting to encroach on program territory. Especially since, as a newbie, I'm assuming you're doing this on Windows, which makes bodging even harder. But... I'll give it a shot anyway.

Basically, you want to request the actual webpages that browsers use, not make API calls. There's a program called wget that you should be able to install and call via the subprocess module. It has options to set a specific user-agent (--user-agent) and to send extra headers (--header), which is how you'd send cookies; I'd recommend using whatever user-agent the latest version of Chrome or Firefox uses. For help on all of this, I'd recommend searching tutorials on the command prompt (Windows) or Bash (macOS/Linux), on wget, and on Python's subprocess module.

- Pick your browser, go through cloudflare's checks, and copy that browser's user-agent and cloudflare cookies to give to wget.
- You'll probably have to do this daily: the cookie expires, and your public IP may change too (some ISPs reassign it every 24 hours) unless you have some kind of deal with your ISP, like a static IP (if you're unsure, then you don't. And I'm not talking about port-forwarding).
- Then, on your browser, make the searches you want; the ones you normally use the API for.
- Copy the URLs of the search pages; that's what you want wget to request
- Put all of this into your script, making one subprocess call to wget per search
- You're gonna get some HTML documents, hopefully of the search results pages that your browser normally deals with.
- It's a good idea to keep the last pages you requested along with the current ones, so you can compare them. If the pages are the same, then obviously there's been no new posts in that search.
- Assuming you already know how to parse/search through strings and have basic HTML knowledge, you're gonna have to find the "anchor" elements (the ones that start with "<a") that correspond to the posts you want.
- I'd imagine they'd be the ones with an href attribute starting with "https://e621.net/posts/" followed by a number and some query-string parameters (the "?..." part). Or maybe just a number; idk, I'm on my phone rn.
- Get those href values and feed them into wget like before. That should give you the post pages. Now, to make it easier, I'd look for the anchor element whose innerHTML says "Download". Its href value would be the direct link to the image (something starting with "https://static1.e621.net/data"). Again, use wget to get the image.
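A rough Python sketch of the steps above, under loud assumptions: wget is installed and on PATH; `USER_AGENT` and `COOKIES` are placeholders for values copied from your browser; and the post-link pattern and the "Download" link text are guesses based on the list above, not a verified description of e621's HTML. All function and class names are hypothetical. It uses Python's built-in `html.parser` instead of hand-written string searching.

```python
import hashlib
import re
import subprocess
from html.parser import HTMLParser
from typing import Optional

# Placeholders: copy these from the browser that passed the Cloudflare check.
USER_AGENT = "paste-your-browser-user-agent-here"
COOKIES = "cf_clearance=paste-value-here"

# Guessed shape of post links on a search page, full or relative URL.
POST_HREF = re.compile(r"^(?:https://e621\.net)?/posts/(\d+)")


def wget_command(url: str, user_agent: str, cookies: str) -> list[str]:
    """Arguments for a quiet wget call that writes the page to stdout."""
    return [
        "wget", "-q", "-O", "-",
        f"--user-agent={user_agent}",
        f"--header=Cookie: {cookies}",
        url,
    ]


def fetch(url: str) -> str:
    """Run wget (must be installed and on PATH) and return the page as text."""
    result = subprocess.run(
        wget_command(url, USER_AGENT, COOKIES), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")


class LinkCollector(HTMLParser):
    """Collects (href, link text) for every <a> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html: str) -> list[tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def post_ids(search_html: str) -> list[str]:
    """Post IDs linked from a search results page, in order, without repeats."""
    ids: list[str] = []
    for href, _ in collect_links(search_html):
        match = POST_HREF.match(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


def download_url(post_html: str) -> Optional[str]:
    """href of the first link whose text is "Download", or None."""
    for href, text in collect_links(post_html):
        if text == "Download":
            return href
    return None


def page_digest(html: str) -> str:
    # Save this per search; if it matches last run's digest, the page is identical.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()
```

A loop would call `fetch` on each search URL copied from the browser, compare `page_digest` against the saved value, then `fetch` each post page from `post_ids` and pass its `download_url` to wget. Storing a digest rather than the whole page keeps the "compare with last time" step cheap.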

I'll leave getting comments and stuff up to you. If you have further questions, I can try to help, but a forum or subreddit for programming and/or scrapers would be better.

maria_kauffman said:
>Using chrome on an android
>I am an idiot savant
>Explain how to fix this slowly like Scooter from Borderlands 2 please?

If Chrome on Android isn't getting past Cloudflare, try Firefox; that's what's working for me. Above is me talking about programming stuff, which I'm assuming is not what you're looking for. If it is, then... well, hopefully it's not too complex. I'm not very good at simple explanations.

auwnl said:
Not really; this is an e6 problem, it has nothing to do with the client or what programming language the client's making requests in. The best solution I think anyone could really give atm would be to abandon the API, mimic a browser well enough to get through, then scrape what you want the old-fashioned way.

This is what happened with Twitter API. XD

auwnl said:
very long reply

That is way too much work. Just keep using the API. You mimic the browser by copying the cf_clearance cookie, the user-agent string, and use the home page as the referrer (not sure if required or not). Per site rules you also need to add "?_client=<tools_original_user-agent>" to all the urls. Replace "<tools_original_user-agent>" with the user-agent string that the tool was previously using. Use a "&" instead of "?" if there's other parameters in the url, such as when you do a search. The cookie expires after 24 hours so you'll need to update your tool daily. Remember to turn off the mimicking when API is fixed.
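A minimal Python sketch of that recipe, using only the standard library. `MyTool/1.0` is a made-up stand-in for your tool's original user-agent, and `with_client` and `mimic_headers` are hypothetical names. The URL helper appends `_client` with "?" when the URL has no query string and "&" when it does, leaving any existing parameters untouched; percent-encoding the value is this sketch's choice, since the post doesn't say how the site expects it.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

HOME_PAGE = "https://e621.net/"


def with_client(url: str, tool_user_agent: str) -> str:
    """Add _client=<tool's original user-agent>, joined with '?' or '&' as needed."""
    parts = urlsplit(url)
    extra = urlencode({"_client": tool_user_agent})
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


def mimic_headers(browser_user_agent: str, cf_clearance: str) -> dict[str, str]:
    """Headers copied from the browser that passed the Cloudflare check."""
    return {
        "User-Agent": browser_user_agent,
        "Cookie": f"cf_clearance={cf_clearance}",
        "Referer": HOME_PAGE,  # the post isn't sure this one is required
    }
```

For example, `with_client("https://e621.net/posts.json?tags=wolf", "MyTool/1.0")` gives `https://e621.net/posts.json?tags=wolf&_client=MyTool%2F1.0`. Keeping the mimicking in one place makes it easy to switch back to the tool's own user-agent once the API is fixed.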

furkamore said:
Do I understand correctly that the site owners won't fix this problem and there's no point in waiting?

Incorrect: it's happened before, and the admins fixed it within 2 days. Same this time; it's been fixed already. It'll probably happen again in the future too.
