Topic: [Userscript and webextension] e621 downloader and scraper

Posted under e621 Tools and Applications

E621 downloader

Firefox addon link: Firefox ๐ŸฆŠ
Download link: Github

Source code: Yusifx1/e621-userscript

You can load the webextension into Chrome via the extension developer/debug mode.

What this script can do:

  • Scrape all post links on a page (regardless of the page's post count; it does not download anything).
  • Scrape all pages of a tag/post search.

* There is a limit function where you can select how many pages you want to download (1 page is 320 posts; it applies to every function except the page scraper; 1 page of 320 post links is about 200 KB).

  • Download a pool/set to a folder with renaming (it simply downloads the links collected by the scraper).
  • Ignores the global blacklist.
  • Scrape all links of a pool/set.

* Deleted post finder (does not work in Tampermonkey for security reasons).
* You need to be logged in to Furaffinity to get NSFW content.
* It also finds direct links in the sources list (.png, .gif, .webm, etc.).
* Tired of downloading a full pool on slow internet? You can limit how many posts of a pool you want to download. If the input fields are empty, it will download all posts of the pool. First input field: start downloading/scraping from post number # (2 to start from the second post). Second input field: download/scrape posts from the start up to post number #. You can use both inputs at the same time to download an interval of posts.

  • Gallery mode.

*You can change the post count per page in the settings.
*There is a play button that starts a slideshow. You can change the mode in the settings. Instead of using one post per page, use fullscreen mode for a better experience.

For scrapers
* Copy links to the clipboard.
* Save links to a txt file.
* Show links in a scrollable text field.

To-do

  • Change the design of the script (reply with how you want it to look).
  • Add more deleted file finders.

Known issues (not critical)

Compatibility issues:
If you use Tampermonkey and your browser does not download to a folder, give download permission to Tampermonkey.

Naming issues:
When scraping the main posts page and saving the links to txt, the file will have no name. You can open the file with any text editor (I use my phone's notepad) or rename it with .txt or another text extension. Will fix it some day.

Chrome scroll behaviour:
When slideshow mode focuses on an image, scrolling will not be smooth in Chrome.

If you have a suggestion, a question or whatever, reply here. Found an issue? Report it on my GitHub page.
PS: Sorry if my English causes eye pain somewhere.

Updated

What to do? (Write what you want; I will add it if I can.)

Change script style

Change settings menu style.
Change button style:
What color you want it to be
How the button should look
How the button should function (appear/disappear, hide when used, something like that)

Add function

What would it do? For example:
Add download function for sets (in progress)
Favorite downloader

Small things that I want to change but don't know how

I don't know what it should be called, which is why I just called it "e621". Please write your suggestions.

My English is not great with terminology and I don't have much imagination. Please help with renaming the buttons (and the input field descriptions too).
Is it a scrapper, scraper, ripper, link grabber or something else? I don't know what I should call this extension/userscript.

If I still haven't added a function that I promised, open an issue on GitHub (wait about 4 weeks; I am not lazy, I am busy).

Updated

As someone who spent way too much time adding mass-downloader functionality into my own script (re621), I feel like I should offer a bit of feedback.

The way you are requesting data from the API is pretty poor. You are sending a query for every single post ID in the pool without properly rate-limiting it. The API limits you to sending a request once every 500ms; exceeding that will result in error 503. Thus, you should add a delay between requests to prevent that. However, this would mean that fetching data for a large pool will take a while.
Instead of requesting data from /posts/12345.json, you should request it from /posts.json?tags=id:12345,67890,etc. Note that this format is limited to 100 IDs per request, so you'll have to split the requests into batches for bigger pools.
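
Roughly something like this (just a quick sketch in plain JS, not your actual code; the function names are made up):

```js
// Sketch: fetch post data for a pool in batches of up to 100 IDs,
// waiting at least 500 ms between API requests to stay under the rate limit.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchPoolPosts(postIds) {
    const posts = [];
    for (let i = 0; i < postIds.length; i += 100) {
        const batch = postIds.slice(i, i + 100); // the id: search takes at most 100 IDs
        const url = "https://e621.net/posts.json?tags=id:" + batch.join(",") + "&limit=100";
        const data = await (await fetch(url)).json();
        posts.push(...data.posts);
        if (i + 100 < postIds.length) await delay(500); // one request every 500 ms
    }
    return posts;
}
```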

Also, you are not setting the user-agent, which is something that you absolutely must do.

Why should I add a User-Agent if it works fine without it? In bash you must add it because e621 won't let you through without one, but the browser already sends a User-Agent. Also, with the /posts.json?tags=id:12345,67890 method the results will not be in order; I tried it in my bash script. And if I need to get all the post links of a pool I can use posts.json?tags=pool:"id of the pool"&limit=320.

Updated

tysh1 said: Why should I add a User-Agent if it works fine without it?

To quote from the API article:

A non-empty User-Agent header is required for all requests. Please pick a descriptive User-Agent for your project. You are encouraged to include your e621 username so that you may be contacted if your project causes problems. DO NOT impersonate a browser user agent, as this will get you blocked. An example user-agent would be MyProject/1.0 (by username on e621)

You should set a user-agent because that's what the admins are asking you to do. With javascript, the process is a little different, though.

If you are using a javascript based method for requests, such as creating a userscript, a browser extension, or are otherwise unable to set a custom header from inside a browser, please attach an additional url query parameter named _client and set it based on how you would have set your user-agent.
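
In practice that just means tacking one extra query parameter onto every API URL, something like this (a sketch; the identifier string is only the example from the API article):

```js
// Sketch: append the _client parameter to an API URL, since a userscript
// cannot set a custom User-Agent header from inside the browser.
const CLIENT = "MyProject/1.0 (by username on e621)"; // example identifier from the API article

function withClientParam(url) {
    const u = new URL(url);
    u.searchParams.set("_client", CLIENT); // takes care of URL-encoding
    return u.toString();
}

// fetch(withClientParam("https://e621.net/posts.json?tags=id:12345,67890"));
```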

tysh1 said: Also, with the /posts.json?tags=id:12345,67890 method the results will not be in order; I tried it in my bash script.

No, it wouldn't. But you already have the correct order that you got from the pool's post_ids. You can iterate over that to get the download links in the correct order.
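
Something like this (a rough sketch; fetchPoolPosts is the batched fetch from my earlier sketch, and I'm assuming the pool's post_ids array comes from the /pools/<id>.json endpoint):

```js
// Sketch: put the API results back into pool order using the pool's post_ids.
function sortByPoolOrder(postIds, posts) {
    const byId = new Map(posts.map((post) => [post.id, post]));
    return postIds.map((id) => byId.get(id)).filter(Boolean); // drop IDs the API did not return
}

// const pool = await (await fetch("https://e621.net/pools/12345.json")).json();
// const ordered = sortByPoolOrder(pool.post_ids, await fetchPoolPosts(pool.post_ids));
```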

bitwolfy said:

a request once every 500ms
Also, you are not setting the user-agent, which is something that you absolutely must do.

Okay, I will add the request limit and the user-agent.
Also, posts.json?tags=pool:"id of the pool"&limit=320 is a better way than searching by ID.

tysh1 said: Also, posts.json?tags=pool:"id of the pool"&limit=320 is a better way than searching by ID.

That's true. I just prefer to search by post ID because that way, you know exactly how many requests you are going to make. When searching by pool ID, you'd have to keep checking if you need to load more pages of results after every request.

bitwolfy said:
That's true. I just prefer to search by post ID because that way, you know exactly how many requests you are going to make. When searching by pool ID, you'd have to keep checking if you need to load more pages of results after every request.

In my bash script I used "if the post count of the page = 320 then continue". With this method you send fewer requests if you scrape a big pool. You can also get the post count (via post_ids) and then work out how many pages it would be.
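
Something like this in JS (a rough sketch of the same idea, the names are made up):

```js
// Sketch: fetch a whole pool with the pool: tag, 320 posts per page,
// and stop as soon as a page comes back with fewer than 320 posts.
async function fetchPoolByTag(poolId) {
    const posts = [];
    for (let page = 1; ; page++) {
        const url = "https://e621.net/posts.json?tags="
            + encodeURIComponent("pool:" + poolId)
            + "&limit=320&page=" + page;
        const data = await (await fetch(url)).json();
        posts.push(...data.posts);
        if (data.posts.length < 320) break; // last page reached, no more requests needed
        await new Promise((resolve) => setTimeout(resolve, 500)); // keep the 500 ms gap
    }
    return posts;
}
```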

Updated

tysh1 said: With this method you send fewer requests if you scrape a big pool.

Yeah, that's a good point. I switched to that method, resulting in 4 requests instead of 10 when used with a 972-post pool (4 pages of 320 posts versus 10 batches of 100 IDs).
Considering that it takes 194 seconds to process and download that pool, an improvement of 3 seconds isn't a whole lot, but I'll take it.

I have slow internet (400 KB/s if I'm lucky) + a bad connection (when I download series, the speed drops to zero for a minute) + a phone with Firefox that has many scripts and addons. Are you sure about that? In bash (Termux), yes, I can say it doesn't save much time, but my browser slows down with all those requests. That's why I removed "scrape all pages" from this version, to improve it. I also tried your script, well done, but my browser does not work well with it. I don't know why the script doesn't start most of the time (shutting down all other scripts doesn't help, neither does restarting, and yes, you can say that's because I only use a phone).

Updated

My connection is also quite poor at times, but downloading images does not slow down the browser at all. All the code that handles downloads is asynchronous.
re621 is absolutely not optimized for mobile, so I'm not surprised it does not work for you.

Are you actually doing programming on mobile too?.. If so, that's kinda amazing.

Yep. What is amazing about it? I use a text editor and test it with Tampermonkey for JS (I can't use Tampermonkey as an editor because my code disappears and special characters delete code) and Termux for bash. I am not a programmer or a script lover; I started this because the new API broke the site and the bash scripts that I used to download pools.

New version.
Made webextension version.
Added settings menu.
Deleted file finder (you need to be logged in to FA; it does not work in Tampermonkey).
Ignores the global blacklist.
And more...
(Will upload userscript version as soon as possible)

Version 1.1
Added scrape function for sets
Fixed the gallery when the sample file is not found
Fixed the gallery when using one image per page

New version
Fixed scraping of sets
Added slideshow mode

tysh1 said:
Version 1.1
Added scrape function for sets
Fixed the gallery when the sample file is not found
Fixed the gallery when using one image per page

New version
Fixed scraping of sets
Added slideshow mode

I have no idea how to run this or what to do, man, coding is confusing... Add me on Discord to help me with this, because I'm rarely on here and I just wanted to bulk download some things... (Discord is: Prof. Goose#4142)
