Forum - e6collector.py: A very bare-bones CLI tag downloader for Python 3 (updated for 2020)

user 2929

Member

This is a bare-bones Python script I'm using to mirror my favourites. You might not find this useful or even usable. This is what it can do:

Work with just Python 3.6 or higher (no extra modules needed!) on Linux (other systems are probably fine as well)
Download images by tag using the API. If you want to search for multiple tags, make sure your tag list is already URL-encoded.
Save images as ID-SOME-TAGS.EXT to a directory
Save all sources and tags associated with the image to tags.csv in the destination folder (it does not update those though, at least ATM). Great for grep-ing!
Don't redownload images because the tags changed; it strictly goes by ID.
Optionally log in with user name and API key (required to be able to see some images)
Spam your terminal
Fail hard if e621 is slow/down
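
The "strictly goes by ID" behaviour could look roughly like the sketch below. It assumes only the ID-SOME-TAGS.EXT naming described above; the function name `already_downloaded` is made up for illustration and is not taken from the script.

```python
from pathlib import Path


def already_downloaded(dest: Path, post_id: int) -> bool:
    """Return True if any file in dest is named '<post_id>-...'.

    Only the ID prefix is compared, so a post whose tags changed
    since the last run is still treated as present.
    """
    # The trailing hyphen keeps ID 12 from matching a file for ID 123.
    return any(dest.glob(f"{post_id}-*"))
```

Checking by prefix rather than by full filename is what makes tag changes irrelevant to the decision.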

Usage example: python3 e6collector.py /home/me/pics/e621/ "fav:myname" "myusername" "mYAp1k3y1234"
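
If you would rather not encode a multi-tag search by hand, something like the following should produce a suitable string. This is a sketch, not part of the script: `encode_tags` is a hypothetical helper, and it assumes the API accepts the usual form encoding where spaces between tags become `+`.

```python
from urllib.parse import quote_plus


def encode_tags(tags):
    """Join tags with spaces and percent-encode the result for a query string."""
    return quote_plus(" ".join(tags))


if __name__ == "__main__":
    # Print the encoded tag list so it can be pasted into the command line.
    print(encode_tags(["fav:myname", "rating:s"]))
```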

I might actually work on this a little more now. I had planned to add a less verbose mode from the very beginning but never did. Oh, and putting sources in the CSV file works again as well, still in the same janky format.

Download version 2.1: https://pastebin.com/pf7snA9k

Have fun I guess!

Updated almost 5 years ago

user 2929

Member

about 10 years ago

Windows unicode issues fixed, download link updated.

Updated by anonymous about 10 years ago

user 2929

Member

over 8 years ago

Finally updated this thing to work with the not-recent-at-all API changes. It might blow up though, so be warned.

Updated by anonymous over 8 years ago

Chessax

Privileged

over 8 years ago

stealthmode said:
Fail hard if e621 is slow/down

Sounds like fun :P

Also this is gonna sound really nerdy, but where are all the goody CLI flags? Any good CLI-tool needs those.

Important note: I'm not going to use this tool, so don't change anything for my sake. I just found it interesting, and it gave me an idea or two for my own tool(s)!

Updated by anonymous over 8 years ago

user 2929

Member

over 8 years ago

Chessax said:
Also this is gonna sound really nerdy, but where are all the goody CLI flags? Any good CLI-tool needs those.

I honestly don't know what I would add, except maybe verbose/quiet mode. It is meant to be bare-bones, after all.

Updated by anonymous over 8 years ago

Maxpizzle

Member

over 8 years ago

stealthmode said:
I honestly don't know what I would add, except maybe verbose/quiet mode. It is meant to be bare-bones, after all.

--help, -h        - Display this help message and also register you for psychiatric help
--quiet, -q       - Omit useful error messages; all other output is retained
--verbose, -v     - Leave comments describing, in haunting detail, exactly how much the image turns you on
--no-nsfw, -n     - Do not download anything at all
--contribute, -c  - Complain of missing tags but do not add them yourself
--desperate, -d   - Also search FurAffinity

Updated by anonymous over 8 years ago

savageorange

Member

over 8 years ago

It would be nice to generate a metalink file instead of the script doing the image downloading itself. That way you could import it into a download manager like DownThemAll, and easily rate-limit and cope with downtime.
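
A rough idea of what the metalink output could look like, using only the standard library. This is a sketch under assumptions: it writes the Metalink 4 (RFC 5854) format with just a name and a URL per file, `build_metalink` is a made-up name, and whether a given download manager accepts such a minimal file is untested.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by RFC 5854 (Metalink 4).
METALINK_NS = "urn:ietf:params:xml:ns:metalink"


def build_metalink(files):
    """Build a minimal Metalink document from (filename, url) pairs."""
    ET.register_namespace("", METALINK_NS)
    root = ET.Element(f"{{{METALINK_NS}}}metalink")
    for name, url in files:
        file_el = ET.SubElement(root, f"{{{METALINK_NS}}}file", {"name": name})
        ET.SubElement(file_el, f"{{{METALINK_NS}}}url").text = url
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data; a real run would use the names and URLs from the API.
    print(build_metalink([("1-some-tags.png", "https://example.com/1.png")]))
```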

Spam your terminal

Learn to use '\r', and output right-padded strings.
Your code looks simple enough that you should be able to use something like this:

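import shutil

_termwidth = None

def say(message):
    """Print message on the current line, overwriting the previous one."""
    global _termwidth
    if not _termwidth:
        # Look up the terminal width once and cache it.
        _termwidth = shutil.get_terminal_size().columns
    # Left-justify and pad to one column short of the width so the line never wraps.
    padded = '%-*s' % (_termwidth - 1, message)
    print(padded, end='\r', flush=True)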

Which will show only the latest message, rewriting the line each time a new message needs to be shown.
You could also alter it to dedicate part of the line to message, part of the line to progress indicator, whenever you implement progress.

(disclaimer: not tested on Windows. Used many many times for different scripts on Linux)
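
Splitting the line between a message and a progress indicator could be done along these lines. It is only a sketch: `format_status` and its `[done/total]` layout are invented here, and the result is meant to be printed with `end='\r'` in the same way as above.

```python
def format_status(message, done, total, width):
    """Return a line of width - 1 columns: padded message, then '[done/total]'."""
    progress = f"[{done}/{total}]"
    # Room left for the message after the indicator and one separating space.
    room = max(width - 1 - len(progress) - 1, 0)
    # Truncate long messages and right-pad short ones to exactly `room` columns.
    return f"{message[:room]:<{room}} {progress}"


if __name__ == "__main__":
    import shutil

    columns = shutil.get_terminal_size().columns
    print(format_status("downloading 123-some-tags.png", 3, 10, columns), end="\r")
```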

updating the tag list

If you mean the CSV rather than the filenames,
I suggest you look at TMSU -- if you want, all tagging can be managed by simply shelling out to it (updates can be done just by two shell-outs : 1. removing all tags from the file, 2. tagging again with the new set of tags)
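
The two shell-outs might look something like this. Treat it as an assumption-laden sketch: it relies on `tmsu untag --all FILE` and `tmsu tag FILE TAG...` behaving the way the TMSU help describes, it needs an initialised TMSU database, and the helper names are made up.

```python
import subprocess


def tmsu_retag_commands(path, tags):
    """Return the two tmsu command lines: strip all tags, then apply the new set."""
    return [
        ["tmsu", "untag", "--all", path],  # assumed flag: removes every tag from the file
        ["tmsu", "tag", path, *tags],
    ]


def retag(path, tags):
    """Run both commands, raising CalledProcessError if either one fails."""
    for cmd in tmsu_retag_commands(path, tags):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with tags that contain parentheses or other special characters.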

Updated by anonymous over 8 years ago

user 2929

Member

over 8 years ago

Maxpizzle said:

--help, -h        - Display this help message and also register you for psychiatric help
--quiet, -q       - Omit useful error messages; all other output is retained
--verbose, -v     - Leave comments describing, in haunting detail, exactly how much the image turns you on
--no-nsfw, -n     - Do not download anything at all
--contribute, -c  - Complain of missing tags but do not add them yourself
--desperate, -d   - Also search FurAffinity

This post put me in verbose mode.

savageorange said:
It would be nice to generate a metalink file instead of the script doing the image downloading itself. That way you could import it into a download manager like DownThemAll, and easily rate-limit and cope with downtime.

That actually sounds like a useful feature, I'll think about it.

savageorange said:
Which will show only the latest message, rewriting the line each time a new message needs to be shown.
You could also alter it to dedicate part of the line to message, part of the line to progress indicator, whenever you implement progress.

Nah, I'd rather just skip messages for already existing images to reduce the spamming.

savageorange said:
If you mean the CSV rather than the filenames,
I suggest you look at TMSU -- if you want, all tagging can be managed by simply shelling out to it (updates can be done just by two shell-outs : 1. removing all tags from the file, 2. tagging again with the new set of tags)

Interesting project, I'll look at it. Won't put it in as a dependency, though.

Updated by anonymous over 8 years ago

TheLuggage

Member

over 7 years ago

hi i took this and made a quieter, faster, parallel downloadin', rate-limited, error handling version: https://gist.github.com/anonymous/f9936e74cedca08368561e3e6d505b91

$ ./e6collector.py --help
usage: e6collector.py [-h] [--jobs JOBS] [--verbose] [--quiet]
                      destination tags [tags ...]

Download files by tag from e621

positional arguments:
  destination           Directory to store the files in
  tags                  Tags to look for. Try "fav:yourname"

optional arguments:
  -h, --help            show this help message and exit
  --jobs JOBS, -j JOBS  Downloads to run in parallel
  --verbose, -v
  --quiet, -q
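
For reference, an `argparse` setup that should print help text much like the one above. This is a reconstruction from the help output, not the code from the gist; the `type=int` and `default=1` on `--jobs` are guesses.

```python
import argparse


def build_parser():
    """Build a parser matching the usage line shown in the help output."""
    parser = argparse.ArgumentParser(
        prog="e6collector.py",
        description="Download files by tag from e621",
    )
    parser.add_argument("destination", help="Directory to store the files in")
    parser.add_argument("tags", nargs="+",
                        help='Tags to look for. Try "fav:yourname"')
    # type and default are assumptions; the help text does not show them.
    parser.add_argument("--jobs", "-j", type=int, default=1,
                        help="Downloads to run in parallel")
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--quiet", "-q", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```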

Updated by anonymous over 7 years ago

KiraNoot

Member

over 7 years ago

TheLuggage said:
hi i took this and made a quieter, parallel downloadin' version: https://gist.github.com/anonymous/8956030a367323d673943868bba3c076

$ ./e6collector.py -h
usage: e6collector.py [-h] [--jobs JOBS] [--verbose]
                      destination tags [tags ...]

Download files by tag from e621

positional arguments:
  destination           Directory to store the files in
  tags                  Tags to look for. Try "fav:yourname"

optional arguments:
  -h, --help            show this help message and exit
  --jobs JOBS, -j JOBS  Downloads to run in parallel
  --verbose, -v

Please don't use a while True: with no rate limits and no error handling to fetch posts. You're almost guaranteed to get the tool blocked doing that. Infinite loops and HTTP requests are bad, mmkay!

Some ideas:
The number of requests you make for more posts can never need to exceed the maximum post id returned on the first request divided by the number of posts requested per page, plus one.
The before_id should change on every request; if it is not changing, something is wrong.
Test for response codes other than 200 and delay a few seconds. If you still get a non-200 response more than 5 times, abort, because something is dreadfully wrong. A wrapper class around the requests would make this fairly trivial to implement.
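
The first two ideas can be sketched without touching the network by passing the page fetcher in as a function. Everything here is illustrative: `fetch_all` and `fetch_page` are made-up names, and posts are assumed to be dicts with an integer `id`, returned newest first.

```python
def fetch_all(fetch_page, page_size):
    """Collect posts page by page, with a hard cap on the number of requests.

    fetch_page(before_id) must return a list of posts with id < before_id,
    or the newest posts when before_id is None.
    """
    posts = fetch_page(None)
    if not posts:
        return []
    # Upper bound on follow-up requests: highest id / page size, plus one.
    max_requests = max(p["id"] for p in posts) // page_size + 1
    collected = list(posts)
    before_id = min(p["id"] for p in posts)
    for _ in range(max_requests):
        page = fetch_page(before_id)
        if not page:
            break
        next_before = min(p["id"] for p in page)
        if next_before >= before_id:
            # before_id did not move, so further requests would just repeat.
            raise RuntimeError("before_id did not advance; aborting")
        collected.extend(page)
        before_id = next_before
    return collected
```

The `for` loop over a fixed range replaces the `while True:`, so even a misbehaving server cannot keep the script requesting forever.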

Updated by anonymous over 7 years ago

TheLuggage

Member

over 7 years ago

Really good points. Thank you. I've added request rate limiting, and HTTP error handling with retry and exponential backoff; see updated link.
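
Retry with exponential backoff can be sketched as below. This is not the code from the gist: `with_retries`, the limit of 5 attempts and the 1-second base delay are all assumptions chosen for illustration.

```python
import time
import urllib.error


def with_retries(action, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the delay after each failure.

    Retries on urllib.error.URLError, which also covers HTTPError.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(max_attempts):
        try:
            return action()
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * 2 ** attempt)
```

A call would look like `with_retries(lambda: urllib.request.urlopen(url).read())`, with `url` being whatever the script is about to fetch.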

Updated by anonymous over 7 years ago

KiraNoot

Member

over 7 years ago

TheLuggage said:
Really good points. Thank you. I've added request rate limiting, and HTTP error handling with retry and exponential backoff; see updated link.

It should be noted that the urllib library does not throw an exception on non-200 response statuses from the server. The request may be successful (you get a response), but the server may have rejected it for rate limiting or other error reasons. Exceptions are primarily limited to protocol violations and connection errors.

https://docs.python.org/3/library/http.client.html#http.client.HTTPResponse.status should be checked in this case for the value 200.

Updated by anonymous over 7 years ago

TheLuggage

Member

over 7 years ago

KiraNoot said:
It should be noted that the urllib library does not throw an exception on non 200 response statuses from the server.

urlopen uses the globally installed `OpenerDirector`. The default global OpenerDirector has an HTTPErrorProcessor step that raises `HTTPError` on non-2xx responses (redirects are followed rather than raised). I also tested that it raises these against https://httpbin.org/status/ just in case: https://gist.github.com/anonymous/5c3fbd0cee301973f9c26002dc4854da

edit: new ver with much faster checks if a post is already downloaded or tagged, a quiet mode, and stats: https://gist.github.com/anonymous/82f1512434d66d68e9d5cfa9fd6933c7
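
The HTTPError behaviour can be checked with a few lines like these. This is a separate sketch, not the contents of the linked gist; `status_of` is a made-up helper, and the optional `opener` argument exists only so a custom opener can be swapped in.

```python
import urllib.error
import urllib.request


def status_of(url, opener=None):
    """Return (status_code, raised) for a GET request to url.

    raised is True when urllib signalled the status by raising HTTPError
    instead of returning a response object.
    """
    open_url = opener.open if opener else urllib.request.urlopen
    try:
        with open_url(url) as response:
            return response.status, False
    except urllib.error.HTTPError as err:
        # The status code is still available on the exception.
        return err.code, True
```

With the default handlers, a 404 or 503 should come back as `(404, True)` or `(503, True)`, so wrapping `urlopen` in `try`/`except HTTPError` is enough to catch server-side rejections.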

Updated by anonymous over 7 years ago

KiraNoot

Member

over 7 years ago

TheLuggage said:
urlopen uses the globally installed `OpenerDirector`. The default global OpenerDirector has an HTTPErrorProcessor step that raises `HTTPError` on non-2xx responses (redirects are followed rather than raised). I also tested that it raises these against https://httpbin.org/status/ just in case: https://gist.github.com/anonymous/5c3fbd0cee301973f9c26002dc4854da
edit: new ver with much faster checks if a post is already downloaded or tagged, a quiet mode, and stats: https://gist.github.com/anonymous/f9936e74cedca08368561e3e6d505b91

Two thumbs up. I learned something. I'm way too used to using requests.

Updated by anonymous over 7 years ago

user 2929

Member

almost 5 years ago

Currently updating this, stay tuned all two of you!

user 2929

Member

almost 5 years ago

Version 2.0 is now available under https://pastebin.com/pBNAtJ2D

It should be stable enough. Should. Also updated the first post a bit.

user 2929

Member

almost 5 years ago

Looks like logging in will be required to view some images (like https://e621.net/posts/2172703 ) from now on. If you aren't logged in, you still get most of the metadata, but no URL. The script currently can't handle this, fix upcoming.
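
Handling the missing URL could be as simple as filtering those posts out before downloading. This is a sketch, not the upcoming fix: `downloadable` is a made-up name, and the `post["file"]["url"]` layout is an assumption about the API response that should be checked against real output.

```python
def downloadable(posts):
    """Yield (post_id, url) for every post that actually exposes a file URL."""
    for post in posts:
        # Assumed layout: {"id": ..., "file": {"url": ... or None}, ...}
        url = (post.get("file") or {}).get("url")
        if url:
            yield post["id"], url
```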

user 2929

Member

almost 5 years ago

stealthmode said:
Looks like logging in will be required to view some images (like https://e621.net/posts/2172703 ) from now on. If you aren't logged in, you still get most of the metadata, but no URL. The script currently can't handle this, fix upcoming.

Version that handles empty URLs and logging in: https://pastebin.com/pf7snA9k
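
For anyone curious, logging in with a user name and API key using only the standard library might look like this. It is a sketch and not lifted from the pastebin: it assumes the API accepts HTTP Basic authentication with those two values, and `build_request` plus the User-Agent string are invented for the example.

```python
import base64
import urllib.request


def build_request(url, username=None, api_key=None,
                  user_agent="e6collector-sketch/0.1"):
    """Build a Request with a User-Agent and, if given, Basic auth credentials."""
    headers = {"User-Agent": user_agent}
    if username and api_key:
        # Basic auth is base64("username:api_key"); this is encoding, not encryption.
        token = base64.b64encode(f"{username}:{api_key}".encode()).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(url, headers=headers)
```

The returned object can be passed straight to `urllib.request.urlopen`; without credentials it behaves like an anonymous request.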

Piipperi

Member

over 2 years ago

Not really sure if this whole script works anymore, but I just get SSL errors on whatever I try to download.
