Topic: Does an up-to-date downloadable version of E621 exist?

Posted under General

Being, let's just say... slightly paranoid and concerned about this site's existence, I come here to ask if there is an "official" downloadable copy of it that is up to date and recent.

Yes, the total size is 8TB+, but if we exclude webm, gif and flash, the total should be less. We can also exclude comments, the forum and other non-essential parts to keep the size down (yes, the 8TB is just media).
Images can be divided into parts, archived and then set up for download. The exact setup is for the admins to decide, since presumably the internal structure is fairly well figured out and optimized. It's just that, eh... ≈8TB+ of art (yes, art and not just "furry porn stuff", because some examples are that good) is a rather important 8TB. That's about 67 modern CoD games. Not a big deal to download. Hosting isn't an issue either.

I don't know of any easy way or tool to download all of the site's media. If you're an experienced programmer, or know someone who is, and know how to use the API, you could make something to download and keep an archive of the site. But if you just want to keep a small subset of media, you can use the re621 userscript to easily download everything in your favorites. Link: re621.app

list

I'm also interested but only with tags and preferably description.

I've seen dumps in the past but they didn't have the tags which leaves a few million files and no way to find anything because it's all named like 735a017aa1d3cc44f2dc0028e1570ee6.jpg

list said:
I'm also interested but only with tags and preferably description.

I've seen dumps in the past but they didn't have the tags which leaves a few million files and no way to find anything because it's all named like 735a017aa1d3cc44f2dc0028e1570ee6.jpg

https://e621.net/db_export/

justkhajiit said:
Being, let's just say... slightly paranoid and concerned about this site's existence, I come here to ask if there is an "official" downloadable copy of it that is up to date and recent.

Disclaimer: I don't work for or volunteer for e621.

I don't think this exists, at least for the images and other media. Like Donovan_DMC pointed out, e621 makes regular dumps of the tags and other metadata available, but you can't get the media files from that link. The API can give you the media files, if you're interested.

Keep in mind that a number of home internet connections, at least in the US, have data caps of roughly 1.0 to 1.2 TB a month - you can download more than that, but there starts to be a fairly steep additional charge. Other areas of the world might not have this concern.

8.7 TB is still enough, in 2024, that it's probably still faster in many cases to mail e621 a hard drive or SSD, have them plug it into the server, copy the images to it, unplug it, and then mail it back to you. I have no idea if the site management would agree to such a request, but in theory, that's probably the fastest way to do it.

Andrew S. Tanenbaum said:
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.

Yes, total size is 8TB+ but if we exclude webm, gif and flash, total size should be less.

The average file size, including those types, is still only about 1.8 megabytes. WebM, GIF, and Flash files only make up about 4% of the files on the site (by file count), so excluding them might not be very much of a gain.

kora_viridian said:
I don't think this exists, at least for the images and other media. Like Donovan_DMC pointed out, e621 makes regular dumps of the tags and other metadata available, but you can't get the media files from that link. The API can give you the media files, if you're interested.

While the export cannot give you the files themselves, you absolutely can get the URLs to them from the dump. Just take the md5 and the extension, then plug them into this:
https://static1.e621.net/data/{md5[0..1]}/{md5[2..3]}/{md5}.{ext}
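As a rough sketch of that template in Python: the function name `static_url` is made up for illustration, and the slicing assumes `md5[0..1]` means the first two hex characters and `md5[2..3]` the next two. The sample md5 is just the filename quoted earlier in the thread.

```python
def static_url(md5: str, ext: str) -> str:
    """Build a file URL from a post's md5 and extension, per the template above."""
    md5 = md5.lower()
    # Basic sanity check: an md5 hex digest is 32 hexadecimal characters.
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not an md5 hex digest: {md5!r}")
    # Assumption: the two directory levels are the first and second pair of characters.
    return f"https://static1.e621.net/data/{md5[0:2]}/{md5[2:4]}/{md5}.{ext}"


print(static_url("735a017aa1d3cc44f2dc0028e1570ee6", "jpg"))
```

This only builds the link; it does not download anything.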

kora_viridian said:
Keep in mind that a number of home internet connections, at least in the US, have data caps of roughly 1.0 to 1.2 TB a month - you can download more than that, but there starts to be a fairly steep additional charge. Other areas of the world might not have this concern.

I've heard a bunch of people mention data caps on wifi, but none of the wifi networks I've had between now and when I was a kid have had a limit.

kora_viridian said:
The average file size, including those types, is still only about 1.8 megabytes. WebM, GIF, and Flash files only make up about 4% of the files on the site (by file count), so excluding them might not be very much of a gain.

Going by the exports
WEBM is 1,284 GiB
GIF is 274 GiB
SWF is 52 GiB
PNG is 5,694 GiB
JPG is 1,629 GiB
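For anyone who wants the totals, here is the arithmetic on those five figures as a small Python sketch. The numbers are copied from the list above; the variable names are just for illustration.

```python
# Sizes per file type in GiB, as listed above.
sizes_gib = {"webm": 1284, "gif": 274, "swf": 52, "png": 5694, "jpg": 1629}

total = sum(sizes_gib.values())
still_images = sizes_gib["png"] + sizes_gib["jpg"]
excluded = total - still_images  # webm + gif + swf

print(f"total:        {total} GiB")
print(f"png + jpg:    {still_images} GiB")
print(f"webm/gif/swf: {excluded} GiB ({100 * excluded / total:.0f}% of the total)")
```

So by size, dropping WebM, GIF and SWF saves roughly a fifth of the total, noticeably more than their share of the file count.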

donovan_dmc said:
I've heard a bunch of people mention data caps on wifi, but none of the wifi networks I've had between now and when I was a kid have had a limit.

Comcast (Xfinity) has one, at least in some of their markets. That's where the 1.0 or 1.2 TB number I gave comes from. They have about 32 million broadband subscribers in the USA, as of fall 2023. Spectrum doesn't have data caps, and they have about 31 million subscribers. I think the next biggest single company is Altice (Suddenlink/Optimum), with about 4.5 million subscribers, and I don't think they have data caps.

With Comcast, if you go over the cap, it's $10 per 50 GB, up to a max extra charge of $100. Downloading all the media from e621 within one month would hit that $100 extra charge. They send you an email when you hit some fraction of the cap (don't remember if it's 50%, 75%, or what), and I've gotten that email maybe once in several years. I've never been charged extra for going past the cap.
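To put numbers on that, a quick Python sketch of the overage pricing described above. It assumes a 1.2 TB (1200 GB) cap and that each started 50 GB block is billed in full; both are assumptions for illustration, not something checked against Comcast's terms.

```python
import math


def overage_charge(used_gb, cap_gb=1200, block_gb=50, per_block=10, max_charge=100):
    """Extra monthly charge in dollars: $10 per 50 GB over the cap, capped at $100."""
    over_gb = max(0, used_gb - cap_gb)
    # Assumption: a partially used 50 GB block is charged as a whole block.
    blocks = math.ceil(over_gb / block_gb)
    return min(blocks * per_block, max_charge)


print(overage_charge(1000))  # under the cap
print(overage_charge(1300))  # 100 GB over the cap
print(overage_charge(8700))  # roughly the whole site in one month
```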

I don't know, but I would guess the cap is there as a hangover of earlier fears about people torrenting like crazy, or a way to keep people from running high-volume web servers at home, or some combination of those.

My understanding is that data caps on home broadband are relatively uncommon outside of the USA.

My understanding is that data caps on home broadband are relatively uncommon outside of the USA.

Fairly hard to hit them, if there are any at all. 7.3TB of image data is a lot: several days non-stop if the connection is not throttled, or several weeks if spaced out in batches of, say, 100GB per run.
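A back-of-the-envelope version of that estimate in Python. The 100 Mbit/s link speed is purely an assumed example, and the calculation ignores protocol overhead and any server-side rate limits.

```python
def download_days(total_gb, mbit_per_s):
    """Days of non-stop downloading for total_gb gigabytes at a sustained link speed."""
    total_bits = total_gb * 1e9 * 8
    seconds = total_bits / (mbit_per_s * 1e6)
    return seconds / 86400


def batch_count(total_gb, batch_gb=100):
    """Number of runs needed if each run fetches at most batch_gb gigabytes."""
    return -(-total_gb // batch_gb)  # integer ceiling division


print(f"{download_days(7300, 100):.1f} days non-stop at 100 Mbit/s")
print(f"{batch_count(7300)} batches of 100 GB")
```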

Pre-filtering images using tag data and the account's blocklist could help too. At least we'd get a smaller archive and less need to filter things out later.

Basically: combine the tag and filename/md5 dumps, filter by tags, create links, download, and save into folders categorised by some common tags like artist, gender and some traits. I think I could make an app for the first part, but I'm not so sure about the second (the actual downloading).

kora_viridian said:
Comcast (Xfinity) has one, at least in some of their markets. That's where the 1.0 or 1.2 TB number I gave comes from. They have about 32 million broadband subscribers in the USA, as of fall 2023. Spectrum doesn't have data caps, and they have about 31 million subscribers. I think the next biggest single company is Altice (Suddenlink/Optimum), with about 4.5 million subscribers, and I don't think they have data caps.

With Comcast, if you go over the cap, it's $10 per 50 GB, up to a max extra charge of $100. Downloading all the media from e621 within one month would hit that $100 extra charge. They send you an email when you hit some fraction of the cap (don't remember if it's 50%, 75%, or what), and I've gotten that email maybe once in several years. I've never been charged extra for going past the cap.

I don't know, but I would guess the cap is there as a hangover of earlier fears about people torrenting like crazy, or a way to keep people from running high-volume web servers at home, or some combination of those.

My understanding is that data caps on home broadband are relatively uncommon outside of the USA.

Comcast has no limit if you pay a small fee. I pay that fee each month because the speeds in my area, on the uncapped plan, reach nearly 1 gigabit per second, and I need that plan to get that speed. They're a good ISP for this reason, at least as far as the areas they have worked on are concerned.

Edit: Also, just a note: one thing that's thankfully likely to be safe is art of any kind, because it falls under the 1st Amendment as expression. And the repubs thankfully get horny for defending amendments to make the American people happy with them.

list said:
Thank you! I had no idea that was a thing. Is that recent or has that been available the whole time and I just never noticed?

That has existed for years

"Pre-filtering images using tag data and account's set blocklist could help too. At least we'd get smaller archive size and less need to filter out later.

Basically combine tag and filename/md5 dumps..."

Nope, won't work. The tag dump is data about the tags themselves, not about which post has which tags. Need to find another way.

I don't have enough mental [insert here] to go through 5M+ files to categorise and/or filter them. Unless...

Donovan, does every image post have an assigned author tag? And is there a dump of author-only tags, and/or a mention of the tag type in that tag dump that is not just its group in numerical format (which means idk what)?

One more question, about the search query: currently it returns some number of pages (idk how many exactly). Do these pages contain everything (and I mean literally everything) that fits the search criteria, or is something cut off? Is there any way to expand the search query to, say... 40 terms? 50 or 100 as an edge case, perhaps?

I've got *an idea* as you can see.

donovan_dmc said:
That has existed for years

Hopefully you don't mind if I tag you like that? Idk how actual tagging system works here or if E621 forum even has one. Got questions for you :P

P.S. Yes, those dumps have indeed existed for at least 3 years already, I think. Stats have been collected for 6+ years (I wasn't always an account-having visitor, as you see).

SCTH

justkhajiit said:
Nope, won't work. The tag dump is data about the tags themselves, not about which post has which tags. Need to find another way.

The post database export, as has been mentioned, contains all the metadata for each post. That includes tags. That means, as long as you can program, you can search for literally any combination of tags.

The tag database export also includes each tag's category.
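A minimal Python sketch of that kind of tag search, run here against a tiny in-memory stand-in for the posts export. The column names (`id`, `md5`, `file_ext`, `tag_string`) and the space-separated tag format are assumptions about the export layout, so check them against the header of the real file; `posts_with_tags` is a made-up helper name.

```python
import csv
import io

# Tiny stand-in for the real posts export. Column names are assumptions.
SAMPLE_POSTS = """id,md5,file_ext,tag_string
1,735a017aa1d3cc44f2dc0028e1570ee6,jpg,canine fur solo
2,0123456789abcdef0123456789abcdef,png,feline fur duo
3,fedcba9876543210fedcba9876543210,png,canine scales solo
"""


def posts_with_tags(csv_file, required, excluded=()):
    """Yield rows that carry every required tag and none of the excluded ones."""
    required, excluded = set(required), set(excluded)
    for row in csv.DictReader(csv_file):
        tags = set(row["tag_string"].split())
        if required <= tags and not (tags & excluded):
            yield row


matches = list(posts_with_tags(io.StringIO(SAMPLE_POSTS), ["fur"], excluded=["duo"]))
print([row["id"] for row in matches])
```

Because `csv.DictReader` reads one row at a time, the same loop works on a file opened with `open(path, newline="", encoding="utf-8")` without loading the whole export into memory. If very long description fields trigger a field-size error, `csv.field_size_limit` can be raised.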

scth said:
The post database export, as has been mentioned, contains all the metadata for each post. That includes tags.

Then let's hope my PC can handle that file... at least it's not a dump of all subreddits currently in existence. That was the first and only time so far I've seen a CSV file *that* big.

It also makes the process much, much easier and hopefully automatable... a worthy challenge.

The tag database category is literally a number, without much explanation either. Edit: need to think about this one... it doesn't seem that useful if the post data dump has post tags.

kora_viridian said:
My understanding is that data caps on home broadband are relatively uncommon outside of the USA.

In Russia it's truly unlimited on wired providers. Not true for cellular networks. But you can't access that site without using... let's say, alternative ways. I downloaded a huge archive like that several years ago in case of a zombie apocalypse, but really you barely need even 1% of it, and your copy will be outdated after some time.

justkhajiit said:
Donovan, does every image post have an assigned author tag? And is there a dump of author-only tags, and/or a mention of the tag type in that tag dump that is not just its group in numerical format (which means idk what)?

I'm not Donovan, but I can answer that.
There are 22 thousand posts without an artist tag, unfortunately: arttags:0.

And here is the list of tag categories: https://github.com/e621ng/e621ng/blob/10b9389c23a14195a4d1dec64595382ff7ffb215/app/logical/tag_category.rb#L25-L34
To get the full list of artist tags, you would just need to filter the export to find tags with the category 1.
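A small Python sketch of that filter, using a made-up three-row sample in place of the real tag export. The column names (`id`, `name`, `category`, `post_count`) are assumptions about the export header, and the tag names are placeholders. The only category value relied on is 1 for artists, as stated above.

```python
import csv
import io

# Made-up sample rows; the real tag export is assumed to have a similar header.
SAMPLE_TAGS = """id,name,category,post_count
1,some_artist,1,12
2,fur,0,3456
3,another_artist,1,7
"""

ARTIST_CATEGORY = "1"  # csv yields strings, so compare against "1" rather than 1


def artist_tags(csv_file):
    """Return the names of all tags whose category is the artist category."""
    return [row["name"] for row in csv.DictReader(csv_file)
            if row["category"] == ARTIST_CATEGORY]


print(artist_tags(io.StringIO(SAMPLE_TAGS)))
```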

justkhajiit said:
One more question, about the search query: currently it returns some number of pages (idk how many exactly). Do these pages contain everything (and I mean literally everything) that fits the search criteria, or is something cut off? Is there any way to expand the search query to, say... 40 terms? 50 or 100 as an edge case, perhaps?

You can add limit=320 to the URL parameters on just about any index page to change the number of results shown per page.
For example: https://e621.net/tags?search[category]=1&limit=320

You can also change the number of posts per page in your settings, but that only affects the post search results.
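If you would rather build such URLs in a script than by hand, something like the following Python sketch should work. `index_url` is a made-up helper, and keeping the square brackets unescaped is only a readability choice to match the example link above.

```python
from urllib.parse import urlencode


def index_url(path, params):
    """Build an index-page URL with query parameters such as limit=320."""
    # safe="[]" keeps the brackets in search[category] readable instead of %5B/%5D.
    return f"https://e621.net/{path}?{urlencode(params, safe='[]')}"


print(index_url("tags", {"search[category]": 1, "limit": 320}))
```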

justkhajiit said:
It also makes the process much, much easier and hopefully automatable... a worthy challenge.

To answer the original question: no, this kind of thing does not exist, and is unlikely to exist in the future.
The demand for it is very marginal, while the effort and resource usage required do not seem to make it worthwhile.

And on a side note – I would not recommend downloading hundreds of gigabytes worth of images from the site daily.

cinder said:
There are 22 thousand posts without an artist tag, unfortunately: arttags:0.

To get the full list of artist tags, you would just need to filter the export to find tags with the category 1.

Thanks

cinder said:
You can add limit=320 to the URL parameters on just about any index page to change the number of results shown per page.

Thanks, but not quite that; what I meant was the actual query. IIRC search here is limited to 6 tags, which admittedly leads to some fairly specific search results when used correctly.

cinder said:
And on a side note – I would not recommend downloading hundreds of gigabytes worth of images from the site daily.

*best 10th Doctor impression* Oh, not daily, no... a hundred here, two more there, then pause for a bit. :)

I understand why you said that and resource usage concerns, etc, etc.

justkhajiit said:

Thanks, but not quite that; what I meant was the actual query. IIRC search here is limited to 6 tags, which admittedly leads to some fairly specific search results when used correctly.

The search query can have up to 40 terms; the 6-term limit is outdated information.

For example:
mammal anthro hi_res male genitals penis balls erection tail bodily_fluids fur digital_media_(artwork) nude sex penetration hair butt cum open_mouth penile male_penetrating cum_inside duo tongue
