Topic: How BIG is e621? (Personal archival project)

I know that over 7 years ago, in 2014, someone asked this exact question. The site then had almost 600,000 posts and was around 360+ GB in size. Since then, more file types have been implemented, such as WEBM and APNG/AJPG. Much larger file sizes are also now possible to post here. Now that the site has amassed more than 2 million posts, how much storage space do its contents take up? I ask because I would like to archive vast swaths of the site, mainly posts that are not on my blacklist. I'm a bit of an archivist myself, so I'd like to keep things that may be deleted after many years. Many artists tend to go down that path, and some of my favourite work is now gone. To archive anything, I need to know how much space it takes to hold the media of a site this size.

I imagine it's a few Terabytes of data. I think the Admins should have easier access to this information, but be prepared to need a rather huge hard drive or two.

earlopain said:
https://e621.net/stats has what you're looking for. That includes deleted posts as well I think.

Wow, the stats page is back from the dead. It wasn't working a month or two ago.

jojo935 said:
I know that over 7 years ago, in 2014, someone asked this exact question. The site then had almost 600,000 posts and was around 360+ GB in size. Since then, more file types have been implemented, such as WEBM and APNG/AJPG. Much larger file sizes are also now possible to post here. Now that the site has amassed more than 2 million posts, how much storage space do its contents take up? I ask because I would like to archive vast swaths of the site, mainly posts that are not on my blacklist. I'm a bit of an archivist myself, so I'd like to keep things that may be deleted after many years. Many artists tend to go down that path, and some of my favourite work is now gone. To archive anything, I need to know how much space it takes to hold the media of a site this size.

The file sizes on the stats page are rounded, but if we take the average file size to be exactly 1 MB, the total is easy to calculate: it is simply the post count in megabytes.

See topic #22373

About 132,000 posts have been added since Jan. 1. That's about 1375 posts per day, an increase from last year. Probably more than 1.375 GB per day if file sizes are larger than the early days of the site. The reason I made that feature request is to make these questions easier to answer.

We are at about 2.7 to 3 terabytes now. If we increase to an average of 1500 posts per day (ppd) at 1 MB each, we could reach 8 terabytes around December 18, 2030. Double the average file size to 2 MB going forward at an average of 2000 ppd, and we get to 10 terabytes around April 8, 2026. Realistically, an 8 TB drive will probably hold all of e621 until at least 2028, and you can get one for about $130. More if you do RAID.
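For anyone who wants to rerun those projections with their own numbers, here is a minimal sketch of the arithmetic. It assumes the lower 2.7 TB figure as the starting size, decimal units (1 TB = 1,000,000 MB), and a start date of April 8, 2021, which is only a guess at when the estimate was made. The function name `days_to_fill` is made up for illustration.

```python
from datetime import date, timedelta

MB_PER_TB = 1_000_000  # decimal units, the way drive capacities are advertised


def days_to_fill(current_tb, target_tb, posts_per_day, avg_mb_per_post):
    """Days until the archive grows from current_tb to target_tb."""
    remaining_mb = (target_tb - current_tb) * MB_PER_TB
    return remaining_mb / (posts_per_day * avg_mb_per_post)


# Assumed starting point (hypothetical date, not stated in the thread).
start = date(2021, 4, 8)

# Scenario 1: 1500 posts per day at 1 MB each, filling an 8 TB drive.
slow = days_to_fill(2.7, 8, 1500, 1.0)
# Scenario 2: 2000 posts per day at 2 MB each, reaching 10 TB.
fast = days_to_fill(2.7, 10, 2000, 2.0)

for label, days in (("8 TB", slow), ("10 TB", fast)):
    print(label, round(days), "days ->", start + timedelta(days=round(days)))
```

With these assumptions the results land within a week or so of the dates quoted above. Starting from 3 TB instead of 2.7 TB pulls both dates earlier.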

Storing a local copy of e621 is not that expensive unless you want to use SSDs.

I tried doing something like that way back in 2013-14 or so and was told/chastised by the mods that you should absolutely not use any kind of automated crawler to grab image files from the site for any reason whatsoever.

So are you planning on downloading all the files individually by hand? Or have things changed since then?

Lol I still have a chunk of what I had grabbed back then before being told to cut it out sitting on an old hdd around here somewhere...

zypher0s said:
I tried doing something like that way back in 2013-14 or so and was told/chastised by the mods that you should absolutely not use any kind of automated crawler to grab image files from the site for any reason whatsoever.

So are you planning on downloading all the files individually by hand? Or have things changed since then?

Lol I still have a chunk of what I had grabbed back then before being told to cut it out sitting on an old hdd around here somewhere...

I think it might be allowed. Ask KiraNoot or NotMeNotYou.

lance_armstrong said:
I think it might be allowed. Ask KiraNoot or NotMeNotYou.

Oh I'm not planning on trying it again. Just wanted to make sure everything was kosher.
Back then I think e6 was under some serious DDoS fire, and everyone had their hackles up.

Honestly my internet was slow and the crawler I was using only pulled/searched 2 images at a time so I'm pretty sure it wasn't taxing the site like someone with some fiber grabbing 50 images at once would. But when I asked a question about it in the forum they told me not to do that.
So I stopped.
Def not set up for archiving atm, and I've got other stuff to do nowadays.

If it is allowed though then cool! I'm all for archiving, and am sure others are on the case.

I'm still archiving the media content. I have a 5 second pause between downloads and run the task about once a day, to give the posts a chance to have their tags updated and any illegal posts deleted. My archive doesn't recheck posts, so I don't have any tag changes made after the first day. When I wrote the script, waiting around 8 seconds would often cause the connection to be reset, so I think 4-5 seconds is a good middle ground. When grabbing everything, you can just go by ID and not bother the server with running searches.
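A rough sketch of that loop, for illustration only: walk a range of post IDs, pause a fixed 5 seconds between requests, and skip anything missing or deleted. The `fetch` and `save` callables are placeholders (not any real e621 API), so the actual HTTP request and file writing are left to you. Check the site's current rules on automated access before pointing anything like this at it.

```python
import time


def archive_by_id(ids, fetch, save, pause_seconds=5.0, sleep=time.sleep):
    """Fetch posts one at a time by ID with a fixed pause between requests.

    fetch(post_id) is a hypothetical callable that returns the file bytes,
    or None if the post is missing or deleted.
    save(post_id, data) is a hypothetical callable that stores the bytes.
    Returns the list of IDs that were actually saved.
    """
    saved = []
    for index, post_id in enumerate(ids):
        if index > 0:
            # Pause before every request except the first one.
            sleep(pause_seconds)
        data = fetch(post_id)
        if data is None:
            continue  # nothing to store for this ID
        save(post_id, data)
        saved.append(post_id)
    return saved


if __name__ == "__main__":
    # Dry run with stand-in callables: no network access, no real pause.
    store = {}
    done = archive_by_id(
        range(1, 5),
        fetch=lambda i: None if i == 3 else b"fake bytes",
        save=store.__setitem__,
        sleep=lambda seconds: None,
    )
    print(done)
```

Going by ID keeps the run to one request per post and no search queries. A daily run would pass in only the IDs newer than the last one archived.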

Someone released an e621 torrent a couple of years ago. If you're going to start archiving, download that first and only scrape the newest content. There are a few other e621 rips out there too. I think the most recent one was a metadata dump.

Originally I was going to make an AI tagger, but I've moved on to more important things so haven't gotten around to it yet.

jojo935 said:
I know that over 7 years ago, in 2014, someone asked this exact question. The site then had almost 600,000 posts and was around 360+ GB in size. Since then, more file types have been implemented, such as WEBM and APNG/AJPG. Much larger file sizes are also now possible to post here. Now that the site has amassed more than 2 million posts, how much storage space do its contents take up? I ask because I would like to archive vast swaths of the site, mainly posts that are not on my blacklist. I'm a bit of an archivist myself, so I'd like to keep things that may be deleted after many years. Many artists tend to go down that path, and some of my favourite work is now gone. To archive anything, I need to know how much space it takes to hold the media of a site this size.

About 60TB as of 03/24/2021

dripen_arn said:
1. why are you responding to a nearly 1-1/2 year old thread?
2. at least you're not spreading false information

LOL 60TB sounds more like Tenboro's site. Replying here is better than creating a brand new thread, I guess. I'm morbidly curious how much the bandwidth bill would be to seed/sync a set of torrents for each year.
