Topic: How big is e621?

Posted under General

I was just wondering how big e621 is in total.
Like, how many PB of storage does it have? How many files in general?
Is there a way to look it up, or does anyone know?

7

point 69

Terabytes

that's unbelievably small

and that's including unapproved posts (but excluding "destroyed" posts, which I'm assuming is mostly stuff that would actually be dangerous to keep around)

dripen_arn said:
7

point 69

Terabytes

that's unbelievably small

and that's including unapproved posts (but excluding "destroyed" posts, which I'm assuming is mostly stuff that would actually be dangerous to keep around)

manitka said:
I'm honestly surprised that it's not even 10 TB

but with your help, we can change that. find an artist who does nothing but 3D animations and bulk post all of their stuff.

strikerman said:
but with your help, we can change that. find an artist who does nothing but 3D animations and bulk post all of their stuff.

Lol, true. I will contribute just my meager art hoard for now :P

dripen_arn said:
(but excluding "destroyed" posts, which i'm assuming is mostly stuff that would be actually dangerous to keep around)

Destroyed posts are erased in their entirety; the only thing kept around is their MD5, for reporting to the feds. So in a way I guess they are "excluded": they flat out don't exist to be counted.

dripen_arn said:
that's unbelievably small

manitka said:
I'm honestly surprised that it's not even 10 TB

A huge portion of posts are downscaled 1280px JPGs from Fur Affinity (the increased file size limit is recent in the grand scheme of how long e621 has been around) or highly compressed Twitter images, so it's not too big a surprise. The average 1280px FA image seems to be ~100 KB, so 3 million of those (300 GB) could fit on a single USB drive you can get nowadays for under $50. That's kinda crazy.
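A quick check of that arithmetic, using the post's own rough figures (an assumed ~100 KB average per downscaled FA JPG, 3 million such posts, and decimal units throughout):

```python
# Rough storage estimate from the assumed figures above (decimal units).
AVG_FA_JPG_BYTES = 100_000   # assumed ~100 KB average per 1280px FA JPG
POST_COUNT = 3_000_000       # assumed number of such posts

total_bytes = AVG_FA_JPG_BYTES * POST_COUNT
total_gb = total_bytes / 1_000_000_000  # 1 GB = 10**9 bytes

print(f"{total_gb:.0f} GB")  # 300 GB
```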

According to this random web archive capture I clicked on, there were only 4 TB of posts, so in the past 2 years we've nearly doubled what was uploaded in the previous 15 years. The number of posts only rose by 46%, so the average file size has increased by quite a lot.
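To put a number on "quite a lot", here is a small sketch that works only with ratios. It assumes the 4 TB capture is the baseline, that 7.69 TB (the figure at the top of the thread) is the current total, and that the 46% growth in post count is measured between the same two points:

```python
# Average file size before vs. after, using only the ratios quoted in the thread.
OLD_TOTAL_TB = 4.0    # web archive capture, roughly 2 years earlier (assumed baseline)
NEW_TOTAL_TB = 7.69   # current total quoted at the top of the thread
POST_GROWTH = 1.46    # post count rose by about 46% (assumed same interval)

# Normalise the old post count to 1 so the absolute number of posts cancels out.
old_posts = 1.0
new_posts = old_posts * POST_GROWTH

old_avg = OLD_TOTAL_TB / old_posts
overall_new_avg = NEW_TOTAL_TB / new_posts
# Average size of only the posts added since the capture.
added_avg = (NEW_TOTAL_TB - OLD_TOTAL_TB) / (new_posts - old_posts)

print(f"overall average grew {overall_new_avg / old_avg:.2f}x")          # 1.32x
print(f"new uploads average {added_avg / old_avg:.2f}x the old average")  # 2.01x
```

The overall average only grows by about a third, but that hides the real change: under these assumptions, posts added since the capture are roughly twice as large, on average, as the ones before it.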

I'm curious what the total file size is excluding webm, but I'm too lazy to download and parse the db_export right now.

faucet said:
A huge portion of posts are downscaled 1280px JPGs from Fur Affinity (the increased file size limit is recent in the grand scheme of how long e621 has been around) or highly compressed Twitter images, so it's not too big a surprise. The average 1280px FA image seems to be ~100 KB, so 3 million of those (300 GB) could fit on a single USB drive you can get nowadays for under $50. That's kinda crazy.

According to this random web archive capture I clicked on, there were only 4 TB of posts, so in the past 2 years we've nearly doubled what was uploaded in the previous 15 years. The number of posts only rose by 46%, so the average file size has increased by quite a lot.

I'm curious what the total file size is excluding webm, but I'm too lazy to download and parse the db_export right now.

true. I tried to upload a PNG that was like 160 MB here once and it didn't work. idk what the file size limit on here actually is anyway lol

manitka said:
true. I tried to upload a PNG that was like 160 MB here once and it didn't work. idk what the file size limit on here actually is anyway lol

100MB for everything except gifs, which are 20MB. 15000x15000 max dimensions.

tarrgon said:
100MB for everything except gifs, which are 20MB. 15000x15000 max dimensions.

e621:supported_filetypes
https://github.com/e621ng/e621ng/blob/e3fdc5d61be6ee6a6c2d5e3d13d60265bff3093e/config/danbooru_default_config.rb#L348-L363

APNG is 20MiB max

It's a small distinction, but the sizes are MiB (powers of 2), not MB (powers of 10)
JPG is 100MiB (104,857,600 bytes)
PNG is 100MiB (104,857,600 bytes)
APNG is 20MiB (20,971,520 bytes)
GIF is 20MiB (20,971,520 bytes)
WEBM is 100MiB (104,857,600 bytes)
SWF was 100MiB (104,857,600 bytes)

(despite the method being named .megabytes, it returns mebibytes; see rails/rails#33130)
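For illustration, the limits listed above can be mirrored in a few lines of Python. This is not e621ng's code (the linked config is Ruby); `limit_bytes` and `fits` are hypothetical helpers, and treating the limit as inclusive is an assumption:

```python
# Upload limits from the list above, in MiB. ActiveSupport's Numeric#megabytes
# multiplies by 1024**2, so "100.megabytes" is really 100 MiB.
MIB = 1024 ** 2

UPLOAD_LIMITS_MIB = {
    "jpg": 100,
    "png": 100,
    "apng": 20,
    "gif": 20,
    "webm": 100,
    "swf": 100,  # listed in the past tense in the post above
}

def limit_bytes(kind: str) -> int:
    """Maximum upload size in bytes for a file type (hypothetical helper)."""
    return UPLOAD_LIMITS_MIB[kind.lower()] * MIB

def fits(kind: str, size_bytes: int) -> bool:
    """True if size_bytes is within the limit (assumes the limit is inclusive)."""
    return size_bytes <= limit_bytes(kind)

for kind in UPLOAD_LIMITS_MIB:
    print(f"{kind.upper():<5} {limit_bytes(kind):>12,} bytes")

# A "160 MB" PNG (decimal megabytes) is over the 100 MiB cap either way.
print(fits("png", 160 * 1000**2))  # False
```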

faucet said:
I'm curious what the total file size is excluding webm, but I'm too lazy to download and parse the db_export right now.

I did a quick and dirty script and ran it on the posts export from early January 2024, because I had it handy. At that time, posts of type webm totaled 1,067,364,382,113 bytes, or about 1,017,917 MiB, or about 994 GiB. There were 72,593 posts of type webm out of 4,505,469 total posts; webm posts were about 1.6% of all posts.
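The script itself wasn't posted, so the following is only a minimal sketch of the same idea. It assumes the posts dump from db_export is a gzipped CSV whose header row includes `file_ext` and `file_size` columns; `size_by_ext`, `summarize_export`, and `report` are hypothetical helpers, and the tiny in-memory sample stands in for the real file:

```python
import csv
import gzip
import io
from collections import defaultdict

# Tag/description columns can exceed csv's default 128 KiB field limit.
csv.field_size_limit(2**31 - 1)

def size_by_ext(rows):
    """Return (total bytes, post count) per file extension.

    Assumes each row is a dict with "file_ext" and "file_size" keys.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        ext = row["file_ext"]
        totals[ext] += int(row["file_size"])
        counts[ext] += 1
    return totals, counts

def summarize_export(path):
    """Stream a gzipped posts CSV export (assumed layout) and summarize it."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
        return size_by_ext(csv.DictReader(f))

def report(totals, counts, exclude="webm"):
    """Print totals in bytes and GiB (2**30 bytes); return bytes excluding one type."""
    all_bytes = sum(totals.values())
    all_posts = sum(counts.values())
    kept = all_bytes - totals.get(exclude, 0)
    share = counts.get(exclude, 0) / all_posts if all_posts else 0.0
    print(f"total: {all_bytes:,} bytes ({all_bytes / 2**30:,.1f} GiB), {all_posts:,} posts")
    print(f"excluding {exclude}: {kept:,} bytes ({kept / 2**30:,.1f} GiB)")
    print(f"{exclude} share of posts: {share:.1%}")
    return kept

# Tiny stand-in for the real export; use summarize_export(path) on the actual dump.
sample = io.StringIO("id,file_ext,file_size\n1,png,1000\n2,webm,5000\n3,jpg,500\n")
totals, counts = size_by_ext(csv.DictReader(sample))
non_webm = report(totals, counts)  # 1,500 bytes excluding webm
```

Streaming rows through `csv.DictReader` keeps memory use flat, which matters for a dump with millions of rows.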

manitka said:
I'm honestly surprised that it's not even 10 TB

7.7 TB isn't even that huge. I was expecting much more. That would mean I could actually hoard all of e621 on my home NAS. Crazy.

mythicalbanana said:
7.7 TB isn't even that huge. I was expecting much more. That would mean I could actually hoard all of e621 on my home NAS. Crazy.

good, mythicalbanana, good for you...

dew it

dripen_arn said:
good, mythicalbanana, good for you...

dew it

I mean, there is this program that can scrape by tags, artists, or pools... xD

With only roughly 4 million posts, it's not as far-fetched as Fur Affinity's 50 million posts or DeviantArt's 1 billion posts.
And hell, there are 7 quintillion Tumblr posts and almost 2 sextillion Twitter posts. Just put that in perspective: e6 is still extremely tiny.


snake-girl said:
With only roughly 4 million posts, it's not as far-fetched as Fur Affinity's 50 million posts or DeviantArt's 1 billion posts.
And hell, there are 7 quintillion Tumblr posts and almost 2 sextillion Twitter posts. Just put that in perspective: e6 is still extremely tiny.

I mean, only a small fraction of art actually gets uploaded to e621. For example, I uploaded stuff earlier from an artist who had a lot of art that could be on this site, but only 3 of their artworks were on e621. It also doesn't help that sometimes stuff that should be approved isn't, and things fall through the cracks and never get approved. But so far I'm glad e621 hasn't been bloated to hell by shit AI art like R34 and Paheal, because that would probably make the mods lose their minds having to deal with that much shit lol

snake-girl said:
With only roughly 4 million posts, it's not as far-fetched as Fur Affinity's 50 million posts or DeviantArt's 1 billion posts.
And hell, there are 7 quintillion Tumblr posts and almost 2 sextillion Twitter posts. Just put that in perspective: e6 is still extremely tiny.

You're taking snowflake (timestamp) based IDs and assuming they're sequential
They aren't
Twitter has nowhere close to 2 sextillion posts; according to this, Twitter gets about 200 billion tweets per year
If Twitter had gotten 200 billion tweets every year since its inception (2006, which it obviously hasn't), that'd be ~3.6 trillion posts
For reference:
2,000,000,000,000,000,000,000
3,600,000,000,000
That first number is 555,555,555x greater

Same goes for Tumblr; according to this, Tumblr had 171.5 billion posts in total as of 2019
For reference:
7,000,000,000,000,000,000
171,500,000,000
That first number is roughly 40,800,000x greater
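Both ratios can be checked with exact integer arithmetic. The 3.6 trillion figure is the post's own generous upper bound (200 billion tweets a year over roughly 18 years):

```python
# Implied "post counts" read off the IDs vs. the estimates quoted above.
twitter_id_scale = 2 * 10**21          # "almost 2 sextillion"
twitter_estimate = 3_600_000_000_000   # upper bound: 200 billion/year for ~18 years
tumblr_id_scale = 7 * 10**18           # "7 quintillion"
tumblr_2019_total = 171_500_000_000    # 171.5 billion posts as of 2019

print(f"{twitter_id_scale // twitter_estimate:,}")  # 555,555,555
print(f"{tumblr_id_scale // tumblr_2019_total:,}")  # 40,816,326
```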

DeviantArt feels high, but it's also too low to be a timestamp-based ID, so that's probably legit
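To see why a snowflake ID says nothing about how many posts exist, here is a sketch of Twitter's published Snowflake layout: a 41-bit millisecond timestamp relative to a custom epoch (1288834974657 ms, in November 2010), a 10-bit worker ID, and a 12-bit sequence number. Tumblr's ID scheme isn't modeled here; `make_snowflake` and `decode_snowflake` are illustrative helpers, not an official API:

```python
from datetime import datetime, timezone

# Twitter Snowflake: [41-bit ms timestamp since custom epoch][10-bit worker][12-bit sequence]
TWITTER_EPOCH_MS = 1288834974657

def make_snowflake(ts_ms: int, worker_id: int = 0, sequence: int = 0) -> int:
    """Pack a Unix timestamp (ms), worker ID, and sequence number into one ID."""
    return ((ts_ms - TWITTER_EPOCH_MS) << 22) | (worker_id << 12) | sequence

def decode_snowflake(snowflake: int):
    """Unpack an ID into (UTC datetime, worker ID, sequence number)."""
    ts_ms = (snowflake >> 22) + TWITTER_EPOCH_MS
    worker_id = (snowflake >> 12) & 0x3FF
    sequence = snowflake & 0xFFF
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc), worker_id, sequence

start = 1_704_067_200_000  # 2024-01-01 00:00:00 UTC, in ms
first = make_snowflake(start)
one_second_later = make_snowflake(start + 1000)

print(f"{first:.3e}")             # roughly 1.742e+18, but that's a clock reading, not a count
print(one_second_later - first)   # 4194304000: one second of wall time adds ~4.2 billion
print(decode_snowflake(make_snowflake(start, worker_id=5, sequence=7)))
```

The ID jumps by about 4.2 billion every second whether one tweet or thousands were posted in that second, so dividing a recent ID by anything gives a timestamp, not a total.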

donovan_dmc said:
You're taking snowflake (timestamp) based IDs and assuming they're sequential
They aren't
Twitter has nowhere close to 2 sextillion posts; according to this, Twitter gets about 200 billion tweets per year
If Twitter had gotten 200 billion tweets every year since its inception (2006, which it obviously hasn't), that'd be ~3.6 trillion posts
For reference:
2,000,000,000,000,000,000,000
3,600,000,000,000
That first number is 555,555,555x greater

Same goes for Tumblr; according to this, Tumblr had 171.5 billion posts in total as of 2019
For reference:
7,000,000,000,000,000,000
171,500,000,000
That first number is roughly 40,800,000x greater

DeviantArt feels high, but it's also too low to be a timestamp-based ID, so that's probably legit

Tumblr is a great example of why archiving is so important. Imagine how many amazing artworks got lost to time after Tumblr committed Tumblr and banned NSFW.

snake-girl said:
With only roughly 4 million posts, it's not as far-fetched as Fur Affinity's 50 million posts or DeviantArt's 1 billion posts.

Even sites like FA have a lot of deleted posts. Artists post WIPs and delete them later, and so on.

The best "independent" estimate I know of for things like this is at Fluffle's status page, https://fluffle.xyz/status/ . At the moment, it thinks FA has about 32.5 million posts, while FA's latest submission number is about 56.5 million. In other words, FA only has about 57.5% of the posts you think it would, if you just looked at the submission number. Similarly, Fluffle has Weasyl at about 1.94 million posts, while the latest submission number is about 2.38 million, which is 81.5%.
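The percentages above are just indexed posts divided by the latest submission number. This assumes, unlike the snowflake IDs discussed earlier, that FA and Weasyl hand out submission numbers sequentially, so the latest number approximates how many submissions were ever made; `surviving_fraction` is a hypothetical helper:

```python
# Share of submissions still present: indexed posts / latest sequential submission ID.
def surviving_fraction(indexed_posts: float, latest_submission_id: float) -> float:
    return indexed_posts / latest_submission_id

print(f"FA:     {surviving_fraction(32.5e6, 56.5e6):.1%}")  # 57.5%
print(f"Weasyl: {surviving_fraction(1.94e6, 2.38e6):.1%}")  # 81.5%
```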

Fluffle also has an estimate of 25.5 million posts on Twitter, but Fluffle's developer said it works (worked?) by following a whitelist of users, not by searching all of Twitter for furry art. The developer also later said it had stopped indexing Twitter, due to the well-known difficulties there. I don't know what the current status is.

kora_viridian said:
Even sites like FA have a lot of deleted posts. Artists post WIPs and delete them later, and so on.

The best "independent" estimate I know of for things like this is at Fluffle's status page, https://fluffle.xyz/status/ . At the moment, it thinks FA has about 32.5 million posts, while FA's latest submission number is about 56.5 million. In other words, FA only has about 57.5% of the posts you think it would, if you just looked at the submission number. Similarly, Fluffle has Weasyl at about 1.94 million posts, while the latest submission number is about 2.38 million, which is 81.5%.

I imagine their bot isn't re-scraping to check for files deleted after being scraped, so the figure is likely even lower than that. It's kinda crazy that almost 50% of all FA posts are actually deleted. It honestly wouldn't surprise me though, with all the WIPs, streaming notifications, auction reminders, et cetera.

I remember that quite a few years back, FA didn't have any rate limiting on submissions. They probably thought people's bandwidth would be limiting enough, considering the size of an average image, but neglected to consider that a valid PNG file can be as small as 67 bytes. I think that only accounts for some tens of thousands of missing posts, but it's still interesting.
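For the curious, here is a sketch of how a PNG gets that small: a 1x1, 1-bit grayscale image is just the 8-byte signature plus three chunks (IHDR, IDAT, IEND), each carrying 12 bytes of length/type/CRC overhead. With the standard zlib library the 2-byte scanline compresses to 10 bytes, which gives 67 bytes total; `chunk` and `tiny_png` are illustrative helpers:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def chunk(kind: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: 4-byte length, 4-byte type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def tiny_png() -> bytes:
    """A 1x1, 1-bit grayscale PNG containing a single black pixel."""
    # width, height, bit depth, color type (0 = grayscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 1, 0, 0, 0, 0)
    # One scanline: a filter-type byte (0 = None) plus one byte holding the 1-bit pixel.
    idat = zlib.compress(b"\x00\x00")
    return PNG_SIGNATURE + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")

png = tiny_png()
# 8 (signature) + 25 (IHDR) + 12 + len(IDAT data) + 12 (IEND)
print(len(png))  # 67 with standard zlib
```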
