Topic: How big would an offline tag and post database be?

Posted under e621 Tools and Applications

I'd like to relearn SQL via SQLite, and figured that if it's only a handful of kilobytes per image on average, this would be a good dataset to play with. Plus, I might be able to make something useful.

If the data itself could fit comfortably on a single-layer DVD, would it be practical for someone to host a reasonably up-to-date version (say, no more than a month old) on Dropbox or something?


Well, using the mobile app (you have to get the E926 version from the Play Store and download the E621 version from that), you can download stuff to the phone. If you use that alongside Bluestacks (a mobile emulator for the PC), you could check how much space it takes up and work out from that how large a storage device you would need.


Back of envelope calculation.

Total number of posts 8e5 (https://e621.net/stats)
Total number of distinct tags 3e5 apparently (omg wtf)

Avg tags per post: let's take 30 (load https://e621.net/post/random several times, count the tags and average them, for lack of better ideas).

Avg tag length: something like 10? Again, take the average over several random posts.

Assuming 4-byte post and tag ids, this yields 8e5*30*8 ~ 180MB of postid-tagid linkage data and 3e5*(10+4+2) ~ 5MB of tags. If you want to look up tags by a file's md5, a postid-md5 table would add 8e5*(4+16) ~ 15MB more.
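For anyone who wants to plug in their own guesses, the same arithmetic can be sketched in a few lines of Python. All the inputs are the rough figures from this post, not measured values, and the reading of the "+2" per tag as a couple of bytes of bookkeeping is an assumption. It prints both decimal MB and binary MiB, since the rounded figures above are closer to the MiB values.

```python
# Rough inputs from the estimate above; every one of them is a guess.
NUM_POSTS = 800_000      # 8e5 posts
NUM_TAGS = 300_000       # 3e5 distinct tags
TAGS_PER_POST = 30       # average tags per post
AVG_TAG_LEN = 10         # average tag name length in bytes
ID_BYTES = 4             # 4-byte post and tag ids
MD5_BYTES = 16           # raw md5 digest
TAG_EXTRA_BYTES = 2      # the "+2" per tag (assumed: small per-tag bookkeeping)


def estimate_bytes():
    """Return the estimated raw data size of each table, in bytes."""
    linkage = NUM_POSTS * TAGS_PER_POST * (ID_BYTES + ID_BYTES)
    tags = NUM_TAGS * (AVG_TAG_LEN + ID_BYTES + TAG_EXTRA_BYTES)
    md5 = NUM_POSTS * (ID_BYTES + MD5_BYTES)
    return {"linkage": linkage, "tags": tags, "md5": md5}


sizes = estimate_bytes()
for name, size in sizes.items():
    print(f"{name}: {size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
total = sum(sizes.values())
print(f"total: {total / 1e6:.1f} MB ({total / 2**20:.1f} MiB)")
```

Either way the raw data comes out well under the 4.7GB of a single-layer DVD.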

There will be some db overhead on top of that, like indexes and such, but that should take less space than the data.
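Since the original question was about SQLite, here is a minimal sketch of how those three tables and one extra index might look, using Python's built-in sqlite3 module. The table and column names (posts, tags, post_tags, post_tags_by_tag) are made up for illustration and are not e621's actual schema. Note that an index covering both columns of the linkage table can end up about as large as the linkage data itself, so the overhead depends a lot on which lookups you want to be fast.

```python
import sqlite3


def create_schema(conn):
    """Create hypothetical tables matching the size estimate (not e621's real schema)."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS posts (
            id  INTEGER PRIMARY KEY,
            md5 BLOB NOT NULL UNIQUE          -- raw 16-byte digest
        );
        CREATE TABLE IF NOT EXISTS tags (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- postid-tagid linkage; the primary key serves "tags of a post" lookups
        CREATE TABLE IF NOT EXISTS post_tags (
            post_id INTEGER NOT NULL REFERENCES posts(id),
            tag_id  INTEGER NOT NULL REFERENCES tags(id),
            PRIMARY KEY (post_id, tag_id)
        ) WITHOUT ROWID;
        -- reverse index for "posts with a tag" lookups
        CREATE INDEX IF NOT EXISTS post_tags_by_tag ON post_tags (tag_id, post_id);
    """)


def posts_with_tag(conn, tag_name):
    """Return the ids of all posts carrying the given tag name, in ascending order."""
    rows = conn.execute(
        "SELECT pt.post_id FROM post_tags AS pt "
        "JOIN tags AS t ON t.id = pt.tag_id "
        "WHERE t.name = ? ORDER BY pt.post_id",
        (tag_name,),
    )
    return [row[0] for row in rows]


# Tiny in-memory demo with placeholder data.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO posts (id, md5) VALUES (1, ?)", (bytes(16),))
conn.execute("INSERT INTO tags (id, name) VALUES (1, 'example_tag')")
conn.execute("INSERT INTO post_tags (post_id, tag_id) VALUES (1, 1)")
print(posts_with_tag(conn, "example_tag"))
```

Dropping the reverse index keeps the file smaller but makes searching by tag a full scan of the linkage table.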

The mobile app would be a really slow and painful way to obtain the data; e6 has a proper API for that. Going through the API will be less compact than the on-disk storage, but I would still expect less than 1GB in requests and replies.


hslugs said:
The mobile app would be a really slow and painful way to obtain the data; e6 has a proper API for that. Going through the API will be less compact than the on-disk storage, but I would still expect less than 1GB in requests and replies.

I would rather download raw MySQL tables (or whatever RDBMS they use) than waste server CPU power on individual API queries. Surely there is a decent enough backup API for individual, complete tables?
