Topic: Is it possible to download content with metadata as if to re-construct the database offline?

Posted under General

Hello, I do not like the complete chaos of one folder full of unsorted media.

I was wondering if there was any way to have an "offline collection" of E621 content. Meaning, the content keeps its tags and I can search through my content as though I was using the website normally. If such a thing did exist, then you could download, say in bulk, an entire artist's catalogue off of E621 for later offline use.

I'm not sure if this is a "feature request" as this could be a community thing. It could also be an "e621 tool" so idk. Feel free to link to another thread if such an idea was already covered, and/or move my thread to the correct subforum. Thank you.

There's also the daily database exports for posts ordered by id (the download URL is based on the md5 hash), pools, tags, implications, aliases, and wiki pages. I think the only things that aren't included are descriptions and sets.
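For example, a file's download URL can be rebuilt from the md5 and file extension of a post in the export. The sketch below assumes the static1.e621.net/data/<first two hex chars>/<next two hex chars>/<md5>.<ext> layout that post file URLs use; treat that host and path scheme as an assumption rather than something the exports themselves document.

```python
def static_url(md5: str, ext: str) -> str:
    """Build a post's file URL from its md5 and file extension.

    Assumption: files live at static1.e621.net/data/<aa>/<bb>/<md5>.<ext>,
    where <aa> and <bb> are the first and second pairs of hex digits.
    """
    md5 = md5.lower()
    # Reject anything that isn't a 32-character hex digest.
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not an md5 hex digest: {md5!r}")
    return f"https://static1.e621.net/data/{md5[0:2]}/{md5[2:4]}/{md5}.{ext}"


if __name__ == "__main__":
    print(static_url("d41d8cd98f00b204e9800998ecf8427e", "png"))
```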

I haven't used it yet, but gallery-dl is a command line tool that has an option to write all of an image's tags to a text file.

Hydrus Network is probably much more convenient than either of those, but this info is useful if you'd like to write your own programs or scripts.
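As a starting point for such a script, a minimal offline tag search over the posts export might look like this. It assumes the export is a CSV with an id column and a tag_string column holding space-separated tags; check the header row of the file you actually download. If the download is a .csv.gz file, open it with gzip.open(path, "rt", encoding="utf-8", newline="") and pass the result to load_posts.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Descriptions can be long; raise the csv module's default field size limit.
csv.field_size_limit(2**31 - 1)


def load_posts(f: TextIO) -> Iterator[dict]:
    """Yield one dict per export row, adding a 'tags' set built from tag_string.

    Assumption: the export has 'id' and 'tag_string' columns.
    """
    for row in csv.DictReader(f):
        row["tags"] = set(row["tag_string"].split())
        yield row


def search(posts: Iterable[dict], query: str) -> list[int]:
    """Return ids of posts matching a query such as 'wolf -solo'.

    Plain terms must all be present; '-'-prefixed terms must all be absent.
    """
    terms = query.split()
    required = {t for t in terms if not t.startswith("-")}
    excluded = {t[1:] for t in terms if t.startswith("-")}
    return [
        int(p["id"])
        for p in posts
        if required <= p["tags"] and not (excluded & p["tags"])
    ]


if __name__ == "__main__":
    sample = io.StringIO(
        "id,md5,file_ext,tag_string\n"
        "1,aaa,png,wolf solo\n"
        "2,bbb,jpg,wolf fox duo\n"
        "3,ccc,png,fox solo\n"
    )
    print(search(load_posts(sample), "wolf -solo"))  # [2]
```

This scans every row per query, which is fine for an artist-sized subset; for the full export, loading the rows into something indexed (SQLite, for instance) would make repeated searches far faster.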

hsauq said:
There's also the daily database exports for posts ordered by id (the download URL is based on the md5 hash), pools, tags, implications, aliases, and wiki pages. I think the only things that aren't included are descriptions and sets.

Descriptions are included in the posts file. Sets aren't publicly available, because they can be made private.

One notable missing export is replacements history, which I really wish there was a better way to get one's hands on.

wat8548 said:
Descriptions are included in the posts file. Sets aren't publicly available, because they can be made private.

I somehow forgot post and pool descriptions were included.

Considering how often various parts of e621 are updated, how the current exports run the risk of including illicit sources for DNP content, and how it's currently possible to retrieve post data for a set via the API, the privacy flag seems like a strange reason not to include them.

It's unfortunate. I'll admit, I don't view them very often, but sets can represent ideas tags won't (ever) be useful for.

wat8548 said:
One notable missing export is replacements history, which I really wish there was a better way to get one's hands on.

There should probably be an "is_replaced"/"is_replacement", "replacement_id"/"replaced_id", or "flag_reason" (potential values: "inferior", "takedown", "dnp", "human_only") field in the posts export. The latter could be combined with "is_deleted" on a parent id. An export of flag reasons would also be helpful.
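To make the suggestion concrete, a hypothetical per-post record carrying the proposed fields could look like this. None of these fields exist in the current exports; the names simply mirror the ones suggested above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FlagReason(Enum):
    # Hypothetical: values proposed in this thread, not an existing e621 field.
    INFERIOR = "inferior"
    TAKEDOWN = "takedown"
    DNP = "dnp"
    HUMAN_ONLY = "human_only"


@dataclass
class PostReplacementInfo:
    # Hypothetical export record; field names follow the proposal above.
    post_id: int
    is_deleted: bool
    replaced_id: Optional[int] = None     # post this one replaced, if any
    replacement_id: Optional[int] = None  # post that replaced this one, if any
    flag_reason: Optional[FlagReason] = None

    @property
    def is_replaced(self) -> bool:
        return self.replacement_id is not None


def parse_flag_reason(raw: str) -> Optional[FlagReason]:
    """Map a raw export value to a FlagReason; an empty string means no flag."""
    return FlagReason(raw) if raw else None
```

A deleted parent flagged as inferior would then be PostReplacementInfo(post_id=1, is_deleted=True, replacement_id=2, flag_reason=FlagReason.INFERIOR).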

Strangely, there's a "Replacements" page under a post's "History" section, but this doesn't ever appear to be filled out.

I just realized there's no export of notes for a post, though I'm not sure how useful that'd be without coordinates or the original text for translations.


hsauq said:
I didn't know the content of a post ever got replaced directly. Usually the content is removed and a child post is linked to, which is what I checked.

Yeah, replacements are a different system. They've been "in beta" for forever and are showing no signs of being rolled out more widely. Currently only a select few users can request them; the rest have to go through the old reupload/flag routine.

The thing that makes replacements annoying is that the MD5s of replaced files are still maintained in the database for duplicate tracking, but do not show up anywhere in the posts export (which only reflects the most recent version of a file).
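In practice this means that matching local files against the export by hash will miss older versions of replaced posts. A small sketch, assuming you have already collected the export's md5 column into a set of lowercase hex strings (for example with csv.DictReader):

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex md5 of a file, read in chunks so large videos don't fill memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unmatched_files(folder: Path, export_md5s: set[str]) -> list[Path]:
    """Return files under folder whose md5 is not in the export's md5 set.

    A miss does not prove the file was never on e621: it may be an older
    version of a post whose file was later replaced, which the posts export
    does not list.
    """
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and file_md5(p) not in export_md5s
    )
```

Files it returns are candidates for manual checking, not proof that a post was deleted or replaced.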
