Topic: How to get the md5 of a deleted post?

Posted under General

The API doesn't return the MD5 hash of deleted posts.
Example of a deleted post:
https://e621.net/post/show.xml?id=1360652
Example of an active post:
https://e621.net/post/show.xml?id=367001

When a file is downloaded from e621, the name defaults to the MD5 hash of the file (for more info see https://e621.net/forum/show/243902 ).
Considering the wave of deletion regarding https://e621.net/forum/show/244542 (and just in general) I thought it might be useful to be able to tell whether I have a local copy of a deleted post. This would be especially useful for finding holes in pools (for example https://e621.net/pool/show/3482 ).
The easiest way of getting the MD5 hash of a post would be the API, but unfortunately it doesn't offer that data.
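
For what it's worth, the local half of that check is easy to sketch. The snippet below is only a rough Python illustration, not anything e621 provides: it hashes every file in a folder and builds a lookup table from MD5 to path. The names `md5_of_file` and `index_by_md5` are made up for this example, and it assumes all files sit directly in one folder.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(folder):
    """Map MD5 hex digest -> Path for every regular file directly in folder.

    Hypothetical helper: hashing the content (instead of trusting the
    file name) also covers files that were renamed after download.
    """
    index = {}
    for path in Path(folder).iterdir():
        if path.is_file():
            index[md5_of_file(path)] = path
    return index
```

With such an index, checking whether a given MD5 is stored locally is a single dictionary lookup. The missing piece is still getting the MD5 of the deleted post in the first place.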

I know that the MD5 is still stored in the database because searching for a deleted post's MD5 still returns that post: https://e621.net/post/index/1/status:any%20md5:f3c1a9a124b98fef83dddf2921329885
So I'd say that the API should return the MD5 hash too.

Also: Why are deleted posts only returned by /post/index/ and /post/index.html, but not by /post/index.xml and /post/index.json?
Example: https://e621.net/post/index.xml?tags=status:any%20md5:f3c1a9a124b98fef83dddf2921329885
This is documented in https://e621.net/help/show/api#posts_list but it strikes me as weird because now there is no way to get the ID of a deleted post found in a search.

Updated by Fifteen

I know it used to be a thing when that first got added to the REST API by TonyLemur. (I was actually planning to use it at some point to help sort through the stuff I've archived over the years via MD5 lookup, and possibly upload whatever is missing.) I did notice that it was disabled at least a couple of months ago, though, and I couldn't find a reason for that.

I'd be interested to know what the reason for that feature removal was.

Updated by anonymous

Having the md5 available would allow anyone to view and download posts that have been deleted.

Updated by anonymous

We don't really want to allow users to get the md5 of posts because it will allow people to get the md5 then do a google search for it, which has a chance of bringing it up.

Ijerk said:
Having the md5 available would allow anyone to view and download posts that have been deleted.

Not anymore. This issue was patched long ago, and was changed so that even if someone has the md5, they can't view it on our servers.

For example, here is one of the test images: 2d93d813ddbc39271d3e2c327dfa44bb
You can navigate to it by going to: https://e621.net/post/show?md5=2d93d813ddbc39271d3e2c327dfa44bb
but you cannot view it even if you have the md5: https://static1.e621.net/data/2d/93/2d93d813ddbc39271d3e2c327dfa44bb3.png

Updated by anonymous

Chaser said:
Not anymore. This issue was patched long ago, and was changed so that even if someone has the md5, they can't view it on our servers.

Does 'long ago' mean within the last six months or so? Thought I remembered bookmarking a FFD post around that time and downloading it later after my daytime bandwidth throttling was lifted (was some large flash thing).

Updated by anonymous

Ijerk said:
Does 'long ago' mean within the last six months or so? Thought I remembered bookmarking a FFD post around that time and downloading it later after my daytime bandwidth throttling was lifted (was some large flash thing).

It was likely cached on your end.

Updated by anonymous

Either way, does that mean the checksum can be added back to the API if it no longer allows people to view deleted posts?

Updated by anonymous

Fifteen said:
Either way, does that mean the checksum can be added back to the API if it no longer allows people to view deleted posts?

Chaser said:
We don't really want to allow users to get the md5 of posts because it will allow people to get the md5 then do a google search for it, which has a chance of bringing it up.

Updated by anonymous

Chaser said:
We don't really want to allow users to get the md5 of posts because it will allow people to get the md5 then do a google search for it, which has a chance of bringing it up.

Whoops, missed that part. I guess it's understandable.

OP's use case only targets MD5 lookup, though, not getting the checksum of a previously deleted file. So long as you can search a file by md5 sum using the API the same way you can with a normal search, so that the API can still be used for reverse file search even on deleted files, that should be enough.

Updated by anonymous

Ijerk said:
Having the md5 available would allow anyone to view and download posts that have been deleted.

Chaser said:
Not anymore.

Wasn't that a bit hypocritical? Saying that you (not "you" as a person but "you" meaning "you on behalf of e621") don't want people to get their hands on files which have been deleted (particularly takedowns) but then not deleting the files from the server.

In any case: Not making the MD5 accessible because that file might be somewhere else on the internet seems unreasonable to me. It doesn't change the fact that files are potentially searchable by MD5 via Google. And it removes a useful feature from the API. The fact that people might use that feature to possibly find a file which was deleted (where the reason might have been that an artist took it down) is neither your fault nor your problem. I mean... nothing is stopping me from writing a program which looks up the MD5 of posts as they are uploaded to e621 and stores them, along with the ID, somewhere locally. (It could even download the whole file, making this discussion pointless, but for the sake of argument let's just ignore that.) And you shouldn't feel obligated to take actions against that because weighing the pros of preventing a tiny fraction of people from doing something marginally questionable, against the cons of heavily chastising the website in order to actually be effective at preventing that, it's obviously a big minus.

If e621 wants to go the route of being "100% politically correct" for the sake of artists (the new DNP rule goes to show that too) then I can respect that decision (and who am I to tell you how to manage your website), but it has always annoyed me that perfectly good art gets deleted for bogus reasons. I have always thought about writing a program which, as hinted above, immediately downloads all files so I have a backup when posts get deleted, but because hard drives were expensive, because "it wasn't that bad" and because I am lazy I didn't do it yet. Reevaluating the situation I'm considering doing it now.

Fifteen said:
OP's use case only targets MD5 lookup, though, not getting the checksum of a previously deleted file.

Nope, that's exactly my use case. Pools like the one linked at the very top have holes in them and I would like to know whether I have those files stored locally.
I could take every file I have stored locally, put the filename in a search like status:deleted md5:<filename> until I find a result with the ID of the hole I am looking for, but that would be insane.
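
To show what that brute-force route would look like, here is a rough Python sketch. It only builds the search URLs, following the URL pattern of the search link in the first comment; `candidate_hashes` and `search_url_for` are hypothetical names, and it assumes the local files still carry their MD5 as the file name.

```python
import re
from pathlib import Path
from urllib.parse import quote

# Base of the HTML search, taken from the search link earlier in the thread.
SEARCH_BASE = "https://e621.net/post/index/1/"
MD5_NAME = re.compile(r"^[0-9a-f]{32}$")


def candidate_hashes(folder):
    """Return the sorted, lowercased file stems that look like MD5 hashes."""
    stems = (p.stem.lower() for p in Path(folder).iterdir() if p.is_file())
    return sorted(stem for stem in stems if MD5_NAME.match(stem))


def search_url_for(md5_hex):
    """Build the search URL for one hash using status:deleted md5:<hash>."""
    # Keep ':' unescaped so the URL matches the form used on the site.
    return SEARCH_BASE + quote(f"status:deleted md5:{md5_hex}", safe=":")
```

Every local file means one request, and each result page would still have to be checked against the ID of the hole, which is why this scales so badly for a large collection.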

Updated by anonymous

I also (for some reason just now) realized that saying you don't show the MD5 of deleted posts because those deleted posts could be found by googling the MD5 doesn't make any sense because the source links are kept available too (not via XML and JSON for some reason, but definitely via HTML).

Updated by anonymous

That'd be for specific cases like pay content where there is no source other than the pay site or piracy site (which I think get removed, I could be wrong?).

Updated by anonymous

Chaser said:
That'd be for specific cases like pay content where there is no source other than the pay site or piracy site (which I think get removed, I could be wrong?).

These have to still be manually removed, I do that, but can't say for others.

Either way, as MD5 should be unique to the file alone, nor can I think of a legitimate reason for users having this information and if they do want to compare if there's MD5 on site, they can use MD5:<MD5> status:any

Updated by anonymous

Mario69 said:
These have to still be manually removed, I do that, but can't say for others.

Either way, as MD5 should be unique to the file alone, nor can I think of a legitimate reason for users having this information and if they do want to compare if there's MD5 on site, they can use MD5:<MD5> status:any

Well, file lookup and classification is a very valid reason to have access to MD5, and not having it be queryable for deleted posts via the API just makes it arbitrarily harder to do that for large collections of images.

Updated by anonymous

As Chaser just pointed out to me, you can apparently still do md5 lookup using the API, just not MD5 searches. For instance, this link works fine:

https://e621.net/post/show.xml?md5=f3c1a9a124b98fef83dddf2921329885

So it's still possible to get tag lookups from the API, it's just that the search API doesn't work well with deleted posts, which should probably be filed as a bug.
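
A minimal Python sketch of building that lookup URL for an arbitrary hash, in case it helps anyone scripting this. The endpoint and the `md5` parameter are taken from the link above; the function name `lookup_url` and the input validation are my own additions, and nothing here assumes anything about the format of the response.

```python
import re
from urllib.parse import urlencode

# Endpoint and parameter name come from the example link in this post.
LOOKUP_ENDPOINT = "https://e621.net/post/show.xml"
MD5_PATTERN = re.compile(r"[0-9a-f]{32}")


def lookup_url(md5_hex):
    """Return the post lookup URL for an MD5 given as 32 hex characters."""
    normalized = md5_hex.strip().lower()
    if not MD5_PATTERN.fullmatch(normalized):
        raise ValueError(f"not a 32-character hex MD5: {md5_hex!r}")
    return LOOKUP_ENDPOINT + "?" + urlencode({"md5": normalized})
```

Calling `lookup_url("f3c1a9a124b98fef83dddf2921329885")` reproduces the link above. Fetching and parsing the result is left out on purpose, since the fields returned for deleted posts are exactly what is in question here.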

Regarding @Calimero000's pool listing issue, I guess the logical solution would be to add an API feature in /pool/show.xml to allow listing deleted posts, and possibly make a cross-search with the locally stored files (though you'd still need to look them all up on the post API first to get their IDs).
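
The cross-search step itself would be simple once both sides are known. A rough Python sketch, assuming you already have the IDs of the missing pool posts and a mapping from local MD5 to post ID (both inputs are hypothetical, since the API does not hand them out today, and `covered_holes` is a made-up name):

```python
def covered_holes(missing_pool_ids, local_md5_to_post_id):
    """Return {post_id: md5} for every pool hole that has a local copy.

    missing_pool_ids: iterable of post IDs missing from the pool listing.
    local_md5_to_post_id: dict mapping a local file's MD5 to its post ID,
    as obtained from per-file lookups (assumed to be available).
    """
    # Invert the mapping so each hole can be checked by post ID.
    md5_by_post_id = {post_id: md5 for md5, post_id in local_md5_to_post_id.items()}
    return {
        post_id: md5_by_post_id[post_id]
        for post_id in missing_pool_ids
        if post_id in md5_by_post_id
    }
```

The expensive part is still building the MD5-to-ID mapping, because that needs one lookup per local file.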

Updated by anonymous

I'm still waiting on an answer for the last question in the first comment, by the way:

Why are deleted posts only returned by /post/index/ and /post/index.html, but not by /post/index.xml and /post/index.json?

--

Mario69 said:
These have to still be manually removed, I do that, but can't say for others.

You do realize e621 has a history on everything? You are just adding another unnecessary step to go through.

Either way, as MD5 should be unique to the file alone

Not sure what you mean there.

nor can I think of a legitimate reason for users having this information

See very first comment of this thread.

and if they do want to compare if there's MD5 on site, they can use MD5:<MD5> status:any

See bottom of https://e621.net/forum/show/244885 about why this is a bad workaround.
--

Fifteen said:
not having it be queryable for deleted posts via the API just makes it arbitrarily harder to do that for large collections of images.

I agree. It's not that we can't do it, it's that it's unnecessary effort... both for us to program and for e621's server to process the requests. And it takes orders of magnitude more time to get results.

Updated by anonymous
