Topic: why are same-res same-size duplicates sometimes not detected by MD5 hashes?

Posted under General

I generally scroll through a series/character tag before uploading to make sure an image isn't already there, but I sometimes miss things because I have difficulty focusing, so I kind of rely on the MD5 check to catch me when I overlook something.

I get that MD5 sums differ when people edit an image, or when Pixiv automatically downsizes it to a lower resolution...

But I'm confused about why, when the image isn't changed and the resolution and file size appear to be the same, the MD5 check didn't detect it and reject the upload.

If it did, we wouldn't have to worry about a "deleted file" counting against us forever :(

For example:
Nov 25 https://e621.net/posts/2503246
Nov 27 https://e621.net/posts/2504961

I'm not sure what mental hiccough made me miss that it was already there, but both posts display the same resolution and file size:
2427x3000 (2.12 MB)
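One possible factor: a displayed size like "2.12 MB" is rounded, so two files can show the same label while differing by thousands of bytes. The sketch below assumes the label is the byte count divided by 1024*1024 and rounded to two decimals (the site's exact formula is an assumption here), and the byte counts are hypothetical, not the real sizes of these two posts.

```python
def display_size(num_bytes: int, unit: int = 1024 * 1024) -> str:
    """Format a byte count the way a two-decimal "MB" label might.

    Assumption: the site divides by 1024*1024 and rounds to two decimals;
    pass unit=1_000_000 to model decimal megabytes instead.
    """
    return f"{num_bytes / unit:.2f} MB"


size_a = 2_220_000  # hypothetical exact size of one upload, in bytes
size_b = 2_225_000  # hypothetical size of the other, 5,000 bytes larger

print(display_size(size_a))  # 2.12 MB
print(display_size(size_b))  # 2.12 MB
```

Either way the unit is defined, a two-decimal label can hide differences of several kilobytes, so "same file size" on the post page doesn't mean byte-for-byte identical.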

Does anyone have an idea what might have differed between the file I saved from the 8chan.moe /delicious/ thread, where it was delivered in response to my request, and the upload saved from Baraag.net?

Could one site or the other be altering the files submitted to it in some way (I assume the artist uploaded the same file to both sites), resulting in a different MD5?
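MD5 is computed over the file's raw bytes, not over the picture, so identical resolution and identical file size don't guarantee an identical hash. A minimal Python illustration, using made-up bytes rather than a real image: flipping a single bit keeps the length the same but produces an unrelated digest.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's raw bytes."""
    return hashlib.md5(data).hexdigest()


# Stand-in for an image file: 2,048 arbitrary bytes (not a real image).
original = bytes(range(256)) * 8

# Same length, but one bit of one byte flipped (think: a metadata flag or timestamp digit).
tweaked = bytearray(original)
tweaked[100] ^= 0x01
tweaked = bytes(tweaked)

print(len(original) == len(tweaked))          # True: identical file size
print(md5_hex(original) == md5_hex(tweaked))  # False: completely different digest
```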

deleuzian_cattery, replying to tyciol:

You wouldn't happen to have a copy of the deleted file, would you?

My first thought is that whatever image software the artist used might have embedded metadata (name of software, color profile, timestamp, ???), and one site might have modified or stripped it.
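If the difference really is metadata, one way to test the theory is to hash the JPEG with its metadata segments removed. This is a rough sketch, assuming both copies are JPEGs and that treating every APPn segment (JFIF, EXIF, XMP, ICC profile, and so on) plus COM comments as "metadata" is acceptable; it won't help if a site re-encoded the pixels. `strip_jpeg_metadata` and `content_md5` are hypothetical helper names, the stripped output is for comparison only (dropping an ICC profile or Adobe segment can change how a viewer renders colours), and the test bytes are synthetic, not taken from either post.

```python
import hashlib

# Segments treated as "metadata" for this comparison (an assumption):
# APP0..APP15 (JFIF, EXIF, XMP, ICC profile, Adobe, ...) and COM comments.
METADATA_MARKERS = set(range(0xE0, 0xF0)) | {0xFE}
# Markers that stand alone, with no length field after them (TEM, RST0..RST7).
STANDALONE_MARKERS = {0x01} | set(range(0xD0, 0xD8))


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return the JPEG bytes with APPn/COM segments removed, for comparison only."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray(data[:2])
    i = 2
    while i + 1 < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at byte offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before the real marker
            i += 1
            continue
        if marker in (0xDA, 0xD9):
            # SOS (start of scan) or EOI: the compressed pixel data follows SOS,
            # so keep everything from here on verbatim.
            out += data[i:]
            return bytes(out)
        if marker in STANDALONE_MARKERS:
            out += data[i:i + 2]
            i += 2
            continue
        # Segment length is big-endian and includes its own two length bytes.
        seg_end = i + 2 + int.from_bytes(data[i + 2:i + 4], "big")
        if marker not in METADATA_MARKERS:
            out += data[i:seg_end]
        i = seg_end
    return bytes(out)


def content_md5(data: bytes) -> str:
    """MD5 of the JPEG after metadata segments are stripped."""
    return hashlib.md5(strip_jpeg_metadata(data)).hexdigest()


# Synthetic example: the same "image" with and without a fake EXIF (APP1) segment.
soi = b"\xff\xd8"
exif = b"\xff\xe1" + (2 + 4).to_bytes(2, "big") + b"Exif"
dqt = b"\xff\xdb" + (2 + 3).to_bytes(2, "big") + b"\x00\x01\x02"
scan = b"\xff\xda" + (2 + 2).to_bytes(2, "big") + b"\x01\x02" + b"\x12\x34\x56" + b"\xff\xd9"

with_exif = soi + exif + dqt + scan
without_exif = soi + dqt + scan

print(hashlib.md5(with_exif).hexdigest() == hashlib.md5(without_exif).hexdigest())  # False
print(content_md5(with_exif) == content_md5(without_exif))  # True
```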

tyciol, replying to deleuzian_cattery:

I can't access the older drawthread (5845.html) through the catalog anymore, so the closest thing I can offer is the copy at https://booru.allthefallen.moe/posts/304689:

https://booru.allthefallen.moe/data/sample/sample-e0dfb37f88d6bd94d4f6eb861b58c16e.jpg

I believe I also uploaded it on the 27th from that drawthread; I guess I was just 2 days late in checking it.

Whoever added it here on the 25th added it to e621 instead of ATF, even though ATF is now the standard location for those threads' deliveries. Then again, sending furry lolis to e621 is probably the logical way to partition your loli upload quotas.

I guess Paheal would make sense too, since so far they only ban human lolis. Plus, they're probably less picky about quality than this site.

I'd recommend using the similar images search rather than the MD5 to check whether a file already exists. Changing even one byte of a file produces a different checksum, so MD5 is quite unreliable for this, especially when some websites reprocess files or add metadata.
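For illustration, here is a minimal average hash ("aHash"), one of the simplest perceptual hashes: it shrinks the image to an 8x8 grid, marks each cell as brighter or darker than the mean, and compares images by how many bits differ. This is not necessarily what the site's similar-image search uses, and it assumes the image has already been decoded into a grid of grayscale values; `average_hash` and `hamming_distance` are hypothetical names.

```python
from typing import List, Sequence


def average_hash(pixels: Sequence[Sequence[float]], size: int = 8) -> int:
    """Average hash of a decoded grayscale image.

    `pixels` is a list of rows of brightness values; the image must be at
    least `size` x `size` pixels. Returns a (size*size)-bit integer.
    """
    height, width = len(pixels), len(pixels[0])
    cells: List[float] = []
    # Shrink the image to size x size by averaging rectangular blocks.
    for by in range(size):
        y0, y1 = by * height // size, (by + 1) * height // size
        for bx in range(size):
            x0, x1 = bx * width // size, (bx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall average.
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest visually similar images."""
    return bin(a ^ b).count("1")


# Toy 64x64 images: dark left half, bright right half.
left_dark = [[50 if x < 32 else 200 for x in range(64)] for y in range(64)]
brighter = [[v + 10 for v in row] for row in left_dark]  # same picture, every byte changed
flipped = [[250 - v for v in row] for row in left_dark]  # visually opposite picture

print(hamming_distance(average_hash(left_dark), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(left_dark), average_hash(flipped)))   # 64
```

A re-saved or slightly adjusted copy gets a completely different MD5 but the same (or nearly the same) average hash, which is why a visual search can catch duplicates that a checksum misses.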
