Topic: How do I avoid uploading sample posts?

Posted under General

I got a negative record for uploading "sample posts" that then had to be replaced by others. (I'm sorry...)

How do I avoid uploading sample posts in the first place?

You have already received a DMail explaining how to avoid uploading sample versions from Twitter.

To summarize:
1) Navigate to a Twitter post, such as https://twitter.com/Chokeylover/status/1691071211055521792 (NSFW)
2) Open the image in a new tab, such as https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=large
3) Change the URL format to follow this exact schema and reload the tab: https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=orig - notice how the name parameter was changed from "large" to "orig". This changes the maximum image dimensions from 2048x2048 to 4096x4096 (and rarely higher than this).

See also the Twitter entry in the Sites and Sources wiki page for more info: https://e621.net/wiki_pages/26055#twitter
See also the userscript that automatically changes the image link to the highest quality for you: https://e621.net/forum_topics/39960
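
For reference, a minimal Python sketch of the name=large to name=orig rewrite from step 3 (the function name is just for illustration; standard library only):

```python
# Minimal sketch: rewrite a pbs.twimg.com media URL so the "name" query
# parameter requests the original-quality file.
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

def to_orig(url: str) -> str:
    """Change the 'name' parameter (e.g. 'large') to 'orig'."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["name"] = ["orig"]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

print(to_orig("https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=large"))
# https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=orig
```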

faucet said:
In addition to Song's advice: use the URL input box on the uploader instead of saving the file and uploading it. For example, attempting to upload https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=large will result in the following message:

[screenshot of the uploader's warning]

It's actually nice that it tells you that. Same with i.redd.it versus preview.redd.it URLs. If you're not 100% sure you're getting the original (or the best they provide), that's the best way. Also, there are site-specific tricks, so if you're not familiar with a source site, be prepared to use something like SauceNAO and other reverse image searches. Annoyingly, some of my sources are not on the whitelist, so I have to do it the old-fashioned way. Given that I sometimes find better sources that way (even original PNGs!), this is not always a bad thing. While it sounds trivial to remember to change to the 'orig' link, there are dozens of sites, and I can't remember if there's a single forum post that lists all the currently known tricks.

song said:
You have already received a DMail explaining how to avoid uploading sample versions from Twitter.

To summarize:
1) Navigate to a Twitter post, such as https://twitter.com/Chokeylover/status/1691071211055521792 (NSFW)
2) Open the image in a new tab, such as https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=large
3) Change the URL format to follow this exact schema and reload the tab: https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name=orig - notice how the name parameter was changed from "large" to "orig". This changes the maximum image dimensions from 2048x2048 to 4096x4096 (and rarely higher than this).

See also the Twitter entry in the Sites and Sources wiki page for more info: https://e621.net/wiki_pages/26055#twitter
See also the userscript that automatically changes the image link to the highest quality for you: https://e621.net/forum_topics/39960

What happens if I upload the "large" version? Does it still count as a sample, or is it valid?

nanomecho said:
What happens if I upload the "large" version? Does it still count as a sample, or is it valid?

You should really open up the "large" and "orig" versions and compare their sizes.
Which is to say, yes, even the "large" is an inferior sample.
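
If you'd rather not eyeball two browser tabs, here's a quick sketch that compares the byte sizes of the two variants (assuming the server answers a plain HEAD request with a Content-Length header):

```python
# Compare the byte sizes of the "large" and "orig" variants via HTTP HEAD.
import urllib.request

def content_length(url: str) -> int:
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

base = "https://pbs.twimg.com/media/F3fkODha0AA-0Z3?format=jpg&name="
for name in ("large", "orig"):
    print(name, content_length(base + name), "bytes")
# If "orig" reports more bytes, "large" is a downscaled sample.
```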

lafcadio said:
You should really open up the "large" and "orig" versions and compare their sizes.
Which is to say, yes, even the "large" is an inferior sample.

Wasn't there a non-zero chance of getting lucky, where the "large" IS the original image? I think very small source images (say, 320x240 or 640x480) don't get resized. HashCheck or similar is your friend when you want to be 100% sure two files are identical:
[screenshots: HashCheck in Windows Explorer and in Nautilus]

There are a ton of tools for cross-checking and listing identical hashes. A basic script goes like this (a minimal sketch follows the list):
1) Generate a file listing every filename/path with an MD5/SHA-1 (or whatever) hash per line. The hash goes first: "0123456789abcdef:filename" on every line.
2) Sort the text file.
3) Highlight adjacent lines where the characters up to the colon (:) match. HashCheck (and others) use " *" instead of ":" to delimit the hash and filename.
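
A minimal Python version of those three steps (MD5 and the colon delimiter as above; files are hashed whole here, so this is only meant for reasonably sized files):

```python
import hashlib, os

# Step 1: one "hash:filename" line per file, hash first.
lines = []
for root, _dirs, files in os.walk("."):
    for fname in files:
        path = os.path.join(root, fname)
        with open(path, "rb") as f:
            lines.append(hashlib.md5(f.read()).hexdigest() + ":" + path)

# Step 2: sorting puts identical hashes on adjacent lines.
lines.sort()

# Step 3: flag adjacent lines whose text up to the colon matches.
for prev, cur in zip(lines, lines[1:]):
    if prev.split(":", 1)[0] == cur.split(":", 1)[0]:
        print("possible duplicates:", prev, "and", cur)
```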

BTW: a nice trick if you're verifying a LOT of files is to use something like QuickPAR (obsolete, though) or Par2 (https://www.google.com/search?q=par2+linux). Also, I wanted to mention this cool thing.


alphamule said:
Wasn't there a non-zero chance of getting lucky and the large IS the original image? I think very small source images (say, 320x240 or 640x480) don't get changed. HashCheck or similar is your friend when wanting to be 100% sure two files are identical.

TBH, using a file deduplication tool still seems simpler than this. For example, fdupes is fast and is known to compare on the basis of hashes, but it handles the whole process of making sure you really did find duplicates: files of a different size are obviously not identical; files of matching size get hashed; and files with the same hash may be identical, so their bytes are compared directly.
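
A rough Python sketch of that pipeline, for anyone curious what it looks like (this is my reading of the process, not fdupes' actual code):

```python
import filecmp, hashlib, os
from collections import defaultdict

def file_hash(path, chunk=1 << 20):
    """Hash a file in chunks so large files don't blow up memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_dupes(paths):
    by_size = defaultdict(list)
    for p in paths:                      # different size -> not identical
        by_size[os.path.getsize(p)].append(p)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)      # matching size -> hash the files
        for p in same_size:
            by_hash[file_hash(p)].append(p)
        for cand in by_hash.values():    # same hash -> may be identical
            for other in cand[1:]:       # confirm with a byte comparison
                if filecmp.cmp(cand[0], other, shallow=False):
                    yield cand[0], other
```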

savageorange said:
TBH, using a file deduplication tool still seems simpler than this. For example, fdupes is fast and is known to compare on the basis of hashes, but it handles the whole process of making sure you really did find duplicates: files of a different size are obviously not identical; files of matching size get hashed; and files with the same hash may be identical, so their bytes are compared directly.

Well, my idea was more for source links; I have been manually doing this for the sources of stuff I upload. Yeah, TBH, that's basically what fdupes or the like is for. There are also really clever things that will just link multiple identical files to the same sectors; that way, unless one of them changes, only one copy of the file's sectors is needed. AFAIK, ext3 and NTFS both support this officially, and you can get away with it on read-only filesystems like an ISO 9660 image. Oddly, FAT supports this unofficially, but not for writable media: it'll work fine until someone (or something) corrupts it. ;)
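
The "same sectors" trick is basically a hard link. A hedged sketch (both paths must be on the same filesystem, and the function name is made up):

```python
import filecmp, os

def hardlink_duplicate(keep: str, dup: str) -> None:
    """Replace dup with a hard link to keep, after byte-verifying a match."""
    if not filecmp.cmp(keep, dup, shallow=False):
        raise ValueError("files differ; refusing to link")
    os.remove(dup)       # drop the redundant copy...
    os.link(keep, dup)   # ...and point its old name at keep's data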

I wish I could just automate all this, but it feels like the trend is to break any kind of automation. :(
What I should do is keep a standard .MD5 file for the list of filenames and a separate one for URLs, and NOT combine the two, or else use a CSV/JSON/etc. library and merge them properly (see the sketch below). I would only be using this for a specific artist or character tag, I guess. I remember a big caution for MD5 deduping was to use it to eliminate 99% of the non-duplicates from your search, and then manually compare the bytes.
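
Something like this join-on-hash sketch is what I mean (both file names and formats here are assumptions; the .md5 side uses the HashCheck-style "hash *filename" layout):

```python
def read_pairs(path, sep):
    # One "hash<sep>value" entry per line; maxsplit=1 keeps URLs intact.
    with open(path) as f:
        return dict(line.rstrip("\n").split(sep, 1) for line in f if line.strip())

md5s = read_pairs("files.md5", " *")     # assumed: "hash *filename" per line
urls = read_pairs("sources.txt", ":")    # assumed: "hash:url" per line

for digest, fname in md5s.items():
    if digest in urls:
        print(fname, "->", urls[digest])
```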

Funny thing is, P2P programs had to get better and better at making sure someone didn't tamper with files on the networks, and BitTorrent uses a rather nice way (per-piece hashes stored in the torrent file). Hypothetically, someone could intentionally force matching hashes to poison sources.
