Topic: SuperiorBOT

Posted under e621 Tools and Applications

Hey there. I've spent the past few weeks setting up the infrastructure for this project, and I've only just gotten it to the point where it can actually do something other than use up bandwidth and CPU time.

The project is, for lack of a better name, known as SuperiorBOT.

What does it do?

Currently, not much. Most of the time so far went into setting up data collection from popular furry sites (currently only e621, SoFurry, and FurAffinity), and the only thing I've had it do so far is flag exact pixel-for-pixel matches of images. You might have noticed this if you looked at the flag history over the past few days.
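As a rough illustration of what pixel-for-pixel matching can look like (a sketch only, not the bot's actual code): hash the decoded pixel data instead of the file bytes, so two files with different encodings but identical pixels end up with the same digest. The function names and the input layout (a post ID, dimensions, and already-decoded raw pixel bytes) are assumptions made for this example.

```python
import hashlib
from collections import defaultdict


def decoded_md5(width, height, pixels):
    """MD5 over the image dimensions plus the raw decoded pixel bytes.

    Hypothetical scheme: the dimensions are mixed in so that a 2x1 and a
    1x2 image with the same byte stream do not collide.
    """
    h = hashlib.md5()
    h.update(f"{width}x{height}:".encode("ascii"))
    h.update(pixels)
    return h.hexdigest()


def find_exact_duplicates(posts):
    """Group post IDs whose decoded pixels are identical.

    posts: iterable of (post_id, width, height, pixels) tuples, where
    pixels is a bytes object produced by some image decoder (assumed).
    Returns a list of sorted ID groups containing more than one post.
    """
    groups = defaultdict(list)
    for post_id, width, height, pixels in posts:
        groups[decoded_md5(width, height, pixels)].append(post_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]
```

Each group returned this way would be a set of candidates for a duplicate flag. Which post in a group counts as the superior one is a separate decision that this sketch does not attempt.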

What is it going to do?

Quite a bit, hopefully. There are a bunch of things on e621 that I've felt could be automated (or at least made easier) with lots of data available. In other words, basically I've built myself a local copy of the databases of e621, SoFurry, and FurAffinity. The following is a list of potential ideas that may or may not be implemented.

Current work

6/28/16 - Replacing the image cache system with zip files instead of storing the images directly on the filesystem. Disk I/O is becoming a problem now that I have millions of 20 KB PNG files scattered around...
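For anyone curious, a zip-backed cache could look roughly like the sketch below. This is an assumed layout, not the final design: thumbnails are grouped into one archive per block of post IDs, and `shard_path`, `store_thumbnail`, `load_thumbnail`, and the shard size of 10000 are all names and values made up for the example.

```python
import zipfile
from pathlib import Path


def shard_path(cache_dir, post_id, shard_size=10000):
    """Map a post ID to the zip archive holding its thumbnail (assumed layout)."""
    return Path(cache_dir) / f"{post_id // shard_size:05d}.zip"


def store_thumbnail(cache_dir, post_id, png_bytes):
    """Append one PNG to its shard archive, creating the archive if needed."""
    path = shard_path(cache_dir, post_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    # PNG data is already compressed, so store it without recompressing.
    with zipfile.ZipFile(path, "a", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr(f"{post_id}.png", png_bytes)


def load_thumbnail(cache_dir, post_id):
    """Read one PNG back out of its shard archive."""
    with zipfile.ZipFile(shard_path(cache_dir, post_id), "r") as zf:
        return zf.read(f"{post_id}.png")
```

With a layout like this, a few million thumbnails would live in a few hundred archives instead of millions of tiny files. One thing the sketch ignores: appending the same post ID twice adds a second entry to the archive instead of replacing the first.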

Unimplemented Dangerous Stuff

Anything listed in this section will only be implemented after I've consulted with the admins and gotten their approval.

  • Add year tags to posts missing them (2008, 2013, etc) based on matching decoded image MD5 sums with images found on FurAffinity.
  • Add artist tags to images based on their source links.
  • Automatically generate new artists when posts by them are uploaded for the first time.
  • Add sources to images when more are found on FurAffinity or SoFurry.
  • Remove/fix dead sources (for example data.furaffinity.net)
  • Transfer applicable tags from inferior posts to superior posts.
  • Automatically add color-based tags like greyscale, monochrome, sepia, black_and_white, restricted_palette, alpha_channel
  • Add ratio and resolution tags to new posts
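To give an idea of how simple some of the items above could be, here is a minimal sketch of two of the checks: reducing the image dimensions to an aspect ratio, and testing whether every pixel is a pure grey. The function names and the `(r, g, b)` tuple input are assumptions for the example. Whether the reduced ratio corresponds to an accepted tag on the site is a separate question that the code does not answer.

```python
from math import gcd


def ratio_tag(width, height):
    """Reduce width:height to lowest terms, e.g. 1920x1080 gives "16:9"."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


def is_greyscale(pixels):
    """True if every (r, g, b) pixel has equal red, green, and blue values.

    pixels: iterable of 3-tuples from some image decoder (assumed).
    """
    return all(r == g == b for r, g, b in pixels)
```

An exact equality test like this would miss images that are only nearly grey, such as scans with a faint color cast, so a real check would probably need some tolerance.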

Unimplemented Safe Stuff

Some less interesting stuff that poses zero danger to e621 site operation.

  • Fix decoded image MD5 sums for GIF images so they take into account more than just the very first frame.
  • Add support for data collection from more furry sites!
  • Build a list of images that are visually similar to each other without being parented or linked in any way (using the same technique that powers http://iqdb.harry.lu/). It's still unknown whether any action taken on these posts will be automated or fully manual.
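To make the visual-similarity idea a bit more concrete, here is a sketch that uses a plain average hash. This is a deliberately simpler stand-in and not the technique iqdb uses. It assumes each image has already been decoded and reduced to a 2D list of greyscale values whose dimensions are divisible by the grid size. `average_hash` and `hamming` are names invented for the example.

```python
def average_hash(gray, size=8):
    """Compute a size*size-bit average hash of a greyscale image.

    gray: 2D list of 0-255 ints; rows and columns must be divisible by size.
    Each bit is 1 when that grid cell is brighter than the mean of all cells.
    """
    rows, cols = len(gray), len(gray[0])
    bh, bw = rows // size, cols // size
    cells = []
    for by in range(size):
        for bx in range(size):
            total = sum(
                gray[y][x]
                for y in range(by * bh, (by + 1) * bh)
                for x in range(bx * bw, (bx + 1) * bw)
            )
            cells.append(total / (bh * bw))
    mean = sum(cells) / len(cells)
    bits = 0
    for cell in cells:
        bits = (bits << 1) | (1 if cell > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes; smaller means more similar."""
    return bin(a ^ b).count("1")
```

Two posts whose hashes differ in only a handful of bits would go on the list as candidates. The cutoff is something that would have to be tuned against real data.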

Changelog

  • 6/20/16 Post below now automatically updates with status information about the bot.
  • 6/20/16 Created this post.

Updated by user 59725

Active modules:

  • E621: Skipped 127300 (13.7%), Processed 802362 (86.3%), Indexed 100.0% of 929661
  • SoFurry: Skipped 535520 (51.6%), Processed 501510 (48.4%), Indexed 100.0% of 1037030

Last updated at 6/27/2016 11:11:39 PM UTC

Updated by anonymous

asw_xxx said:
In other words, basically I've built myself a local copy of the databases of e621, SoFurry, and FurAffinity.

Sweet jebus, how big is this database (in disk size)?

Updated by anonymous

Chaser said:
Sweet jebus, how big is this database (in disk size)?

Not all that large, actually. e621's SQLite database is ~1.06 GB, SoFurry's is ~373 MB, and FurAffinity's is going to be the largest (~1.5 GB, and I've only indexed about 17% of their site).

I'm not storing full-resolution images, but I am keeping resized 128x128 pixel PNGs of each image for future image comparison use (because I know I'm going to have an issue in that code somewhere, and I'd rather not redownload over 2 TB of images again). This cache of images consists (so far) of 3.1 million PNGs, coming out to 73 GB.

Updated by anonymous

Looks very promising. There are just a few caveats I can see:

  • Add year tags to posts missing them (2008, 2013, etc) based on matching decoded image MD5 sums with images found on FurAffinity.
  • Transfer applicable tags from inferior posts to superior posts.
  • Add sources to images when more are found on FurAffinity or SoFurry.

Sounds good to me.

  • Add artist tags to images based on their source links.
  • Automatically generate new artists when posts by them are uploaded for the first time.

Note that commissioner reposts are very common, and could easily confound this.

  • Remove/fix dead sources (for example data.furaffinity.net)

Note that this is sometimes used for source hunting.

Updated by anonymous
