Topic: e621 updater - tag local files!

Posted under e621 Tools and Applications

Just found this tool that does exactly what I was looking for: uploading saved images to favourites. Nice job!

Just one suggestion mentioned earlier in this thread but not addressed - logging of files that aren't found on e621. It'd be really helpful to know which few files out of hundreds haven't been favourited.

Updated by anonymous

Dirtpaw said:
It'd be really helpful to know which few files out of hundreds haven't been favourited.

It's actually not that easy to do because, AFAIK, the server returns nothing after a POST request. I'd have to check whether the number of favourited pictures changed after each request, or batch-check them after the process finishes... I'll think about it. :)

Updated by anonymous

I implemented something like that as part of my system.
Basically, it does the fav:USERNAME query, fetching all pages, and then extracting just the URIs. It then reads a list of md5sums corresponding to on-disk files (this is cached, and updated as needed by digup).
It iterates over the list of URIs; for each one it extracts the md5sum. The URI in question is output if its md5sum is not in the list of on-disk files.

So what I implemented is basically the opposite procedure, but is simple to invert. Hope that helps give you some ideas.
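
A rough Python sketch of that comparison, in both directions. It assumes each file URI ends in "<md5>.<extension>" and that the URI list and the md5sum list have already been collected; the function names and the example.invalid URLs are made up for illustration, not taken from the actual scripts.

```python
import re

# Assumption: file URIs end in "<32 hex characters>.<extension>".
MD5_IN_URI = re.compile(r"([0-9a-fA-F]{32})\.\w+$")


def md5_from_uri(uri):
    """Return the lowercase md5 embedded in a file URI, or None if there is none."""
    match = MD5_IN_URI.search(uri)
    return match.group(1).lower() if match else None


def uris_missing_on_disk(fav_uris, disk_md5s):
    """Favourited URIs whose md5 is not among the on-disk md5sums."""
    on_disk = {m.lower() for m in disk_md5s}
    return [u for u in fav_uris if md5_from_uri(u) not in on_disk]


def files_not_favourited(disk_md5s, fav_uris):
    """The inverted check: on-disk md5sums that match no favourited URI."""
    fav_md5s = {md5_from_uri(u) for u in fav_uris}
    return sorted(m.lower() for m in disk_md5s if m.lower() not in fav_md5s)


if __name__ == "__main__":
    favs = [
        "https://example.invalid/a/00d89c259ac3083408350a08ec5f7c94.jpg",
        "https://example.invalid/b/044197f543a763be4a583d75c8cb76a7.png",
    ]
    disk = ["044197F543A763BE4A583D75C8CB76A7", "0a6d72a5a68f3cb67556f2ebfb59b290"]
    print(uris_missing_on_disk(favs, disk))
    print(files_not_favourited(disk, favs))
```

The second function is the direction asked for above: it lists the local files that never ended up in the favourites.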

Updated by anonymous

We are probably not affected by "Crawlers, Bots, Page Numbers and You." change (unless someone has more than 750 pages of favorites, huh), but feel free to report if something is not working right.

Updated by anonymous

Hi Keito,

Thanks for putting this together, but I don't seem to be able to get tags on my images. Also, it seemed to rename one of my files into oblivion and it disappeared (not the "NotFound" file, but the one starting with hash "00d").

Wednesday, July 11, 2018
14:26:25 : Proccess started
14:26:25 : Tagging mode
14:26:40 : After 0:00:15: 22/22 images processed. 0/22 images updated
Wednesday, July 11, 2018
14:26:57 : Proccess started
14:26:57 : Tagging mode
14:27:36 : After 0:00:39: 22/22 images processed. 1 images not found
Wednesday, July 11, 2018
14:32:14 : Proccess started
14:32:14 : Tagging mode
14:32:28 : After 0:00:14: 21/21 images processed. 0/21 images not found
Renamed A:\Users\Ket\Desktop\new pron copy\00d89c259ac3083408350a08ec5f7c94.jpg to .jpg
Renamed A:\Users\Ket\Desktop\new pron copy\044197f543a763be4a583d75c8cb76a7.png to A:\Users\Ket\Desktop\new pron copy\044197F543A763BE4A583D75C8CB76A7.png
Renamed A:\Users\Ket\Desktop\new pron copy\0a6d72a5a68f3cb67556f2ebfb59b290.jpg to A:\Users\Ket\Desktop\new pron copy\0A6D72A5A68F3CB67556F2EBFB59B290.jpg
[...]
Renamed A:\Users\Ket\Desktop\new pron copy\ed4d8102d520504513630014d8565479.png to A:\Users\Ket\Desktop\new pron copy\ED4D8102D520504513630014D8565479.png
Renamed A:\Users\Ket\Desktop\new pron copy\f1c586633a553be461851e818436fe42.jpg to A:\Users\Ket\Desktop\new pron copy\F1C586633A553BE461851E818436FE42.jpg

I notice you have a GitHub for this now. How do I compile these AHK files? I've never heard of this extension.

Thanks! =}

Updated by anonymous

Better late than never!

Ket-Ralus said:
Thanks for putting this together, but I don't seem to be able to get tags on my images. Also, it seemed to rename one of my files into oblivion and it disappeared (not the "NotFound" file, but the one starting with hash "00d").

For me the tagger still works fine. Are you sure your connection to e621 is direct? Check your Windows proxy settings and whether you use a proxy/VPN to access e621. It should work if there's nothing wrong there. Make sure you're not looking for tags in PNG images: only a few programs support them (but the tags are actually there). Check whether Windows sees tags in JPGs. As for your "renamed" file, i can't reproduce it with the same file. Was it only the one?

Ket-Ralus said:
I notice you have a GitHub for this now. How do I compile these AHK files? I've never heard of this extension.

You can get a compiler and read more about it here: https://autohotkey.com/

-----------------

The "sync mode not working" bug is fixed in version 9.3.
I don't log in often anymore, so PM me if you have any problems - i'll get an email then.

Updated by anonymous

I still can't get sync mode working. I've tried multiple accounts and multiple folders.

Updated by anonymous

If sync mode is not working, try checking your API key field: it should only contain your API key and nothing else.

Updated by anonymous

It seems it IS possible to store EXIF data in PNG. ftp-osl.osuosl.org/pub/libpng/documents/pngext-1.5.0.html#C.eXIf

Updated by anonymous

milanise7en said:
It seems it IS possible to store EXIF data in PNG. ftp-osl.osuosl.org/pub/libpng/documents/pngext-1.5.0.html#C.eXIf

EXIF is already added by my tool: the problem is that not all programs can understand it. Many don't even try to look for tags, even though they are there. Windows Explorer is a good example. It won't even show a "tags" field for PNGs, while Picasa and Photoshop know that they are there.
You can tag PNGs, you just have to use proper software to make use of them.

Updated by anonymous

Hydrus seems to be my replacement of choice for this, along with it replacing Picasa.

(I know I'm necroing a 2 month old thread but it's worth looking into for anyone needing tag sync / subscription puller solution)

Updated by anonymous

Can this download pictures based on tags I choose and not based on someone's favorites list?

Updated by anonymous

Serkan said:
Can this download pictures based on tags I choose and not based on someone's favorites list?

Nope. I've used e621dl when i needed that functionality. I feel like coding it into my program is excessive, since there are quite a lot of programs out there that can do it for you.

LibertarianHorseFukr said:
Hydrus seems to be my replacement of choice for this, along with it replacing Picasa.

Looking good, i'll stick to using mine and image viewer of choice to avoid locking myself into one specific program. It seems that it's using some kind of internal database, while my program is tagging files themselves. Thanks for the bump! ;)

Updated by anonymous

I am getting an error when I try to do anything. Everything was working perfectly a couple months ago.

Error:Expecting JSON value (string,number,true,false,null,object or array)

Line: 1
Col: 1
Char: 1

specifically: <

line#
134: Throw,Exception(msg, offset, SubStr(text, pos, len))

Updated by anonymous

9.4 IE Doomsday

  • Requests from old IE are now denied by e621.net (or so it seems)
    • Had to change all requests to use curl and send a custom user agent to get API data.
  • Here's what this means for you:
    • MD5 mode is disabled (can't get MD5 from files properly, program only works with MD5 in filenames, at least for now). It was turned off by default, but if you've used it, that's bad news for you.
    • Tagger tested on a limited number of pictures and works OK
    • Updater tested on even fewer pictures and works OK (but that's not 100%)
    • Sync and Downloader modes are completely disabled for now. There's a lot of code to rewrite to make them work, i'm not ready to do that now.
    • Overall speed is reduced.

Downloads are in OP post.

Updated by anonymous

Tool is awesome! What app would you recommend for viewing your stash/gallery?

Updated by anonymous

Keito said:
Nope. I've used e621dl when i needed that functionality. I feel like coding it into my program is excessive, since there is quite a lot of programs out there that can do it for you.

Looking good, i'll stick to using mine and image viewer of choice to avoid locking myself into one specific program. It seems that it's using some kind of internal database, while my program is tagging files themselves. Thanks for the bump! ;)

Yeah, it's using SQLite for its database files to match file hashes with associated tags. All readable if you really want to go rooting around in them.

The API system is super primitive still. Mainly gets used for pushing URLs to the client software from browser extensions.

Updated by anonymous

notaroundhere said:
Tool is awesome! What app would you recomend for keeping viewing your stash/gallery?

Picasa or Lightroom.

Updated by anonymous

Alert caution warning, incoming noob question:
Downloader mode and Sync mode appear greyed out, what do i do?
I tried manually adding Username and API to the .ini
MEGA 9p4

Updated by anonymous

Keito said:
Picasa or Lightroom.

Picasa has been killed by Google and Lightroom cannot be purchased in certain countries.

Updated by anonymous

Fedfed said:
Alert caution warning, incoming noob question:
Downloader mode and Sync mode appear greyed out, what do i do?
I tried manually adding Username and API to the .ini
MEGA 9p4

Unfortunately, they are not working now. I've had to go back from using IE to all-CURL, so half of the tool needs to be rewritten. I don't have time for that now.

milanise7en said:
Picasa has been killed by Google and Lightroom cannot be purchased on certain countries.

You can still download it if you really want.

Updated by anonymous

Hope this gets updated for the new site at some point, very useful program for organization. Just curious if there is any progress on that. I can wait though, I'm patient.

adnf2012 said:
Hope this gets updated for the new site at some point, very useful program for organization. Just curious if there is any progress on that. I can wait though, I'm patient.

I'm working on a similar tool because this doesn't work anymore; mine is still in progress tho. I finished it shortly before the API rework lol..

I fixed tagger mode through some dirty hacks. https://mega.nz/folder/IMYQkQyT#kR-iboAlVIp4WIQYo99waw
My version writes tags from the artist, general, species, character and copyright groups. The other modes are probably still broken. Maybe I'll look into it later.
-----edit-----
Probably fixed downloader mode.
-----edit2-----
Probably fixed all modes. Tagger, downloader and sync modes were tested on a small number of files. Updater mode not yet tested.
-----edit3-----
"Get API" button in sync mode is back and should work.
-----edit4-----
Fixed and tested updater mode.

Updated

graiden said:
I fixed tagger mode through some dirty haks. https://mega.nz/folder/IMYQkQyT#kR-iboAlVIp4WIQYo99waw
My version write tags from artist, general, species, character and copyright groups. Other modes probably still broken. Maybe will look into it later.
-----edit-----
Probably fixed downloader mode.
-----edit2-----
Probably fixed all modes. Tagger, downloader and sync modes was tested on small amount of files. Updater mode not yet tested.
-----edit3-----
"Get API" button in sync mode is back and should work.

It worked! Thanks for the update!

graiden said:
I fixed tagger mode through some dirty haks. https://mega.nz/folder/IMYQkQyT#kR-iboAlVIp4WIQYo99waw
My version write tags from artist, general, species, character and copyright groups. Other modes probably still broken. Maybe will look into it later.
-----edit-----
Probably fixed downloader mode.
-----edit2-----
Probably fixed all modes. Tagger, downloader and sync modes was tested on small amount of files. Updater mode not yet tested.
-----edit3-----
"Get API" button in sync mode is back and should work.

Thanks! I've kinda lost interest in the genre for now so couldn't find spare time to work on it myself.
I'm updating Github and OP post with your version.

graiden said:
I fixed tagger mode through some dirty haks. https://mega.nz/folder/IMYQkQyT#kR-iboAlVIp4WIQYo99waw
My version write tags from artist, general, species, character and copyright groups. Other modes probably still broken. Maybe will look into it later.
-----edit-----
Probably fixed downloader mode.
-----edit2-----
Probably fixed all modes. Tagger, downloader and sync modes was tested on small amount of files. Updater mode not yet tested.
-----edit3-----
"Get API" button in sync mode is back and should work.
-----edit4-----
Fixed and tested updater mode.

I noticed that artist tags, lore tags, and meta tags don't show up on my pictures. Species and character do though. Is there an issue with the other tags?

Hi, I've got a problem.
I download most of my images on an iPad, so each image is named after its post ID.
I'm using 9P5 to get the tags for these images: I use Tagger mode, check [Force remove old tags], and select my image folder.
The status says "Tagged!" and reports no images as not found.
But when I open the images in a few different photo viewers, I can't see any tags.

I have already tried moving all the files and the updater to a new disk, and none of the paths contain spaces.

Any suggestion? Thank you.

mt657993 said:
Hi, I got a problem.
Most Img I download on iPad. IMG name is PostID.
When I use 9P5 try to get the tag with IMG. I use Tagger mode and check the [Force remove old tags], and select my IMG folder.
Status says Tagged! and no IMG not found.
But I use a few photo viewers to try reading it. I can't see any tag.

I have already tried to move all files and updater to new disk space. And all paths without a spacebar.

Any suggestion? Thank you.

Tagger expects the file name to be equal to its MD5 sum (the MD5 of the original file, before any tags were changed). I can take a look at whether i can change it to use the post ID.

I had given up on this after e621's site update, but I checked back a few days ago and saw this got updates. Thanks for that! I managed to "sync" my local files to my favs like I used to. I had left a note saying my last sync was up to files with date created August 9, 2018. About +6500 favs.

issues

I downloaded 32-bit curl from https://curl.se/windows/. The link provided on this program's github page doesn't do such a great job of directing regular users to a download that actually has curl.exe (without extra steps?).

Updater mode doesn't seem to work. Error log

The output of Images Not Found doesn't seem accurate, because it always stayed at 0. I know some of my files were never posted to e621.

Get MD5 is greyed out and unusable. I think you know that. The Extracted MD5 field in sync mode has no output, which might explain sync mode always giving the status "Post found and favorited!"

Deleted posts were favorited. That's definitely better than failing (but still telling me "Post found and favorited!") because then I can look up what happened to those files through e621's deletion records and sourcing and find out which files they even are.

Also, lower quality deleted files are being favorited, and I don't think new favorites to such deleted files transfer to their superior quality versions. I think favorite transfer is a one-time step as part of e621's post deletion process. This search shows what I'm talking about: fav:abadbird status:deleted delreason:*inferior*, but it's hacky in that delreason relies on the current standardized deletion reason text, while older such deletions used non-standard, manually written reasons. I can manually update my favorites and local copies with that search, but the point is to do this in bulk, which I thought Updater mode helps with.

E621 is also transitioning to a more formal Replacements system. These higher quality replacements have the same post ID number but a new MD5.

It would be really nice to know which files sync mode succeeded in fav'ing, which files were fav'd but deleted at all, which files were fav'd but deleted for a better replacement, anything Replacement-aware, and which files are not on e621 at all.

To my knowledge this is the only program that even comes close to providing this functionality, and it's half-broken lol.

renaming files to their MD5s

Since this software no longer seems to calculate MD5s, I figured something out with Bulk Rename Utility (BRU). It's a comprehensive renaming program with UI and jargon that's overwhelming at first glance. What I did was put all my files into a temporary working directory, point BRU to that, and set Name (2) to Name: Remove and Add (7) to Prefix: <(hash:md5)>. You can see the changes in the output field (Name, New Name, etc) of highlighted files. Files with no changes would not be processed.

My working directory had files named with just their MD5 hash mixed with other files from various other sources. I wanted to separate the files that needed processing from the rest, so I found and enabled the setting Display Options > Sorting > Group Affected Files (highlight all files before sorting!). If you have extra options enabled that would affect every highlighted file then this group sort doesn't work, but those options can be enabled after the sort. For instance, I also copied these files to a different directory with Copy/Move to Location (13), which affects all highlighted files. Next, I selected the freshly grouped files needing renames: scroll to end of the group to be renamed, select the bottom file to be renamed, scroll to the top, and hold Shift and select the first file to be renamed. That should select only the files that need renaming for e621 updater. Press the Rename button in the bottom right corner to begin the process, optionally preview the changes, and press OK. That made workable files for e621 updater.

I'm sure there are easier ways of doing this, but this worked for me and wasn't too technical.
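
For anyone comfortable with a script instead, here is a minimal Python sketch of the same idea: hash every file in one folder and rename it to its MD5. This is not part of e621 updater; the function names are made up, and it assumes the files have not been tagged yet (tagging changes the hash, so the result would no longer match e621).

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """MD5 of a file's contents, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_to_md5(folder, dry_run=True):
    """Return the planned (old, new) renames; apply them when dry_run is False."""
    planned = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(file_md5(path) + path.suffix.lower())
        # Skip files that already have the right name, and never overwrite.
        if target == path or target.exists():
            continue
        planned.append((path, target))
        if not dry_run:
            path.rename(target)
    return planned
```

Calling rename_to_md5(folder) with the default dry_run=True only lists what would change, which plays the same role as BRU's preview. Work on a copy of the folder first.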

post #77516

Recent AI advancements made me look for something to download e621 tags into TXT files.
So, today i've returned to working on this app.

First crude tagger update is already done and seems to be working. It can tag the files themselves (as it did before) and it can download tags into TXT files named after the file.
I'll publish it this month if everything goes well.

Changes:
The only mode that will work in the initial release is Tagging Mode. All the other modes are disabled, for now.
Database is now offline, app is no longer DDOSing e621. That means increased RAM requirements but otherwise more stable and faster workflow since we no longer need network for tagging. Yet, you will have to download the latest tag and posts databases, which are... A little over 1GB in size as of today (~2,5GB non-compressed).

TO-DO:
(probably in order of importance)
Fix MD5 extraction
Allow files to be (re)named following e621 ID's
Fix networking (i'll need to process what changed on e621 side and how limits are affecting us now)
Maybe - depending on the previous item - return online mode without local databases
CURL, exiftool and databases (auto)updates in-app
Restore Updater Mode (download updated images)
Restore Downloader Mode (download favorited images)
Restore Sync Mode (favorite downloaded images)

Updated

I've released a first beta release of 100% Python code:
https://github.com/AyoKeito/e621updater-python

There's a compiled exe release for people who don't want to deal with Python and its dependencies. Others are welcome to run the scripts themselves.

Please take some time to read the github page. A lot of things have changed.
Especially since we no longer have GUI and are no longer using API. We just dump the database when needed.
This is a lot faster but each database update will eat up to 1GB of traffic.

I'm aware that some antivirus software considers "database.exe" a virus "VHO:Trojan-Dropper.Win32.Convagent.gen".
This code IS, in fact, a dropper, by definition. But it's not downloading any malware. Instead, it's downloading e621 database from db_export.
You are welcome to use py scripts instead of EXE files if you don't believe me. Praise be open source!

All contributions welcome.

Hello and thank you for your work with the tagger.

It works quite well. The download of the DBs always breaks off the first time I update the DBs but I assume that's because of my connection. Nevertheless it is very stable.
What I have found out now is that the program does not go into subfolders. You have to enter the direct path to the images.
Will there be an update so that it does this automatically? What I mean is that you only have to enter the main folder where all the folders with the pictures are?

Thanks anyways.

Hey!

mr_g said:
What I have found out now is that the program does not go into subfolders. You have to enter the direct path to the images.

That's actually intentional, but i guess i'll change it! Initially i made it not go into subfolders because it moves all not found images to "NotFound" subfolder. But i guess i can just ignore that one directory and the problem is solved!

Hello,

It would make it easier to mark everything.
I have now tried this with ~4000 images.
What can I say, the preparation took me hours. The execution of the program does too, but it doesn't matter. Maybe also because I've never done this before XD.
It didn't read my test folder, so I used CMD to get all the paths and wrote them into a file. With this, I created a BAT file that calls the program every time for each path.
If the program would simply read all subfolders, it would be easier to use. It doesn't have to be standard. But it should be there if you have a large offline collection. That's why I just wanted to mention it.

The option that the program creates a folder in which all the pictures that are not found are placed would then be specified by means of an "option". If you want it, you specify the option when entering the command. Wouldn't that be a possibility?

Ty

Updated

mr_g said:
If the program would simply read all subfolders, it would be easier to use. It doesn't have to be standard. But it should be there if you have a large offline collection. That's why I just wanted to mention it.

I've added an optional --subfolders flag, you can try it :)
It follows all subfolders of a target folder.
"NotFound" folder is ignored.
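
For the curious, a minimal sketch of how a recursive walk can skip one folder name. This is only an illustration of the idea, not the code from the repository; iter_image_folders and its skip_name parameter are made-up names.

```python
import os


def iter_image_folders(root, skip_name="NotFound"):
    """Yield root and every folder below it, never descending into skip_name."""
    for current, dirnames, _filenames in os.walk(root):
        # Editing dirnames in place tells os.walk which subfolders to enter.
        dirnames[:] = sorted(d for d in dirnames if d != skip_name)
        yield current
```

Each yielded folder can then be processed the same way a single target folder is processed today.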

ayokeito said:
I've added an optional --subfolders flag, you can try it :)
It follows all subfolders of a target folder.
"NotFound" folder is ignored.

Hello,

Is it the EXE variant or the pure script variant?

mr_g said:
Is it the EXE variant or the pure script variant?

Both!
UP: oh my bad :D
Wait a sec~

Which one are you using BTW?

ayokeito said:
Both!
UP: oh my bad :D
Wait a sec~

Which one are you using BTW?

Hello,

I did the first test run with e621updater-python_v1.zip. (Win EXE)
I have just fetched v1.1.
I assume that the *.parquet files do not have to be fetched again.
I'll look for a folder and test it directly.

mr_g said:
Hello,

I did the first test run with e621updater-python_v1.zip. (Win EXE)
I have just fetched v1.1.
I assume that the *.parquet files do not have to be fetched again.
I'll look for a folder and test it directly.

Don't do that yet, it's a bit messed up at the moment :)
I'll update you.

Hello,

I should perhaps have indicated what such a path might look like to an image.
My paths are longer than just "one" subfolder. XD

main\artist1\species1\MD5.extension
main\artist1\species1\subfolder1\MD5.extension
main\artist1\species1\subfolder2\MD5.extension
main\artist1\species2\MD5.extension
main\artist2\species1\MD5.extension
main\artist2\species2\MD5.extension
main\artist2\species2\subfolder1\MD5.extension

Without a system for keeping the tags, you have to get creative somehow to even know what is where. My solution was a relatively simple sorting by species.

ayokeito said:
Don't do that yet, it's a bit messed up at the moment :)
I'll update you.

Too late. Just tried to make 318 files.
Of 318 in total, 274 were found and 29 were "missing".

mr_g said:
Too late. Just tried to make 318 files.
Of 318 in total, 274 were found and 29 were "missing".

It will actually take me some time to code that.
I've reverted both python and exe versions to v1 with no --subfolders flag.
I'll update when it's done properly.

Updated

ayokeito said:
It will actually take me some time to code that.
I've reverted both python and exe versions to v1 with no --subfolders flag.
I'll update when it's done properly.

Hey,

it's just an idea for your program as I'm sure there are people, including myself, who have a collection that is already large.
It makes it easier, because otherwise the already existing sorting has to be removed or preparations have to be made.

In my case, the preparation would be stupid.
with CMD: dir /s /b >> path.txt
open path.txt and remove all paths where the MD5 is in it and leave the rest.
Insert the final result into Calc as a multi-line but single-column table.
Command tagger.exe -f -p " in front of path and " behind. For each line.
Save as .BAT
Execute .BAT in CMD.
Wait a long time.
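
The manual steps above could also be scripted. Below is a rough Python sketch that walks the collection and writes one command per folder that directly contains files. The tagger.exe -f -p "path" command is copied from the steps above; the function names and the run_tagger.bat file name are made up.

```python
import os


def build_batch_lines(root, exe="tagger.exe"):
    """One command line per folder under root that directly contains files."""
    lines = []
    for current, dirnames, filenames in os.walk(root):
        dirnames.sort()  # keep the order stable between runs
        if filenames:
            lines.append(f'{exe} -f -p "{current}"')
    return lines


def write_batch_file(root, bat_path="run_tagger.bat"):
    """Write the commands to a .BAT file and return how many were written."""
    lines = build_batch_lines(root)
    with open(bat_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)
```

This replaces the dir /s /b, path clean-up and spreadsheet steps; the resulting .BAT still has to be run from CMD as before.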

It works this way. Your program does something that I am far too lazy to do with over 100k files.
I am happy that your program exists at all

Take time whenever you want.
There is no hurry because of me. XD

ayokeito said:
This still works! :)
I've just updated the script with multi-threading for database operations and added a batch script to simplify running on Windows:
https://github.com/AyoKeito/e621updater-python/

Hello! I tried running the script posted 8 days ago and got an error. Here's output:

Aha, found the solution. Several things went wrong:
1) A prior version of Python did not clear its environment variables, so I had to manually uninstall Python, clear the environment variables, and install Python again
2) The program relies on imports that are deprecated in 3.11 and missing in 3.12, so it will not run on Python 3.12
3) There's an import for async_timeout, but it's missing from the dependencies, so it needs to be taken care of manually: pip install async_timeout, then copy async_timeout from appdata\local\programs\python\python311\lib\site-packages to \venv\Lib\site-packages.

Now my next error:

0.62% 1542437241723_1435764720.spiderdragon_oriandtheblindforest_small.jpg MISSING MISSING
Traceback (most recent call last):
  File "G:\Tagging experiment\e621updater-python\tagger.py", line 121, in <module>
    write_to_exif(file_path, tag_string)
  File "G:\Tagging experiment\e621updater-python\tagger.py", line 66, in write_to_exif
    et.execute(f"-xmp-dc:subject={subject_tags}", "-overwrite_original_in_place", file_path)
  File "G:\Tagging experiment\e621updater-python\venv\Lib\site-packages\exiftool\exiftool.py", line 1020, in execute
    cmd_params.append(p.encode(self._encoding))
                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pmaso\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode characters in position 65-68: character maps to <undefined>

Closer!

Updated

Okay, found the cause of *that* error. I had a picture whose file name contained some non-standard characters, and on Windows, if a character isn't in cp1252, it's gonna give errors.
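
A quick way to spot such files before a run is to try encoding each name the same way. A minimal sketch, assuming cp1252 is the encoding in play (it is the one named in the traceback); unencodable_names is a made-up helper, not part of the tool.

```python
def unencodable_names(names, encoding="cp1252"):
    """Return the names that cannot be represented in the given encoding."""
    bad = []
    for name in names:
        try:
            name.encode(encoding)
        except UnicodeEncodeError:
            bad.append(name)
    return bad


if __name__ == "__main__":
    import os
    # List the problem files in the current folder.
    for name in unencodable_names(os.listdir(".")):
        print(name)
```

Renaming whatever this prints should avoid the UnicodeEncodeError.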

I dislike the behavior where it's renaming found pictures to their md5 hash, but for the -n flag, you give this warning: you WON'T be able to tag them again. What does this mean?

Also, is there a flag that'll put the missing ones in a separate folder?

Updated

ragarth said:
Okay, found the cause of *that* error. I had a picture whose file name contained some non-standard characters and on windows if the character isn't in cp1252 then its gonna give errors.

I dislike the behavior where its renaming found pictures to their md5 hash, but for the -n flag, you give this warning: you WON'T be able to tag them again. What does this mean?

Also, is there a flag that'll put the missing ones in a separate folder?

Better late than never, i guess:
Thank you for testing this on newer Python. I still use 3.10 and didn't think to check if it runs on newer versions. I'll figure out how to set up a separate new env to test 3.11 and 3.12.

Regarding special characters, i'll see what i can do. Can you give me an example of such filename?

As for the -n flag: when you use my tool to tag files, they are tagged in-file by default, using EXIF. EXIF is considered part of the file, so this changes the file's MD5 checksum. If the filename is the MD5 checksum (the standard name for files downloaded from e621), my script uses that name as the checksum to find the file in the database. If the filename is different from the checksum (e.g. you renamed it or got it from somewhere else), the checksum is calculated by the script itself. But if you've already tagged the file (changing its MD5) AND the filename is not the MD5, there is no other way to find the original MD5 for that file. Therefore, it won't be found in the database, and it will be "NotFound" on all later runs of this script. I thought about using some other EXIF field to save the MD5 on tagging, but i'm hesitant to add any "trash" data.
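
Roughly, the lookup rule works like the sketch below. This is a simplified illustration written for this post, not the actual code from tagger.py; lookup_md5 is a made-up name.

```python
import hashlib
import re
from pathlib import Path

# A filename stem that is exactly 32 hex characters is treated as an MD5.
MD5_NAME = re.compile(r"^[0-9a-fA-F]{32}$")


def lookup_md5(path):
    """MD5 used for the database lookup: the filename if it is one, else the file hash."""
    stem = Path(path).stem
    if MD5_NAME.match(stem):
        return stem.lower()
    # Otherwise hash the current contents. After tagging, this no longer
    # equals the MD5 of the original file.
    return hashlib.md5(Path(path).read_bytes()).hexdigest()
```

A file that keeps its MD5 name can be found again after tagging. A file with any other name can only be found while its contents are still untouched.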

As for the separate folder, files that are not found should be moved to "NotFound" directory by default. Did i break it somehow?

Sorry for not responding in time. Please ping me in DMs if you have any questions, so i'll get a mail notification :)

Updated

Bumping, also i've just confirmed this working under Windows 11 running both Python 3.11 and Python 3.12.
Python 3.12 requires a bit of manual installation but works fine after.
For Python 3.11, all the dependencies are installed automatically via venv if you run start.bat
You only need to install Python 3.11.* and add it to PATH for this to work.

Now, for anyone using it, would you like a better in-browser UI for the process? Or are you fine with just the console output?

ayokeito said:
Bumping, also i've just confirmed this working under Windows 11 running both Python 3.11 and Python 3.12.
Python 3.12 requires a bit of manual installation but works fine after.
For Python 3.11, all the dependencies are installed automatically via venv if you run start.bat
You only need to install Python 3.11.* and add it to PATH for this to work.

Now, for anyone using it, would you like a better in-browser UI for the process? Or are you fine with just the console output?

I stumbled upon this tool a few weeks ago and have been using it as an opportunity to brush up on my Python.

I've been slowly rewriting things for my own use with the help of ChatGPT (probably not worth publishing), but I wanted to share a few improvements I made and some you might consider:

- In tagger.py, instead of creating a list of the MD5 hashes from the dataframe and searching it, set the dataframe's MD5 column as the index and search it directly. On my system, this took a search of 100 images from 120 seconds down to ~10.
- Include WebM as a supported filetype since e621 provides MD5s for them and lots of gallery apps support XMP sidecars for them.
- Use asyncio's async functions to run exiftool on images as they're found. This lets you write tags without pausing the search since exiftool can be slow when you need to tag dozens (or hundreds) of images.
- Instead of a long string of comma separated tags, write each tag as its own keyword in the metadata. This lets you include/exclude individual tags in Windows and various gallery apps. Bonus points if you implement aliases and implications.
- For database.py, read the CSV files in smaller chunks and write a chunk or two to disk before reading another. It shouldn't take more than a gig of RAM, much less 8 GB.
- Maybe parse e621's description field and save it to the metadata as well, since sometimes it contains a story or useful info.
- Parse the source column of the posts database for FA URLs that contain filenames and search images by those. I imagine, like myself, most people probably don't rename the stuff they save, so lots of likely matches here where MD5 fails.
- Consider using fluffle.xyz for reverse image searching files that can't be found by MD5 or FA filename. The API is easy to use and doesn't require an account or have a ratelimit from what I can tell, you just need to filter the results to e621.
- Save a new parquet file containing the results of the search (e621 URL, tags, description, filename, etc.) so you can stop/resume or only search for new files.
- A pyinstaller compiled executable might also be a good option to make using the tool more accessible. It's easy to decompile if someone is concerned you're making changes from source, and it saves people from having to set up a Python environment.
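
To illustrate the one-keyword-per-tag suggestion, here is a minimal sketch that builds the exiftool arguments with a separate -xmp-dc:subject assignment for each tag. It reuses the tag name and the -overwrite_original_in_place option that appear in the traceback earlier in the thread; exiftool_args is a made-up helper, and the assumption is that exiftool treats repeated assignments to a list-type tag as separate items.

```python
def exiftool_args(tags, file_path):
    """Arguments that write every tag as its own XMP-dc:Subject item."""
    # Strip whitespace, drop empty entries, and de-duplicate while keeping order.
    unique = list(dict.fromkeys(t.strip() for t in tags if t.strip()))
    args = [f"-xmp-dc:subject={tag}" for tag in unique]
    return args + ["-overwrite_original_in_place", file_path]


if __name__ == "__main__":
    print(exiftool_args(["wolf", "solo", "wolf"], "example.jpg"))
```

The returned list could be unpacked into the existing call, e.g. et.execute(*exiftool_args(tags, file_path)), assuming et is the same helper object as in tagger.py.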

All in all the tool is solid, but some polish would go a long way. I personally don't mind the console output, but having a nice GUI might legitimize it a bit for those that can't read the code.

Updated
