Hello all!
In the past, when tagging posts, I've found tags that existed in both plural and singular forms, or in both two-word and one-word forms. Often, I would propose an alias or a BUR to standardize on one or the other. This is a recent example.
It occurred to me that it might be possible to find tags like this automatically. I wrote a script that analyzes the tags file, looking for general-category tags that are currently on at least one post and that differ from another tag only by a final s, or only by having or not having underscores.
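For anyone who wants to try the same thing, here is a minimal sketch of that kind of check in Python. It is not my exact script. It assumes the tags have already been loaded as (name, category, post_count) tuples and that general tags use category code 0; the function name and the input shape are made up for illustration.

```python
from collections import defaultdict

GENERAL = 0  # assumed category code for general tags


def find_variant_pairs(tags):
    """tags: iterable of (name, category, post_count) tuples (assumed shape)."""
    # Keep only general-category tags that are on at least one post.
    counts = {name: count for name, category, count in tags
              if category == GENERAL and count > 0}

    # Pairs that differ only by a final "s".
    plural_pairs = [(name, name + "s") for name in sorted(counts)
                    if name + "s" in counts]

    # Groups of tags that are identical once underscores are removed.
    by_stripped = defaultdict(list)
    for name in counts:
        by_stripped[name.replace("_", "")].append(name)
    underscore_groups = [sorted(names)
                         for _, names in sorted(by_stripped.items())
                         if len(names) > 1]

    return counts, plural_pairs, underscore_groups
```

The returned counts make it easy to print each pair next to its post counts and see which side is more popular.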
Result: there are about 1,800 tags that only differ from another tag by a final s, and about 900 tags that only differ from another tag by underscores.
Most of the pairs are pretty lopsided: dozens to hundreds of posts are tagged one way, and fewer than 10 posts are tagged the other way. A few are closer to being even.
Also, my script doesn't check for existing aliases, so some of the pairs it found may be covered by an existing alias.
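Filtering those out would probably be a small extra step. A rough sketch, assuming the existing aliases are available as a dict that maps each old tag name to the tag it is aliased to (the function name and that dict are hypothetical, my script has nothing like this yet):

```python
def drop_aliased_pairs(pairs, aliases):
    """Drop (a, b) pairs where one tag is already aliased to the other.

    aliases: dict mapping an old tag name to its target tag (assumed shape).
    """
    return [(a, b) for a, b in pairs
            if aliases.get(a) != b and aliases.get(b) != a]
```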
My question is: what do I do with this information?
Fix them all manually?
File aliases (either individually or as a series of BURs) for all of them?
Fix some of them manually and some of them with an alias? If so, what are the criteria for choosing one or the other?
For either a manual fix or an alias/BUR, should I always prefer the more-popular tag, or always prefer the singular/plural or underscore/no-underscore version, or "it depends"?
Thanks!
KV