Hello all!

In the past, when tagging posts, I've found tags that existed in both plural and singular forms, or in both two-word and one-word forms. Often, I would propose an alias or a BUR to standardize on one or the other. This is a recent example.

It occurred to me that it might be possible to find tags like this automatically. I wrote a script to analyze the tags file, looking for general-category tags that are currently on at least one post and that differ from another tag only by a final s, or only by having or not having underscores.
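For anyone who wants to try the same kind of search, here is a minimal sketch of the matching step, not the actual script. It assumes the tags file has already been loaded into a dict of general-category tag names and post counts; the name `find_variant_pairs` and the input shape are made up for illustration.

```python
from collections import defaultdict

def find_variant_pairs(tag_counts):
    """tag_counts: hypothetical dict of general-category tag name -> post count."""
    # Keep only tags that are currently on at least one post.
    live = {tag for tag, count in tag_counts.items() if count >= 1}

    # Pairs that differ only by a final "s".
    plural_pairs = sorted((tag, tag + "s") for tag in live if tag + "s" in live)

    # Groups that become identical once underscores are removed.
    groups = defaultdict(list)
    for tag in live:
        groups[tag.replace("_", "")].append(tag)
    underscore_groups = sorted(sorted(g) for g in groups.values() if len(g) > 1)

    return plural_pairs, underscore_groups
```

Like the original script, this does not look at existing aliases, and it cannot tell whether two similar names really mean the same thing.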

Result: there are about 1,800 tags that only differ from another tag by a final s, and about 900 tags that only differ from another tag by underscores.

Most of the differences are pretty lopsided - there are dozens to hundreds of posts tagged one way, and fewer than 10 posts tagged the other way. There are some that are closer to being even.
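One rough way to separate the lopsided pairs from the closer ones is sketched below. The cutoffs (`small=10` posts and a 10-to-1 ratio) are assumptions picked to match the description above, not values from the script, and `classify_pair` is a hypothetical helper name.

```python
def classify_pair(count_a, count_b, small=10, ratio=10.0):
    """Label a pair of post counts as 'lopsided' or 'close'.

    small and ratio are illustrative thresholds (assumptions), not tuned values.
    """
    lo, hi = sorted((count_a, count_b))
    # Lopsided: the rarer tag is on fewer than `small` posts and the
    # more common tag is on at least `ratio` times as many.
    if lo < small and hi >= lo * ratio:
        return "lopsided"
    return "close"
```

Pairs labeled "close" are the ones most likely to be intentional and would need a manual look before going anywhere near a BUR.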

Also, my script doesn't check for existing aliases, so some of the pairs it found may be covered by an existing alias.

My question is: what do I do with this information?

Fix them all manually?

File aliases (either individually or as a series of BURs) for all of them?

Fix some of them manually and some of them with an alias? If so, what are the criteria for choosing one or the other?

For either a manual fix or an alias/BUR, should I always prefer the more-popular tag, or always prefer the singular/plural or underscore/no-underscore version, or "it depends"?

Follow esix's advice?



I would say make a BUR. It's best to avoid the tags getting populated again down the road.

Result: there are about 1,800 tags that only differ from another tag by a final s, and about 900 tags that only differ from another tag by underscores.

Post the result



There are some that are closer to being even.

*Danger danger*
There are going to be some like that, where it's intentional.

I usually just correct them as I see them in the 'only 5-10 examples out of 15K' cases.
A lot of the time, the aliases don't make sense if the tagging suggestions are going to show only the populated variation anyway. There are some cases though where you can't fix stupid, and it's a mistake that keeps coming back. Those you definitely should have aliased. It's kind of hard to see the future, though, you know? :shrug:

*Danger danger*
There are going to be some like that, where it's intentional.

I know. I wasn't just going to cut and paste the script output into a bunch of BURs without a little more investigation. :)

There are some where the tag name or the lopsided-ness tells me that the two tags probably really do mean the same thing. There are others I'd want to research before chucking them into a BUR.

Version 1.0 of my script operated on all categories of tags, not just general, but that went astray a few times. I wondered why people kept spelling tentacles with an extra s, until I figured out that tentacless is an artist tag.

There are some cases though where you can't fix stupid, and it's a mistake that keeps coming back. Those you definitely should have aliased.

In the past, I've found a few tags that I only had to fix on two or three posts, but doing a "post changes" search showed that the tags involved get added and removed a lot over time, so I went ahead and filed an alias.

Post the result

To look at, or to work from, or...?

If somebody wants to work from it, I can add tag-search links instead of just bare tag names, if that would be helpful.

I'd rather not post several hundred items and then litigate them at random in a forum thread, though. :)

To look at, or to work from, or...?

If somebody wants to work from it, I can add tag-search links instead of just bare tag names, if that would be helpful.

I'd rather not post several hundred items and then litigate them at random in a forum thread, though. :)

Put the whole list up. I'll report my findings and make any alias requests here, and I'll link this thread in them.

