Topic: [REJECTED] CJK Character BUR

Posted under Tag Alias and Implication Suggestions

The bulk update request #2897 has been rejected.

mass update kanji -> cjk_ideograph
remove alias hanzi (0) -> chinese_text (4876)
create alias chinese_character (0) -> cjk_ideograph (0)
create alias kanji (0) -> cjk_ideograph (0)
create alias hanja (0) -> cjk_ideograph (0)

Reason: https://en.wikipedia.org/wiki/CJK_characters

https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

https://unicode.org/faq/han_cjk.html

The kanji tag is inadequately named. Its current usage covers hanzi and hanja. "CJK character" is a neutral way to refer to the group that includes hanzi, kanji and hanja. Kana and hangul are not included by definition.

Please at least skim the linked sources before insulting me in the comments, and do read the wiki page in full.

alias hanzi -> cjk_character afterwards

EDIT: The bulk update request #2897 (forum #340496) has been rejected by @gattonero2001.


gattonero2001 said:
Kana and hangul are not included by definition.

That's not obvious from the naming; a lay person can easily interpret "cjk_character" to include hangul/korean_text and plain kana, even if it's not technically included by the Unicode definition (Wikipedia even says: "Collectively, the CJK characters often include Hànzì in Chinese, Kanji and Kana in Japanese, Hanja and Hangul in Korean"). "cjk_ideograph" or something along those lines was the suggestion.


watsit said:
That's not obvious from the naming; a lay person can easily interpret "cjk_character" to include hangul/korean_text and plain kana, even if it's not technically included by the Unicode definition (Wikipedia even says: "Collectively, the CJK characters often include Hànzì in Chinese, Kanji and Kana in Japanese, Hanja and Hangul in Korean"). "cjk_ideograph" or something along those lines was the suggestion.

changed cjk_character to cjk_ideograph

Idk, I think we're all going to still have some wires crossed on this for a while.
Skimming things, it looks like the thing that specifies the Chinese characters that are used across the languages is the unified part of the term, while cjk character/ideograph on its own would include all kanji, katakana, and hangul but also the unified grouping specifically excludes any characters which are not shared, and plugging kanji into that would be incorrect.
I'm not sure this is ever heading for the specific solution you were looking for. Mashing them together gets rid of the mistag problem at the expense of removing any distinction at all between the languages. I think what I thought was that this was gonna be a family tag for the entirety of the script grouping we were talking about, so that at least that would be correct.
I'm gonna mirror my analogy that we don't nuke and merge species tags just because sometimes people mistag similar-looking species as each other.


magnuseffect said:
but also the unified grouping specifically excludes any characters which are not shared, and plugging kanji into that would be incorrect.

The tag isn't for cjk_unified_ideograph, though, just cjk_ideograph (meaning ideographs from the CJK regions, which includes shared kanji as well as language-specific kanji).

magnuseffect said:
Mashing them together gets rid of the mistag problem at the expense of removing any distinction at all between the languages.

The issue is primarily that "kanji" is currently used to refer to CJK ideographs, irrespective of exact origin, and some people take the word "kanji" to be only those ideographs found in Japanese (and should exclude ones not found/used in Japanese). I personally think "kanji" is well enough understood by English speakers to refer to Chinese, Japanese, or Korean ideographs in general, and it's almost assured most people couldn't tell if a given kanji character is Japanese-only, Chinese-only, Korean-only, Japanese-excluded, Korean-excluded, or Chinese-excluded (maybe throw in Vietnamese while we're at it?). But either way, chinese_text and japanese_text and such are still going to be valid for when it's known to be Chinese, Japanese, etc.

magnuseffect said:
I'm gonna mirror my analogy that we don't nuke and merge species tags just because sometimes people mistag similar-looking species as each other.

More that people don't know what taxonomic level the tag is meant to refer to, and most people couldn't properly assign a more specific taxonomic group anyway, so this is trying to clarify it's for the general grouping.

magnuseffect said:
Idk, I think we're all going to still have some wires crossed on this for a while.
Skimming things, it looks like the thing that specifies the Chinese characters that are used across the languages is the unified part of the term, while cjk character on its own would include all kanji, katakana, and hangul but also the unified grouping specifically excludes any characters which are not shared, and plugging kanji into that would be incorrect.
I'm not sure this is ever heading for the specific solution you were looking for. Mashing them together gets rid of the mistag problem at the expense of removing any distinction at all between the languages. I think what I thought was that this was gonna be a family tag for the entirety of the script grouping we were talking about, so that at least that would be correct.
I'm gonna mirror my analogy that we don't nuke and merge species tags just because sometimes people mistag similar-looking species as each other.

I changed the aliases to implications, but left the update. If anyone wants to tag kanji, they ought to go through the existing posts and make sure it isn't a hanzi (post #3444922) or hanja.

gattonero2001 said:
I changed the aliases to implications, but left the update. If anyone wants to tag kanji, they ought to go through the existing posts and make sure it isn't a hanzi (post #3444922) or hanja.

I do not expect that to work well. People will not properly separate kanji and hanzi (and hanja) as they're too stylistically similar, and especially as most (but not all) kanji are also hanzi. You will basically need to be a language expert to get that right.

watsit said:
I do not expect that to work well. People will not properly separate kanji and hanzi (and hanja), especially as most (but not all) kanji are also hanzi.

changed it back

for the love of god i just want to get this over with

if you agree with the way it is now please leave an upvote so the admins can measure public opinion

gattonero2001 said:
for the love of god i just want to get this over with

Rushing something that people are having a hard time grasping and coming to a conclusion on won't make for a good request.

gattonero2001 said:
if you agree with the way it is now please leave an upvote so the admins can measure public opinion

I have a bit of trepidation about upvoting if you're so willing to change it on a whim just to get it over with. As I said, I think "kanji" works just fine for the whole grouping of kanji/hanzi/hanja characters, and while I would be okayish with cjk_ideograph in place of kanji/hanzi/hanja if it's a slightly less ambiguous term, I don't want to upvote if you're so eager to get it done that you'll change it back and forth at the slightest response, leaving me no time to change my vote and explain why I may not like the change.

watsit said:
The tag isn't for cjk_unified_ideograph, though, just cjk_ideograph (meaning ideographs from the CJK regions, which includes shared kanji as well as language-specific kanji).

gattonero2001 said:
"CJK character" is a neutral way to refer to the group that includes hanzi, kanji and hanja. Kana and hangul are not included by definition.

So which is it?
Was your gripe there that using "character" would cause people to interpret "cjk_character" to include hangul/korean_text and plain kana 'incorrectly', despite hangul and katakana characters being included in that linked Wikipedia definition of 'CJK character'?

One or more people in this discussion are misunderstanding one or more other people in this discussion.

You know, this isn't entirely a bad idea as it solves the mistag problem (https://e621.net/posts/3371000?q=kanji being tagged as kanji when the text is clearly Chinese through context is an example). Although a better solution would be to not use those tags at all, as they seem unnecessary, this could work to remove ambiguity, especially for those who don't know the difference.
That being said, even though kana and hangul are removed by definition, I think you should look at practicality rather than definition.
"Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages."
I think this is evidence enough that kana and hangul would belong in the alias, but this is probably another can of worms.

watsit said:

The issue is primarily that "kanji" is currently used to refer to CJK ideographs, irrespective of exact origin, and some people take the word "kanji" to be only those ideographs found in Japanese (and should exclude ones not found/used in Japanese). I personally think "kanji" is well enough understood by English speakers to refer to Chinese, Japanese, or Korean ideographs in general, and it's almost assured most people couldn't tell if a given kanji character is Japanese-only, Chinese-only, Korean-only, Japanese-excluded, Korean-excluded, or Chinese-excluded (maybe throw in Vietnamese while we're at it?). But either way, chinese_text and japanese_text and such are still going to be valid for when it's known to be Chinese, Japanese, etc.

While they're the same thing but in different languages, the key thing to remember here is that they're still different languages... "Kanji" should exclusively be used for Japanese, you shouldn't lump Korean or Chinese with it because they're not the same language. Someone who knows Japanese can't read Hanzi and someone who knows Chinese can't read Kanji.

spnshnhg said:
You know, this isn't entirely a bad idea as it solves the mistag problem (https://e621.net/posts/3371000?q=kanji being tagged as kanji when the text is clearly Chinese through context is an example)

Is it "clearly Chinese"? I'm not seeing any context to say it is Chinese, unless you're saying because it's referencing the Chinese New Year/Year of the Tiger, it must be Chinese? That's a bad presumption.

spnshnhg said:
That being said, even though kana and hangul are removed by definition, I think you should look at practicality rather than definition.

Kana and hangul are different, both in terms of function (they're phonetic) and visual style[1]. People looking for kanji characters most likely aren't interested in finding kana or hangul.

[1] There are some kanji which have been repurposed to also be katakana, like ニ, but that's rare.

spnshnhg said:
While they're the same thing but in different languages, the key thing to remember here is that they're still different languages...

This isn't for languages, it's for a set of characters. japanese_text, chinese_text, korean_text, etc, are for indicating the language when known, while kanji is for indicating kanji characters irrespective of language (generally due to the language being ambiguous, though not strictly for that reason).

watsit said:
Is it "clearly Chinese"? I'm not seeing any context to say it is Chinese, unless you're saying because it's referencing the Chinese New Year/Year of the Tiger, it must be Chinese? That's a bad presumption.

What's a bad presumption about assuming that a Chinese festivity related artwork has Chinese words on it? I don't get it, it's just the logical assumption.

watsit said:

This isn't for languages, it's for a set of characters. japanese_text, chinese_text, korean_text, etc, are for indicating the language when known, while kanji is for indicating kanji characters irrespective of language (generally due to the language being ambiguous, though not strictly for that reason).

First of all, you keep incorrectly calling them all Kanji. Second of all, "kanji is for indicating kanji characters irrespective of language (generally due to the language being ambiguous, though not strictly for that reason)", so are you agreeing or not? Because that's what I said, but in favor of the alias. Two sides of the same coin. The alias would remove ambiguity even further. Also, I'm pretty sure the tag is mostly supposed to be used when the characters exist within the artwork rather than when they're in the text on the artwork, so that'd be another reason for the alias, since language becomes mostly irrelevant.

Then again, people are lazy and "Kanji" is a catchy word that most people know thanks to anime, so suddenly changing the tag would surely be unwanted all things considered.

spnshnhg said:
What's a bad presumption about assuming that a Chinese festivity related artwork has Chinese words on it?

Because there are non-Chinese people that observe it too (or people with Chinese heritage), and people can write in another language.

spnshnhg said:
First of all you keep incorrectly calling them all Kanji

Because people know well enough what's meant by the word, and saying "Kanji/Hànzì/Hanja/Chữ Nôm" all the time to reference them instead would be annoying.

spnshnhg said:
second of all "kanji is for indicating kanji characters irrespective of language (generally due to the language being ambiguous, though not strictly for that reason)" so are you agreeing or not? Because that's what I said but in favor of the alias.

I said before, I think "kanji" works just fine for the whole grouping of kanji/hanzi/hanja characters, but I would be okayish with cjk_ideograph in place of kanji/hanzi/hanja if it's a slightly less ambiguous term that enough people really want.

spnshnhg said:
Then again, people are lazy and "Kanji" is a catchy word that most people know thanks to anime

Yes, anime. Nothing to do with the US's heavy involvement in Japan after World War 2, or the overall cultural impact Japan has had following the war thanks to its technological and economic boom, but it's lazy people who like anime that caused the word to spread.

You know what, if what we're sticking to is using kanji as-is as a term for all CJK logographs, then fine, let's dump it.
Though I'd be more comfortable with _logograph than _ideograph.
