Wednesday, June 12, 2013

Dictionary of Chinese Character Variants 教育部異體字字典

The Dictionary of Chinese Character Variants (教育部異體字字典), put out by the Republic of China (Taiwan)'s Ministry of Education, is an invaluable resource. If you do use it, though, use the beta version, because the older edition has a lot of errors, probably because of font issues.

Dictionary of Chinese Character Variants 教育部異體字字典 (old edition)

Dictionary of Chinese Character Variants 教育部異體字字典 (new trial edition)

For instance, here is the explanation behind the variant character 将 for the more "traditional" 將.


As you can see, they've got the wrong character in the explanation behind their research: 婔 instead of 将.




There are a lot of these errors in the current edition of the dictionary. I compiled a long list of examples but lost them all in a catastrophic hard disk failure.

So, if you're doing research on character variants, be aware of this and use the trial version, though it probably has errors as well. I wonder how many errors are being introduced because of issues with technology.

I just realized that if you don't have the proper fonts installed, 将 will look like either


The variant that is now the standard in Japan. Or it'll look like this.



This means that the new trial version is going to have a lot of errors because of fonts.

If I recall correctly from Ken Lunde's book on CJKV fonts, published by O'Reilly, when they came up with the encoding scheme they consulted on the codepoints with the various nations and regions that have used Chinese characters at one time or another in their languages. The mainland would use the same codepoints but with "simplified" forms, while the other regions used "traditional" forms according to their chosen "standard". That gave a core character set for writing. It was only later that they decided to include every character variant under the sun. So they screwed up with the original set, where a single codepoint in the core set was shared by the various variants of the same character.

Difficult to explain. Anyway, with the Dictionary of Character Variants opting for character encoding for the variants instead of graphics, this is going to introduce a whole new variety of error to their dictionary.
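Since the font decides which shape you see, one rough way to check what a page has actually encoded is to look at the codepoint rather than the glyph. Here is a minimal Python sketch of that idea. The helper name `describe` is my own, not anything from the dictionary or from Lunde's book, and it only reports what the standard `unicodedata` module knows about a character.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the codepoint and Unicode name of a single character.

    `describe` is a hypothetical helper for illustration only.
    """
    # ord() gives the codepoint; it does not depend on any installed font.
    codepoint = f"U+{ord(ch):04X}"
    # Fall back to a placeholder if this Python's Unicode data has no name.
    name = unicodedata.name(ch, "(no name)")
    return f"{codepoint} {name}"


# The variant and the "traditional" form discussed in this post.
for ch in ("将", "將"):
    print(ch, describe(ch))
```

If two fonts draw 将 differently but this prints the same codepoint for both, the difference is in the fonts and not in the text. It won't tell you which glyph the dictionary's editors intended, only which character they encoded.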
 
