Talk:

From Wiktionary, the free dictionary

Common variant form of 益

I have noticed that, in printed books and in handwriting, many people in Taipei write the character 益 (U+76CA) as 益 (U+FA17). Because of font differences, you may not be able to tell the two apart here, but basically the two uppermost strokes aren't written the same way. You can see what I'm talking about here https://ctext.org/dictionary.pl?if=gb&char=益 and here http://www.zdic.net/z/41/js/FA17.htm. When I tried to add 益 to zh-forms, I couldn't tell any difference between 益 and 益. How can we add this alternate form in a way that makes its special characteristics visible? --Geographyinitiative (talk) 10:54, 25 April 2019 (UTC)

Instead of 䒑八皿 it basically looks like 八一八皿. --Geographyinitiative (talk) 21:34, 25 April 2019 (UTC)
@Geographyinitiative: It's a Han unification problem. @KevinUp usually writes an alternative forms section under the Translingual heading explaining this kind of difference. — justin(r)leung (t...) | c=› } 22:26, 25 April 2019 (UTC)
You are some amazing people --Geographyinitiative (talk) 03:59, 26 April 2019 (UTC)
@KevinUp, Justinrleung I'm sorry my tech is so backwards, but I still can't see any 八一八皿 version of the character on the page. On the school computers, I can see it. --Geographyinitiative (talk) 09:22, 26 April 2019 (UTC)
U+FA17, 益 CJK COMPATIBILITY IDEOGRAPH-FA17 displays as 八一八皿 on school computers but 䒑八皿 on my laptop --Geographyinitiative (talk) 09:23, 26 April 2019 (UTC)
But the point is, I CAN (even on my old laptop) see the 八一八皿 version when I go to https://ctext.org/dictionary.pl?if=gb&char=益 --Geographyinitiative (talk) 09:25, 26 April 2019 (UTC)
This is caused by fonts that don't follow the Unicode code charts. Some font vendors reuse the glyph of U+76CA for its compatibility forms U+FA17 and U+FAA6 without checking the charts. I recommend using Google Noto Sans CJK TC ([1]), which follows the Unicode charts. You can also download fonts for other regions. KevinUp (talk) 10:11, 26 April 2019 (UTC)
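The relationship between the three code points mentioned above can be inspected with Python's standard `unicodedata` module. This is only a minimal sketch; the helper name `describe` is made up for illustration. It prints each compatibility ideograph's name and canonical decomposition, and checks whether NFC normalization folds it into U+76CA. If the wiki software normalizes saved text to NFC (an assumption here, not something stated in this thread), that would also explain why the two forms looked identical when added to zh-forms.

```python
import unicodedata

UNIFIED = "\u76ca"                   # 益, the unified ideograph
COMPAT_FORMS = ["\ufa17", "\ufaa6"]  # the two compatibility ideographs named above


def describe(ch: str) -> dict:
    """Collect the Unicode properties relevant to this discussion (hypothetical helper)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Canonical decomposition as a hex string; empty if the character has none.
        "decomposition": unicodedata.decomposition(ch),
        # What the character becomes after NFC normalization.
        "nfc": unicodedata.normalize("NFC", ch),
    }


for ch in COMPAT_FORMS:
    info = describe(ch)
    print(
        info["codepoint"],
        info["name"],
        "decomposes to U+" + info["decomposition"],
        "| NFC folds into U+76CA:",
        info["nfc"] == UNIFIED,
    )
```

Because the decomposition is canonical rather than merely a compatibility mapping, any normalization form (NFC, NFD, NFKC, NFKD) replaces U+FA17 and U+FAA6 with U+76CA.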
@KevinUp Thanks for your work here. What I mean to suggest is not that I personally have a problem that needs a solution, but that people who have primitive computers may not be getting the full picture on Wiktionary that they can and would get on Ctext.org. For that reason, I think some kind of more comprehensive solution is in order. --Geographyinitiative (talk) 11:12, 26 April 2019 (UTC)
@Geographyinitiative: One possible solution is to use web fonts, so that the browser uses an online font rather than a locally installed one. MediaWiki has an extension for this: mw:Help:Extension:WebFonts (replaced by the ULS extension). However, CJK fonts are not currently supported because of their large file size. Perhaps you could open a ticket suggesting a lightweight CJK font containing only the 472 glyphs that Appendix:Unicode/CJK Compatibility Ideographs needs to display properly. KevinUp (talk) 12:13, 26 April 2019 (UTC)
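As a rough illustration of what such a lightweight subset would have to cover, the sketch below lists the assigned code points in the CJK Compatibility Ideographs block (U+F900 to U+FAFF) and formats them as a CSS `unicode-range` value that a web font declaration could use. The function names are hypothetical, and nothing here reflects how the WebFonts or ULS extensions actually work. The count depends on the Unicode version bundled with the Python interpreter; with a recent version it should match the 472 quoted above.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0xF900, 0xFAFF  # CJK Compatibility Ideographs block


def assigned_codepoints(start: int, end: int) -> list[int]:
    """Return code points in [start, end] that are assigned (general category is not Cn)."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) != "Cn"]


def to_ranges(cps: list[int]) -> list[tuple[int, int]]:
    """Collapse a sorted list of code points into inclusive (first, last) runs."""
    runs: list[list[int]] = []
    for cp in cps:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current run
        else:
            runs.append([cp, cp])     # start a new run
    return [(first, last) for first, last in runs]


def css_unicode_range(ranges: list[tuple[int, int]]) -> str:
    """Format runs in CSS unicode-range syntax, e.g. 'U+F900-FA6D'."""
    return ", ".join(
        f"U+{first:04X}" if first == last else f"U+{first:04X}-{last:04X}"
        for first, last in ranges
    )


cps = assigned_codepoints(BLOCK_START, BLOCK_END)
print(len(cps), "assigned code points")
print("unicode-range:", css_unicode_range(to_ranges(cps)))
```

A font restricted to this range would let the browser fall back to locally installed fonts for every other character, which keeps the download small.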
@KevinUp, Geographyinitiative: I want to point out that the glyphs in the Unicode charts aren't prescriptive: "The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts." Fonts that don't follow the reference glyphs aren't non-compliant with the Unicode Standard. AFAIK, compatibility characters are not meant to have a particular glyph associated with them. It might be better to mention the different IVSs associated with the main character. — justin(r)leung (t...) | c=› } 18:48, 26 April 2019 (UTC)
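For readers unfamiliar with IVSs (ideographic variation sequences): an IVS is a base ideograph followed by a variation selector from the range U+E0100 to U+E01EF. The sketch below builds one for U+76CA and checks that, unlike the compatibility ideograph U+FA17, it is left unchanged by NFC normalization. Whether the sequence <U+76CA, U+E0100> is actually registered in the Ideographic Variation Database, and which glyph it would select, is an assumption made purely for illustration.

```python
import unicodedata

BASE = "\u76ca"        # 益, the unified ideograph
VS17 = "\U000E0100"    # VARIATION SELECTOR-17, the first selector available for IVSs

# Hypothetical IVS: not checked against the Ideographic Variation Database.
ivs = BASE + VS17
compat = "\ufa17"      # compatibility ideograph discussed in this thread


def survives_nfc(text: str) -> bool:
    """True if NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


print("IVS code points:", [f"U+{ord(c):04X}" for c in ivs])
print("IVS survives NFC:", survives_nfc(ivs))
print("U+FA17 survives NFC:", survives_nfc(compat))
```

The variation selector has no decomposition, so the sequence passes through normalization intact; a font without a matching glyph simply renders the base character.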
@Justinrleung: I'd like to point out that compatibility ideographs are actually much more relevant for Japanese and Korean computing. In Japanese computing, up to 67 kyūjitai are encoded using compatibility forms (see the list I have compiled here in the extended content box), while in South Korean computing, some characters have compatibility forms assigned to specific Hangeul readings despite no difference in glyph appearance, e.g. 樂 (U+F914) for 낙 (nak), 樂 (U+F95C) for 락 (rak), 樂 (U+F9BF) for 요 (yo), so these compatibility characters do have particular glyphs associated with them. Of course, Chinese computing doesn't make use of compatibility forms, so it doesn't affect Chinese users. KevinUp (talk) 21:47, 26 April 2019 (UTC)
@KevinUp: I see. However, in the case of 益 in particular, do you have any sources showing that those compatibility characters were encoded because of glyph shape problems? U+FA17 only has a U source, so how do we know it is necessarily associated with the 八 form? U+FAA6 has a KP source, but the chart for the main CJK block doesn't have KP glyphs, so how do we know it is used specifically for the 丷 form (instead of 八)? Also, @Suzukaze-c, any comments? — justin(r)leung (t...) | c=› } 15:19, 27 April 2019 (UTC)
According to Unicode Standard 4.1 in 2005 [2], 益 (U+FA17) was encoded as one of 32 IBM compatibility ideographs whereas 益 (U+FAA6) was encoded as one of 106 DPRK compatibility ideographs. It seems probable that there were multiple glyphs within the datasets used by IBM and the DPRK respectively, which prevented unification with 益 (U+76CA). However, the 2005 Unicode chart did not state the glyph sources. Glyph sources for these compatibility ideographs only appeared in Unicode Standard 6.0.0 in 2010 [3].
I managed to trace the U-source glyph for 益 (U+FA17) to this PDF file created in 2010 [4], which shows that UTC-00921, aka U+FA17, is associated specifically with the form containing 八. Not much can be found regarding UTC-00921, except that it was one of 46 UTC glyphs submitted by Unicode (see field 6 of this text file; the source tag "TUS" refers to Unicode [5]). Unfortunately, the original IBM conversion mapping tables [6] are no longer accessible.
As for 益 (U+FAA6), although the main CJK block does not have the KP glyph, the KP code point for 益 (U+76CA) is KP0-FCC0 [7], which is different from the KP code point of 益 (U+FAA6), KP1-5D48 [8], so these are two different glyphs in North Korean computing. By the way, Korean 益 (익, ik) only has one reading. KevinUp (talk) 04:50, 28 April 2019 (UTC)