Information about East-Asian characters, mostly Chinese
Over many years I have collected various bits and pieces of information about East Asian, especially Chinese, characters. There is really no reason to keep this secret, except for the fact that:
This work is licensed under the Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Here is what there is so far:
** Pinyin stuff
Sometime in August 2006, I tried to improve my pinyin tables by adding frequency information, which allows a program to choose the most frequent reading when it has to decide automatically. The resulting file is pinyinstat.tab; most of the other files are intermediate files that I used to produce it. All of this is in the pinyin directory.
pinyinstat.tab
The file currently has 81986 lines, including characters from CJK Extension B (which lie outside the Basic Multilingual Plane), so be careful when handling it.
charratio.tab
pyflat.tab
pybigram.tab
pyprob.txt
pinyin-merge.tab
pinyintable.tab
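As a rough illustration of how the frequency information in pinyinstat.tab might be used, here is a minimal Python sketch. The column layout (character, reading, count, separated by tabs) is an assumption, since the format is not described here, and the sample rows, the helper names and the counts are made up for illustration only. Check the real file and adjust the field indexes before relying on it.

```python
# Hypothetical sample rows: character <TAB> reading <TAB> count.
# The real column layout of pinyinstat.tab may differ; these values are made up.
SAMPLE = "行\txing2\t90\n行\thang2\t10\n\U00020000\the1\t1\n"


def is_extension_b(ch):
    """True if ch lies in CJK Unified Ideographs Extension B (U+20000..U+2A6DF)."""
    return 0x20000 <= ord(ch) <= 0x2A6DF


def most_frequent_readings(lines):
    """Map each character to the reading with the highest count."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        char, reading = fields[0], fields[1]
        try:
            count = float(fields[2])
        except ValueError:
            continue  # skip header or comment lines without a numeric count
        if char not in best or count > best[char][1]:
            best[char] = (reading, count)
    return {char: reading for char, (reading, _) in best.items()}


readings = most_frequent_readings(SAMPLE.splitlines())
ext_b_chars = [ch for ch in readings if is_extension_b(ch)]
```

For the real file, the same function could be fed an open file object, for example `open("pinyinstat.tab", encoding="utf-8")`, assuming the file is UTF-8. In Python 3 an Extension B character is a single code point in a string, but in environments that use UTF-16 strings it occupies two code units, which is one reason to handle the file with care.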
** Variant stuff
univardb.xml
This file basically encodes the information from the Unihan database as of ca. 2006 into XML.
twjp-vardb.xml
This file groups together characters that can be used interchangeably in certain contexts. Characters within each group are flagged either with @type='reg' as regular characters (whatever that means) or as 'shinji', which marks the simplified characters in modern use in Japan (which some would consider regular). There is also @type='tw', which marks a character as it would be seen by users in Taiwan or by users of Traditional Chinese characters. In such a group, there is typically another character flagged with @subtype='jp', which means that this is the form used in Japan.
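To show how the type and subtype flags in twjp-vardb.xml could be read, here is a small Python sketch. Only the attribute names (type, subtype) and their values come from the description above. The element names `vardb`, `group` and `char`, the placement of the character in the element text, and the sample group itself are assumptions for illustration; the actual file may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names and nesting are assumptions,
# only the type/subtype attributes are taken from the description.
SAMPLE = """<vardb>
  <group>
    <char type="tw">國</char>
    <char type="shinji" subtype="jp">国</char>
  </group>
</vardb>"""


def variant_groups(xml_text):
    """Return one list per group of (character, type, subtype) tuples.

    Any element whose direct children carry a type attribute is treated
    as a group, so the function does not depend on the element names.
    """
    root = ET.fromstring(xml_text)
    groups = []
    for parent in root.iter():
        members = [
            (child.text or "", child.get("type"), child.get("subtype"))
            for child in parent
            if child.get("type") is not None
        ]
        if members:
            groups.append(members)
    return groups


groups = variant_groups(SAMPLE)
# Collect the forms flagged as used in Japan.
japanese_forms = [ch for group in groups for ch, _, sub in group if sub == "jp"]
```

For the real file, `ET.parse("twjp-vardb.xml").getroot()` could replace `ET.fromstring`, with the member extraction adapted to wherever the character is actually stored.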
** More to come
Whenever I have time to dig it out and describe it here.
Questions?
Please ask me at cwittern (at) gmail (dot) com
Christian Wittern