A slow (not optimized for speed) ctype-style character classification library for Shift-JIS (SJIS) characters.
Revision | 75c12472459a3f6b60550a58432dcf3b9faac5a2 (tree) |
---|---|
Time | 2013-07-29 13:07:16 |
Author | Joel Matthew Rees <reiisi@user...> |
Committer | Joel Matthew Rees |
Added the makefile, minor corrections to compile on Unix.
@@ -0,0 +1,39 @@ | ||
1 | +# Makefile for a library for handling SJIS characters in C. | |
2 | +# Patterned more or less after the standard ctype library. | |
3 | +# Does not use C's wide or multibyte characters. | |
4 | +# See <http://reiisi.homedns.org/~joel/sannet/sjctype/testproj.html>, | |
5 | +# a copy of which should be in this archive. | |
6 | +# Written by Joel Matthew Rees | |
7 | +# Copyright 2013, Joel Matthew Rees | |
8 | +# Distribution and use permitted under several licenses; see the in-file comments. | |
9 | +# See LICENSE.TXT for the GPL conditions if you don't like the original license. | |
10 | +# GPL also available at <http://www.gnu.org/licenses/>. | |
11 | + | |
12 | + | |
13 | +# Keeping things relatively simple for now. | |
14 | +# Use an "sjc" prefix on names, to keep the namespace clean. | |
15 | + | |
16 | +CFLAGS = -Wall | |
17 | + | |
18 | +sjcobjects = slowsjctype.o sjisctypetest.o showch16.o showch8.o port.o | |
19 | + | |
20 | + | |
21 | +all: slowsjctype.o sjisctypetest showch16 showch8 | |
22 | + | |
23 | +slowsjctype.o: slowsjctype.h sj16bitChars.h sj8bitChars.h sjctypenv.h | |
24 | + | |
25 | +sjisctypetest: sjisctypetest.o port.o slowsjctype.o sj16bitChars.h sj8bitChars.h sjctypenv.h | |
26 | + | |
27 | +showch16: showch16.o port.o sj16bitChars.h sjctypenv.h | |
28 | + | |
29 | +showch8: showch8.o port.o | |
30 | + | |
31 | +port.o: port.h | |
32 | + | |
33 | + | |
34 | + | |
35 | + | |
36 | +.PHONY: sjcclean | |
37 | +sjcclean: | |
38 | + -rm $(sjcobjects) | |
39 | + |
@@ -1 +1 @@ | ||
1 | -/* port.h v00.00.00.jmr // Porting constants for the sjis character typing project. // Written by Joel Matthew Rees, January 2001, Hyogo, Japan. // joel_rees@sannet.ne.jp // Derived from work done in Takino, March 2000. // // Copyright 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. 
// Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. 
In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ #ifndef PORT_H #define PORT_H #define FOR_MACINTOSH /* #define FOR_COCOA */ /* For example? */ #define ON_CODEWARRIOR /* Encapsulate the burden of starting the user interface, // and hide the conditional compile noise. */ extern void commandLine( int * pArgc, char *** pArgv ); #endif /* ifndef PORT_H */ | |
\ No newline at end of file | ||
1 | +/* port.h v00.00.00.jmr // Porting constants for the sjis character typing project. // Written by Joel Matthew Rees, January 2001, Hyogo, Japan. // joel_rees@sannet.ne.jp // Derived from work done in Takino, March 2000. // // Copyright 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. 
// Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. 
In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ #ifndef PORT_H #define PORT_H /* #define FOR_MACINTOSH */ /* #define FOR_COCOA */ /* For example? */ /* #define ON_CODEWARRIOR */ /* Encapsulate the burden of starting the user interface, // and hide the conditional compile noise. */ extern void commandLine( int * pArgc, char *** pArgv ); #endif /* ifndef PORT_H */ | |
\ No newline at end of file |
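The only functional change to port.h in this hunk is that the `FOR_MACINTOSH` and `ON_CODEWARRIOR` defines are now commented out, so `commandLine()` has to compile without any Macintosh-specific support. port.c is not shown in this part of the diff; the following is only a minimal sketch, assuming a Unix build in which the C runtime has already filled in `argc` and `argv`, of how a definition could hide that conditional compile. The Macintosh branch is a placeholder, not the author's code.

```c
/* Sketch only: a hypothetical port.c body, not the author's implementation. */

/* These mirror the switches that this commit comments out in port.h. */
/* #define FOR_MACINTOSH */
/* #define ON_CODEWARRIOR */

void commandLine( int * pArgc, char *** pArgv )
{
#if defined( FOR_MACINTOSH ) && defined( ON_CODEWARRIOR )
    /* Placeholder (assumption): a classic Mac OS build would obtain the
       arguments from its console support here and store them through
       pArgc and pArgv. */
    (void) pArgc;
    (void) pArgv;
#else
    /* Unix: main() already received real arguments, so leave them untouched. */
    (void) pArgc;
    (void) pArgv;
#endif
}
```

A caller would presumably invoke `commandLine( &argc, &argv );` as the first statement of `main()`, which is consistent with the pointer-to-argument signature declared in port.h.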
@@ -1 +1 @@ | ||
1 | -<html> <head> <meta http-equiv="Content-Type" content="text/html;CHARSET=x-sjis"> <title>ctype for Kanji (JIS) Project</title> <meta name="author" content="Joel Matthew Rees - joel_rees@sannet.ne.jp"> <meta name="keywords" content="kanji character classification, ctype, jis, euc, 漢字類, 文字類"> <meta name="description" content="Some things Joel Rees wants to share with people."> </head> <body bgcolor="#FFFFFF" text="#221133" link="#dd5555" vlink="#2087bf" alink="#ff0000"> <p align="center"><b> C Character Classification for Kanji -- shift-JIS </b></p> <div align="center"><table> <tr> <td valign="top"> I have started with a shift-JIS version of this library because my machinery has fonts and input methods for shift-JIS. </td> <td valign="top"> 機械の 書体や 入力式が シフトJIS です。 因って、 ライブラリの 初版が シフトJIS なのです。 </td> </tr> <tr> <td valign="top"> <excuse>What with the move and the new job and all, I have only been able to work on the library an average of less than an hour a day.</excuse> So I am uploading what I have done now, and I will continue updating as I complete the tests. I am also uploading the test project and two intermediate projects which I used to build parts of the library source. </td> <td valign="top"> <申訳>引っ越し した事と 新しい仕事が できた などがあって、 ライブラリの 一日の 平均作業工数を 一時間ほども 捗っていません。</申訳> 取りあえず 今できた分を 載せて 置くことに しました。 テストの 過程が 進める度、 更新して おきます。 テスト プロジェクトも ライブラリの ソースを 組み合わせる 作業に 利用した 2つの 中間プロジェクト も載せて 置きました。 </td> </tr> <tr> <td valign="top"> "Free source" is something of a simplistic explanation, if it is not a myth. (The folks at the <a href="http://www.fsf.org">Free Software Foundation</a> basically say as much.) All program code has costs and contracts/covenants. Some is bound more to the rules of money, illusion, and arbitrary obligations. Some is bound rather to rules of personal effort and responsibility, giving freer reign to the hand of God. 
</td> <td valign="top"> 「フリー ソース」 というものが ちょっと 簡単過ぎな 説明で なければ、 迷信話し です。 (<a href="http://www.fsf.org">フリー ソフトウェア 財団</a> の方々も こんな類いの 説明を してくれます。) プログラム コードは 何れも 費用を かかるし、 約定や 契約に 係わるのです。 あるものは 金銭的な 規定や 幻の制約 及び 独断義理に 縛られているのに、 また違うものが 個人努力責務の 制約に 結び付けられています。 後方は 神様の 御手が もっと自由に その御業を 進ませられるのが ございます。 </td> </tr> <tr> <td valign="top"> All sources and local headers specified for these projects are released under God's assignment of stewardship, as detailed in the comments in the tops of the files. For those who don't want to deal with the concepts of God and stewardship, re-distribution is also permitted under the GNU General Public License or certain other arrangements, as explained in the same comments. </td> <td valign="top"> この プロジェクトの ソース ファイル 及び ローカル インクルード ファイルが 全て、 各々の ファイルの 上部にある コメント文に (英語で) 記した 通りに、 神様の 「assignment of stewardship」 という 許諾制約を 指定して 公開しています。 神様や 「stewardship」 については 納得できない 方々のために 同コメント内に 「GNU General Public License」 その他、 私が認める 公許諾制約を 指定しました。 </td> </tr> <tr> <td valign="top"> No other re-distribution is permitted. </td> <td valign="top"> 指定した やり方以外は 再配布は 許可しません。 </td> </tr> <tr> <td valign="top"> If you have questions, contact me at joel_rees@sannet.ne.jp. </td> <td valign="top"> ご 問い合わせは joel_rees@sannet.ne.jp へどうぞ。 日本語の ご相談は とにかく 努力いたします。 </td> </tr> </table></div> <br /> <br /> <div align="center"><table> <tr> <td valign="top"> Since I am doing this in Codewarrior (Mac OS X final is UNICODE!), I haven't made makefiles. So I put the dependency information in the tables below. This is easier for me for now. Each file is hyper-linked below. </td> <td valign="top"> コード ウォーリアー で作って いますから (Mac OS X ってユニコードだよん!) 
メーク ファイルを 作りません でした。 ですから、 以下の 表に 依存関係の ことを 記しました。 申し訳 ございませんが 今は このままが (僕には) 簡単ですから。 以下の ファイルが ハイパー リンク されています。 </td> </tr> </table></div> <br /> <br /> <div align="center"> <table border="4"> <caption> Using slow version library: </caption> <tr> <td align="center">library object</td> <td align="center">library header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td>slowsjctype.o<br /> (name depends on environment)</td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><strike>sjctypenv.h</strike></td> <td>Indirect dependency on sjctypenv.h was a slip. Removed.</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building slow version library: </caption> <tr> <td align="center">library source</td> <td align="center">library header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td><a href="slowsjctype.c">slowsjctype.c</a></td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sj8bitChars.h">sj8bitChars.h</a><br /> </td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Testing slow version: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td><a href="sjisctypetest.c">sjisctypetest.c</a></td> <td>--</td> <td><a href="slowsjctype.h">slowsjctype.h</a><br /> <a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sj8bitChars.h">sj8bitChars.h</a><br /> <a href="port.h">port.h</a><br /> </td> <td>--</td> </tr> <tr> <td><a href="slowsjctype.c">slowsjctype.c</a></td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a 
href="sj8bitChars.h">sj8bitChars.h</a><br /> </td> <td>--</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building 8-bit helper: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">helps</td> <td align="center">notes</td> </tr> <tr> <td><a href="showch8.c">showch8.c</a></td> <td>--</td> <td><a href="port.h">port.h</a></td> <td><a href="sj8bitChars.h">sj8bitChars.h</a></td> <td>Lack of self inclusion clears dependency loop.</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building 16-bit helper: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">helps</td> <td align="center">notes</td> </tr> <tr> <td><a href="showch16.c">showch16.c</a></td> <td>--</td> <td><a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="port.h">port.h</a><br /> </td> <td><a href="sj16bitChars.h">sj16bitChars.h</a></td> <td>Self inclusion performs some checks.</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <dl> <dt>slowsjctype.o</dt> <dd>Library object file. You have to build this yourself. The actual name of the library object file will depend on the operating system and the programming environment and settings you are using. I may have time to pre-build a few after I finish testing, but if you have reason to use this library, you should be able to build it yourself. </dd> <dt><a href="slowsjctype.h">slowsjctype.h</a></dt> <dd>Library header file. 
To actually use the library, all you need is this header and the object library (which you must build). </dd> <dt><a href="slowsjctype.c">slowsjctype.c</a></dt> <dd>Source for the library.</dd> <dt><a href="sjctypenv.h">sjctypenv.h</a></dt> <dd>This is mostly a place to keep the ubyte definition I used internally for keeping the code clean (thereby reducing the test burden). It is probably best to ignore the bool typedef. </dd> <dt><a href="sj16bitChars.h">sj16bitChars.h</a></dt> <dd>Isolates 16-bit character constants and makes them legible in an eight bit world. Also somewhat alleviates the problems of not having a Japanese font. </dd> <dt><a href="sj8bitChars.h">sj8bitChars.h</a></dt> <dd>Isolates 8-bit character constants and makes them visible in a world where a Japanese font may not be available for use with the source editor. </dd> <dt><a href="sjisctypetest.c">sjisctypetest.c</a></dt> <dd>This is the source of the test program. Visually checking the output of a simple loop turns out to really burn time and disk space (and put one to sleep). Testing the output as it is produced is not a perfect method, but it helps me focus on the necessary places for the visual check. To put as much distance as possible between the tests and the library, I have directly used hexadecimal numeric and character constants in the tests. </dd> <dt><a href="port.h">port.h</a></dt> <dd>This is where I have tried to hide declaration aspects of system dependencies. Porting should focus here second, and leave other source alone as much as possible. </dd> <dt><a href="port.c">port.c</a></dt> <dd>This is where I have tried to hide definition aspects of system dependencies. Porting should focus here first, and leave other source alone as much as possible. </dd> <dt><a href="showch8.c">showch8.c</a></dt> <dd>This file generates most of the 8-bit character constants automatically. It's a bit of a kludge. I refined my methods as I went, and didn't take time to ripple the refinements back. 
I apologize for allowing this to be seen in public. ;-/ </dd> <dt><a href="showch16.c">showch16.c</a></dt> <dd>This file generates most of the 16-bit character constants automatically. It reflects a lot of ideas I picked up while building showch8.c, so it is hopefully a little less of a kludge than showch8.c. </dd> <dt>project files and make files</dt> <dd>Again, once I get past the testing, I may have time to make some project files and makefiles available. The dependencies are shown above, so you should be able to slap your own together without too much effort. On the other hand, if you haven't learned how yet, now is a good time. A small project like this might be good to practice on. ;-> </dd> </dl> <p>全部を日本語になおす時間がなく、お詫びいたします。 ライセンスについて質問がございましたら、ぜひ、ご連絡ください。 </p> <center><a href="../index.html">Home</a></center><br /> <br /> <div align="right">^v:jtype c00.00.0ej 2001.05.30</div><br /> </body> </html> | |
\ No newline at end of file | ||
1 | +<html> <head> <meta http-equiv="Content-Type" content="text/html;CHARSET=x-sjis"> <title>ctype for Kanji (JIS) Project</title> <meta name="author" content="Joel Matthew Rees - joel_rees@sannet.ne.jp"> <meta name="keywords" content="kanji character classification, ctype, jis, euc, 漢字類, 文字類"> <meta name="description" content="Some things Joel Rees wants to share with people."> </head> <body bgcolor="#FFFFFF" text="#221133" link="#dd5555" vlink="#2087bf" alink="#ff0000"> <p align="center"><b> C Character Classification for Kanji -- shift-JIS </b></p> <div align="center"><table> <tr> <td valign="top"> I have started with a shift-JIS version of this library because my machinery has fonts and input methods for shift-JIS. </td> <td valign="top"> 機械の 書体や 入力式が シフトJIS です。 因って、 ライブラリの 初版が シフトJIS なのです。 </td> </tr> <tr> <td valign="top"> <excuse>What with the move and the new job and all, I have only been able to work on the library an average of less than an hour a day.</excuse> So I am uploading what I have done now, and I will continue updating as I complete the tests. I am also uploading the test project and two intermediate projects which I used to build parts of the library source. </td> <td valign="top"> <申訳>引っ越し した事と 新しい仕事が できた などがあって、 ライブラリの 一日の 平均作業工数を 一時間ほども 捗っていません。</申訳> 取りあえず 今できた分を 載せて 置くことに しました。 テストの 過程が 進める度、 更新して おきます。 テスト プロジェクトも ライブラリの ソースを 組み合わせる 作業に 利用した 2つの 中間プロジェクト も載せて 置きました。 </td> </tr> <tr> <td valign="top"> "Free source" is something of a simplistic explanation, if it is not a myth. (The folks at the <a href="http://www.fsf.org">Free Software Foundation</a> basically say as much.) All program code has costs and contracts/covenants. Some is bound more to the rules of money, illusion, and arbitrary obligations. Some is bound rather to rules of personal effort and responsibility, giving freer reign to the hand of God. 
</td> <td valign="top"> 「フリー ソース」 というものが ちょっと 簡単過ぎな 説明で なければ、 迷信話し です。 (<a href="http://www.fsf.org">フリー ソフトウェア 財団</a> の方々も こんな類いの 説明を してくれます。) プログラム コードは 何れも 費用を かかるし、 約定や 契約に 係わるのです。 あるものは 金銭的な 規定や 幻の制約 及び 独断義理に 縛られているのに、 また違うものが 個人努力責務の 制約に 結び付けられています。 後方は 神様の 御手が もっと自由に その御業を 進ませられるのが ございます。 </td> </tr> <tr> <td valign="top"> All sources and local headers specified for these projects are released under God's assignment of stewardship, as detailed in the comments in the tops of the files. For those who don't want to deal with the concepts of God and stewardship, re-distribution is also permitted under the GNU General Public License or certain other arrangements, as explained in the same comments. </td> <td valign="top"> この プロジェクトの ソース ファイル 及び ローカル インクルード ファイルが 全て、 各々の ファイルの 上部にある コメント文に (英語で) 記した 通りに、 神様の 「assignment of stewardship」 という 許諾制約を 指定して 公開しています。 神様や 「stewardship」 については 納得できない 方々のために 同コメント内に 「GNU General Public License」 その他、 私が認める 公許諾制約を 指定しました。 </td> </tr> <tr> <td valign="top"> No other re-distribution is permitted. </td> <td valign="top"> 指定した やり方以外は 再配布は 許可しません。 </td> </tr> <tr> <td valign="top"> If you have questions, contact me at joel_rees@sannet.ne.jp. </td> <td valign="top"> ご 問い合わせは joel_rees@sannet.ne.jp へどうぞ。 日本語の ご相談は とにかく 努力いたします。 </td> </tr> </table></div> <br /> <br /> <div align="center"><table> <tr> <td valign="top"> Since I am doing this in Codewarrior (Mac OS X final is UNICODE!), I haven't made makefiles. So I put the dependency information in the tables below. This is easier for me for now. Each file is hyper-linked below. </td> <td valign="top"> コード ウォーリアー で作って いますから (Mac OS X ってユニコードだよん!) 
メーク ファイルを 作りません でした。 ですから、 以下の 表に 依存関係の ことを 記しました。 申し訳 ございませんが 今は このままが (僕には) 簡単ですから。 以下の ファイルが ハイパー リンク されています。 </td> </tr> </table></div> <br /> <br /> <div align="center"> <table border="4"> <caption> Using slow version library: </caption> <tr> <td align="center">library object</td> <td align="center">library header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td>slowsjctype.o<br /> (name depends on environment)</td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><strike>sjctypenv.h</strike></td> <td>Indirect dependency on sjctypenv.h was a slip. Removed.</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building slow version library: </caption> <tr> <td align="center">library source</td> <td align="center">library header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td><a href="slowsjctype.c">slowsjctype.c</a></td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sj8bitChars.h">sj8bitChars.h</a><br /> </td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Testing slow version: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">notes</td> </tr> <tr> <td><a href="sjisctypetest.c">sjisctypetest.c</a></td> <td>--</td> <td><a href="slowsjctype.h">slowsjctype.h</a><br /> <a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sj8bitChars.h">sj8bitChars.h</a><br /> <a href="port.h">port.h</a><br /> </td> <td>--</td> </tr> <tr> <td><a href="slowsjctype.c">slowsjctype.c</a></td> <td><a href="slowsjctype.h">slowsjctype.h</a></td> <td><a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a 
href="sj8bitChars.h">sj8bitChars.h</a><br /> </td> <td>--</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building 8-bit helper: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">helps</td> <td align="center">notes</td> </tr> <tr> <td><a href="showch8.c">showch8.c</a></td> <td>--</td> <td><a href="port.h">port.h</a></td> <td><a href="sj8bitChars.h">sj8bitChars.h</a></td> <td>Lack of self inclusion clears dependency loop.</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <div align="center"> <table border="4"> <caption> Building 16-bit helper: </caption> <tr> <td align="center">source</td> <td align="center">header</td> <td align="center">included files</td> <td align="center">helps</td> <td align="center">notes</td> </tr> <tr> <td><a href="showch16.c">showch16.c</a></td> <td>--</td> <td><a href="sj16bitChars.h">sj16bitChars.h</a><br /> <a href="sjctypenv.h">sjctypenv.h</a><br /> <a href="port.h">port.h</a><br /> </td> <td><a href="sj16bitChars.h">sj16bitChars.h</a></td> <td>Self inclusion performs some checks.</td> </tr> <tr> <td><a href="port.c">port.c</a></td> <td><a href="port.h">port.h</a></td> <td>--</td> <td>--</td> <td>--</td> </tr> </table> </div> <br /> <br /> <dl> <dt>slowsjctype.o</dt> <dd>Library object file. You have to build this yourself. The actual name of the library object file will depend on the operating system and the programming environment and settings you are using. I may have time to pre-build a few after I finish testing, but if you have reason to use this library, you should be able to build it yourself. </dd> <dt><a href="slowsjctype.h">slowsjctype.h</a></dt> <dd>Library header file. 
To actually use the library, all you need is this header and the object library (which you must build). </dd> <dt><a href="slowsjctype.c">slowsjctype.c</a></dt> <dd>Source for the library.</dd> <dt><a href="sjctypenv.h">sjctypenv.h</a></dt> <dd>This is mostly a place to keep the ubyte definition I used internally for keeping the code clean (thereby reducing the test burden). It is probably best to ignore the bool typedef. </dd> <dt><a href="sj16bitChars.h">sj16bitChars.h</a></dt> <dd>Isolates 16-bit character constants and makes them legible in an eight bit world. Also somewhat alleviates the problems of not having a Japanese font. </dd> <dt><a href="sj8bitChars.h">sj8bitChars.h</a></dt> <dd>Isolates 8-bit character constants and makes them visible in a world where a Japanese font may not be available for use with the source editor. </dd> <dt><a href="sjisctypetest.c">sjisctypetest.c</a></dt> <dd>This is the source of the test program. Visually checking the output of a simple loop turns out to really burn time and disk space (and put one to sleep). Testing the output as it is produced is not a perfect method, but it helps me focus on the necessary places for the visual check. To put as much distance as possible between the tests and the library, I have directly used hexadecimal numeric and character constants in the tests. </dd> <dt><a href="port.h">port.h</a></dt> <dd>This is where I have tried to hide declaration aspects of system dependencies. Porting should focus here second, and leave other source alone as much as possible. </dd> <dt><a href="port.c">port.c</a></dt> <dd>This is where I have tried to hide definition aspects of system dependencies. Porting should focus here first, and leave other source alone as much as possible. </dd> <dt><a href="showch8.c">showch8.c</a></dt> <dd>This file generates most of the 8-bit character constants automatically. It's a bit of a kludge. I refined my methods as I went, and didn't take time to ripple the refinements back. 
I apologize for allowing this to be seen in public. ;-/ </dd> <dt><a href="showch16.c">showch16.c</a></dt> <dd>This file generates most of the 16-bit character constants automatically. It reflects a lot of ideas I picked up while building showch8.c, so it is hopefully a little less of a kludge than showch8.c. </dd> <dt>project files and make files</dt> <dd>Again, once I get past the testing, I may have time to make some project files and makefiles available. The dependencies are shown above, so you should be able to slap your own together without too much effort. On the other hand, if you haven't learned how yet, now is a good time. A small project like this might be good to practice on. ;-> </dd> </dl> <p>全部を日本語になおす時間がなく、お詫びいたします。 ライセンスについて質問がございましたら、ぜひ、ご連絡ください。 </p> <center><a href="http://reiisi.homedns.org/~joel/sannet/">Home</a></center><br /> <br /> <div align="right">^v:jtype c00.00.0ej 2001.05.30</div><br /> </body> </html> | |
\ No newline at end of file |
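The page above describes a ctype-like classifier for Shift-JIS that is tested against hexadecimal character constants, and the slowsjctype.c header comment in the next hunk says the functions take `char` pointers. As a rough illustration of that kind of classification, here is a minimal sketch that tells one-byte from two-byte characters using the commonly documented Shift-JIS byte ranges. The function names are hypothetical and are not the ones declared in slowsjctype.h, and the library's own ranges (chosen from the references it cites) may differ in detail.

```c
/* Sketch only: hypothetical helpers, not the API declared in slowsjctype.h. */

/* Lead byte of a two-byte Shift-JIS character: 0x81-0x9F or 0xE0-0xFC. */
int sketchIsSjisLead( unsigned char b )
{
    return ( b >= 0x81 && b <= 0x9F ) || ( b >= 0xE0 && b <= 0xFC );
}

/* Trail byte of a two-byte Shift-JIS character: 0x40-0x7E or 0x80-0xFC. */
int sketchIsSjisTrail( unsigned char b )
{
    return ( b >= 0x40 && b <= 0x7E ) || ( b >= 0x80 && b <= 0xFC );
}

/* Width in bytes of the character that p points at:
   0 at the end of the string, 2 for a valid lead/trail pair, otherwise 1.
   Takes a plain char pointer and converts to unsigned char internally. */
int sketchSjisCharWidth( const char * p )
{
    unsigned char first = (unsigned char) p[ 0 ];

    if ( first == 0 )
        return 0;
    if ( sketchIsSjisLead( first )
         && sketchIsSjisTrail( (unsigned char) p[ 1 ] ) )
        return 2;
    return 1;
}
```

Converting to `unsigned char` inside the helper keeps callers free to pass ordinary `char` strings, even on platforms where `char` is signed and bytes above 0x7F would otherwise compare as negative.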
@@ -1 +1 @@ | ||
1 | -/* slowsjctype.c v00.00.01.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, beginning April 2001. // joel_rees@sannet.ne.jp // // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char pointers. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. 
Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. // Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. 
The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ /* Primary references for the ranges chosen below: // // Character palette from Apple's Kotoeri input method, systems 7/8/9. // Publisher: Apple, included with Apple's Macintosh operating systems. // The character palettes since sys. 8.0 or 8.1 have included primary pronunciations, // as well as JIS, kuten, and UNICODE assignments, in a detailed view. // Since at least sys. 8.5 or 8.6, a flag appears when a non-standard character is selected. // Newer versions track the changes to the various standards. // // Pasokon/Waapuro Kanji Jiten, 1987 Edition // Compiler: Tsutomu Uegaki; Publisher: Natsume-sha (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Contains a nice rectanglular arrangement of Kanji on pages 588-599. // // Waapuro/Pasokon Saishin Kanji Jiten, 1st Edition (1994) // Compiler: Shougakukan Dictionary Editors Department; // Publisher: Shougakukan (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Includes a list of the proposed annex characters, with annex numbers. // The annex characters have been assigned actual codes since this edition was published. // // Pasokon Yougo Jiten, 1992-93 Edition // Authors: Shigeru Okamoto, Ichirou Senba, Yoshiaki Nakamura, Kazuko Takahashi; // Publisher: Gijutsu Hyouron-sha (Shinjuku-ku). // Dictionary of personal computer terminology, // particularly referenced the JIS/ISO/ANSI 8-bit character tables starting page 409. 
*/ #include "sjctypenv.h" #include "sj8bitchars.h" #include "sj16bitchars.h" #include "slowsjctype.h" /* Because char is probably signed, // it is usually liable to induce errors to use escaped char constant notation. // '\x80' may well be something like 0xffffff80, rather than 0x80. // Hopefully, I have been consistent about this. <erg/> // Note the problems when comparing a char variable with a character constant: // char scan; . . . while ( scan <= 0x9f ) // will produce an infinite loop, which is probably not the desired effect. // 0x9f is an integer equal to decimal 159. // '\x9f' is a char and promotes to integer with sign extension: // ( -( 256 - 159 ) ) == ( -97 ) // Two's complement. // . . . while ( scan <= 'x9f' ) // will probably produce the desired result, but by an un-expected calculation. // For instance, // scan = 0x9e; if ( scan < '\x9f' ) // yields true because -98 is less than -97, not because 158 is less than 159. // I tend to forget which is which in the middle of loops, // so I usually use long integers in loops (which is a good idea anyway) // and avoid comparing to integer constants. // This is also a reason I use symbolic constants instead of directly using characters. // // This shows one of the many reasons for having some means of dialect control, // instead of constraining the one-and-only standard in ways that turn out to be non-optimal. */ /* Cleared the unwanted dependency on sjctypenv.h (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, as noted below. // This mod by Joel Matthew Rees, released under original terms of use. 
*/ int slowsjIsPOneByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int b = * ( (ubyte *) chp ); return b < 0x80 || ( b >= 0xa1 && b <= 0xdf ); } int slowsjIsPHighByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bHi = ( (ubyte *) chp )[ 0 ]; return ( bHi >= 0x81 && bHi <= 0x9f ) || ( bHi >= 0xe0 && bHi <= 0xfc ); } int slowsjIsPLowByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo >= 0x40 && bLo <= 0xfc && bLo != 0x7f; } int slowsjIsP7bit( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo < 0x80; } int slowsjPGuessCount( char * chp ) { return ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) ) ? 2 : slowsjIsPOneByte( chp ) ? 1 : 0; } int slowsjIsPCntrl( char * chp ) { int uch = (ubyte) chp[ 0 ]; return ( uch <= 0x1f || uch == 0x7f ) ? 1 : 0; /* DEL added JMR2001.05.23 */ /* The standard doesn't know for unit separator. */ } int slowsjIsPSpace( char * chp ) { ubyte * uchp = (ubyte *) chp; switch ( * uchp ) { case b7_HT: case b7_LF: case b7_VT: case b7_FF: case b7_CR: case b7_SP: return 1; default: return ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) ? 2 : 0; /* 0x8140 is sjis 2-byte space */ } } int slowsjIsPDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ZERO[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_ZERO[ 1 ] && b <= b16_NINE[ 1 ] ) ? 2 : 0; } else { return ( b >= b7_ZERO && b <= b7_NINE ) ? 1 : 0; } } int slowsjIsPXDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( ( b >= b16_A[ 1 ] && b <= b16_F[ 1 ] ) || ( b >= b16_a[ 1 ] && b <= b16_f[ 1 ] ) ) ? 2 : slowsjIsPDigit( chp ); } else { return ( ( b >= b7_A && b <= b7_F ) || ( b >= b7_a && b <= b7_f ) ) ? 
1 : slowsjIsPDigit( chp ); } } int slowsjIsPRomanLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_a[ 1 ] && b <= b16_z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_a && b <= b7_z ) ? 1 : 0; } } int slowsjIsPRomanUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_A[ 1 ] && b <= b16_Z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_A && b <= b7_Z ) ? 1 : 0; } } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPRoman( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPRomanUpper( chp ); return result; } int slowsjIsPGreekLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_alpha[ 0 ] ) && ( b >= b16_alpha[ 1 ] && b <= b16_omega[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPGreekUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_ALPHA[ 0 ] ) && ( b >= b16_ALPHA[ 1 ] && b <= b16_OMEGA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPGreek( char * chp ) { int result = slowsjIsPGreekLower( chp ); if ( result == 0 ) slowsjIsPGreekUpper( chp ); return result; } int slowsjIsPRussianLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_a[ 0 ] ) && ( b >= b16_Russian_a[ 1 ] && b <= b16_Russian_ya[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPRussianUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_A[ 0 ] ) && ( b >= b16_Russian_A[ 1 ] && b <= b16_Russian_YA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. 
*/ int slowsjIsPRussian( char * chp ) { int result = slowsjIsPRussianLower( chp ); if ( result == 0 ) slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPUpper( char * chp ) { int result = slowsjIsPRomanUpper( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPLower( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPRussianLower( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPEurAsianAlpha( char * chp ) { int result = slowsjIsPRoman( chp ); if ( result == 0 ) result = slowsjIsPGreek( chp ); if ( result == 0 ) result = slowsjIsPRussian( chp ); return result; } int slowsjIsPQuasiEurAsianAlpha( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_AccentAcute_Prime[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_AccentAcute_Prime[ 1 ] || b == b16_AccentGrave[ 1 ] || b == b16_Umlaut[ 1 ] || b == b16_AccentCircumflex[ 1 ] || b == b16_Overline_Negate[ 1 ] || b == b16_QuarterDash_Hyphen[ 1 ] || b == b16_WavyDash_Tilde[ 1 ] ) ? 2 : 0; } else { return ( b == b7_HYPHEN || b == b7_ACCENTGRAVE || b == b7_TILDE || b == b7_CARET ) ? 1 : 0; } } int slowsjIsPHiragana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_hiraganaSub_a[ 0 ] ) && ( b >= b16_hiraganaSub_a[ 1 ] && b <= b16_hiragana_ng[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPKatakana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_katakanaSub_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_katakanaSub_a[ 1 ] && b <= b16_katakanaSub_ke[ 1 ] && b != 0x7f ) ? 
2 : 0; } else { return ( ( b >= b8_katakana_wo && b <= b8_katakanaSub_tu ) || ( b >= b8_katakana_a && b <= b8_katakana_ng ) ) ? 1 : 0; } } /* Time biased against katakana, but we don't care on the slow version. */ int slowsjIsPKana( char * chp ) { int result = slowsjIsPHiragana( chp ); if ( result == 0 ) result = slowsjIsPKatakana( chp ); return result; } int slowsjIsPQuasiKana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_DakuTen[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_DakuTen[ 1 ] || b == b16_HanDakuTen[ 1 ] || b == b16_KatakanaRepeat[ 1 ] || b == b16_KatakanaRepeatVoiced[ 1 ] || b == b16_HiraganaRepeat[ 1 ] || b == b16_HiraganaRepeatVoiced[ 1 ] || b == b16_ChoOn[ 1 ] ) ? 2 : 0; } else { return ( b == b8_ChoOn || b == b8_DakuTen || b == b8_HandakuTen ) ? 1 : 0; } } /* This has even time-bias for JIS level 1. */ int slowsjIsPKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) && ( ( bHi == b16_kanji1Low_a[ 0 ] && bLo >= b16_kanji1Low_a[ 1 ] ) || ( bHi > b16_kanji1Low_a[ 0 ] && bHi < b16_kanji1High_ude[ 0 ] ) || ( bHi == b16_kanji1High_ude[ 0 ] && bLo <= b16_kanji1High_ude[ 1 ] ) || ( bHi == b16_kanji2aLow_ichi[ 0 ] && bLo >= b16_kanji2aLow_ichi[ 1 ] ) || ( bHi > b16_kanji2aLow_ichi[ 0 ] && bHi <= b16_kanji2aHigh_jou[ 0 ] ) /* The rows at the end of 2a and beginning of 2b are complete. */ || ( bHi >= b16_kanji2bLow_you[ 0 ] && bHi <= b16_kanji2bHigh_hikaru[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru[ 0 ] && bLo <= b16_kanji2bHigh_hikaru[ 1 ] ) ) ) return 2; else return 0; } /* This is completely time-biased against kanji, and a little harder to mentally verify. 
{ ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( !slowsjIsPHighByte( chp ) || !slowsjIsPLowByte( chp + 1 ) || bHi < b16_kanji1Low_a_sub[ 0 ] || ( bHi == b16_kanji1Low_a_sub[ 0 ] && bLo < b16_kanji1Low_a_sub[ 1 ] ) || ( bHi == b16_kanji1High_ude_arm[ 0 ] && bLo > b16_kanji1High_ude_arm[ 1 ] && bLo < b16_kanji2aLow_ichi_formalOne[ 1 ] ) || ( bHi > b16_kanji2aHigh_ude_arm[ 0 ] && bHi < b16_kanji2bLow_yo_e040[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru_eaa4[ 0 ] && bLo > b16_kanji2bHigh_hikaru_eaa4[ 1 ] ) || bHi > b16_kanji2bHigh_hikaru_eaa4[ 0 ] ) return 0; else return 2; } */ int slowsjIsPQuasiKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_KanjiIbid[ 0 ] ) && ( b >= b16_KanjiIbid[ 1 ] /* This might be a proper Kanji? */ || b <= b16_Ditto[ 1 ] /* Should this be only with European mods? */ || b <= b16_Shime[ 1 ] /* Probably not Kanji? */ || b <= b16_KanjiZero[ 1 ] /* Should this be Kanji? */ || b <= b16_OpenCircle_Maru[ 1 ] /* Often used as fill-in-th-blank. */ || b <= b16_KanjiRepeat[ 1 ] ) )? 2 : 0; } /* Run-time bias against everybody. // Should give fairly even timing in general use // and give best timing for generating tables. */ int slowsjIsPAlpha( char * chp ) { int result = slowsjIsPKanji( chp ); if ( result == 0 ) result = slowsjIsPKana( chp ); if ( result == 0 ) result = slowsjIsPEurAsianAlpha( chp ); return result; } /* Use the same bias as alpha, just to be obnoxious. */ int slowsjIsPQuasiAlpha( char * chp ) { int result = slowsjIsPQuasiKanji( chp ); if ( result == 0 ) result = slowsjIsPQuasiKana( chp ); if ( result == 0 ) result = slowsjIsPQuasiEurAsianAlpha( chp ); return result; } /* Bias? What bias? */ int slowsjIsPAlNum( char * chp ) { int result = slowsjIsPDigit( chp ); if ( result == 0 ) result = slowsjIsPAlpha( chp ); return result; } /* Bias? What bias? 
*/ int slowsjIsPAlNumQuasi( char * chp ) { int result = slowsjIsPQuasiAlpha( chp ); if ( result == 0 ) result = slowsjIsPAlNum( chp ); return result; } int slowsjIsPLineDraw( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_LineDraw_1H[ 0 ] ) && ( b >= b16_LineDraw_1H[ 1 ] && b <= b16_LineDraw_1H2V[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPPunct( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ToTen[ 0 ] ) /* Nice of the JIS comittee to put them all together. */ { b = uchp[ 1 ]; return ( b != 0x7f /* Check and excuse later */ && ( ( b >= b16_ToTen[ 1 ] && b <= b16_Geta[ 1 ] ) || ( b >= b16_Element[ 1 ] && b <= b16_Intersection[ 1 ] ) || ( b >= b16_Conjunction_And[ 1 ] && b <= b16_Exists[ 1 ] ) || ( b >= b16_Angle[ 1 ] && b <= b16_DoubleIntegral[ 1 ] ) || ( b >= b16_Angstrom[ 1 ] && b <= b16_Paragraph[ 1 ] ) || ( b == b16_CompositionCircle[ 1 ] ) ) ) ? 2 : 0; } else { return ( ( b >= b7_EXCLAIM && b <= b7_SLASH ) || ( b >= b7_COLON && b <= b7_ATEACH ) || ( b >= b7_LEFTBRACKET && b <= b7_ACCENTGRAVE ) || ( b >= b7_LEFTBRACE && b <= b7_TILDE ) || ( b >= b8_Kuten && b <= b8_ChuTen ) || ( b == b8_ChoOn ) || ( b >= b8_DakuTen && b <= b8_HandakuTen ) ) ? 1 : 0; } } int slowsjIsPGraph( char * chp ) { int result = slowsjIsPAlNum( chp ); if ( result == 0 ) result = slowsjIsPPunct( chp ); return result; } int slowsjIsPPrint( char * chp ) { ubyte * uchp = (ubyte *) chp; if ( * uchp == b7_SP ) return 1; else if ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) return 2; else return slowsjIsPGraph( chp ); } /* Macro to isprint() works just fine because there are no two-byte control characters. int slowsjIsP2Byte( char * chp ) {} */ /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. // Some are entirely above and some entirely below. // JIS Roman/Greek/Russian doesn't include any caseless characters in my materials. 
// But if they did I could test the converted character for validity before returning it. // Just for fun, I'll include the test anyway. */ int slowsjPToLowerRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanUpper( chpin ); char temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = chpin[ 0 ] + ( b7_a - b7_A ); break; case 2: temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_a[ 1 ] - b16_A[ 1 ] ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanLower( chpin ); char temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = chpin[ 0 ] - ( b7_a - b7_A ); break; case 2: temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_a[ 1 ] - b16_A[ 1 ] ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekUpper( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ); /* No gap */ } if ( count == 2 && slowsjIsPGreekLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekLower( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ); /* No gap */ } if ( count == 2 && slowsjIsPGreekUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianUpper( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { 
temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ); if ( temp[ 1 ] >= 0x7f ) /* Adjust for the gap. */ temp[ 1 ] += 1; } if ( count == 2 && slowsjIsPRussianLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianLower( chpin ); /* Checks the gap. */ char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ); if ( chpin[ 1 ] > 0x7f ) /* Adjust for the gap (0x7f already filtered above). */ temp[ 1 ] -= 1; } if ( count == 2 && slowsjIsPRussianUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } /* Again, time-biased in favor of the most likely. (Russian and Greek are not as commonly used.) // Would be faster to test directly, but that increases logical coupling // (increases the chance for algorithmic errors). // Reducing errors is a higher priority than speed. */ int slowsjPToLower( char * chpin, char * chpout ) { int count = slowsjPToLowerRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerRussian( chpin, chpout ); return count; } int slowsjPToUpper( char * chpin, char * chpout ) { int count = slowsjPToUpperRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperRussian( chpin, chpout ); return count; } /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. Some are entirely above and some entirely below. JIS Roman/Greek/Russian doesn't include caseless. For converting katakana to hiragana, I can test whether the result is valid before returning it. 
int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. */ /* So, the initial, standard function headers: int slowsjIsCntrl( unsigned char * mbc ) As near as I can tell, all one byte, between 0 and 0x1f, inclusive. Returns byte count. int slowsjIsSpace( unsigned char * mbc ) Adds one two byte version of the space character. Returns byte count. int slowsjIsPrint( unsigned char * mbc ) All graphic characters, including non-control space characters. Returns byte count. int slowsjIsGraph( unsigned char * mbc ) All graphic non-space characters. Returns byte count. int slowsjIsPunct( unsigned char * mbc ) All non-word-forming characters. Will later be subdivided for the richer JIS set. Returns byte count. int slowsjIsDigit( unsigned char * mbc ) The standard digits 0..9, as specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsXDigit( unsigned char * mbc ) The standard hexadecimal digits specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsAlpha( unsigned char * mbc ) Characters used to form words, as used by non-programmers. Does not include the standard decimal digits, but does include the kanji numbers. Includes a lot of caseless characters, of course. Returns byte count. int slowsjIsAlNum( unsigned char * mbc ) Characters used to form words, as used by programmers, thus including digits. Returns byte count. int slowsjIsUpper( unsigned char * mbc ) Upper cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. 
int slowsjIsLower( unsigned char * mbc ) Lower cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. int slowsjToLower( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to lower case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjIs1Byte( unsigned char * mbc ) Valid one byte character. Returns byte count. int slowsjIs2Byte( unsigned char * mbc ) Valid two byte character? Returns byte count. int slowsjCouldBe2Byte( unsigned char * mbc ) A combination of valid lead byte and valid tail byte? Returns byte count. The second, or fast version slowsjIsXX() functions will use constants of the pattern slowsjIsXX_k. The constants and the general call will also be provided in the source header, as mentioned above, for optimization: int slowsjCType( unsigned long type, unsigned char * mbc ) Test the type formed by the bit-or of the type constants passed as the first parameter. Returns byte count on test true or zero on test false. The initial slow version functions will have names of the pattern slow_slowsjIsXX() so they can co-exist during debugging. slowsjrIsXX()? Now, some of the foreseeable necessary extensions: int slowsjIsMath( unsigned char * mbc ) The plethora of math and logic symbols in JIS. Returns byte count. int slowsjIsUnit( unsigned char * mbc ) The plethora of unit symbols in JIS, but not system specific extensions like m2. Does not include kanji. Returns byte count. int slowsjIsQuote( unsigned char * mbc ) The plethora of quoting and parenthetic characters in JIS. Returns byte count. 
int slowsjIsKanji( unsigned char * mbc ) All the proper kanji characters. Returns byte count. int isNumberKanji( unsigned char * mbc ) All the number kanji, including the special ones used, for example, on currency and bank notes. Returns byte count. int slowsjIsKana( unsigned char * mbc ) All the katakana and hiragana characters, including the one byte katakana. Also including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsKata( unsigned char * mbc ) All the katakana, including the SJIS one byte katakana, but not the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsHira( unsigned char * mbc ) All the hiragana, not including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjToKata( unsigned char * mbcin, unsigned char * mbcout ) Converts hiragana to katakana. Returns byte count converted or zero. int slowsjToHira( unsigned char * mbcin, unsigned char * mbcout ) Converts katakana to hiragana, where possible. Moves the unconvertable katakana as they are. Does not convert the one byte katakana. Returns byte count converted or zero. int slowsjTo16Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts the one byte katakana to two byte katakana. Round trip slowsjTo16Kata() -> slowsjTo8Kata() should be guaranteeable. Returns byte count converted or zero. int slowsjTo8Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts two byte katakana to one byte katakana, where possible. Round trip slowsjTo8Kata() -> slowsjTo16Kata() may be guaranteeable, I'm not sure yet. Returns byte count converted or zero. Some of the hypothetical extensions: int slowsjIsMusic( unsigned char * mbc ) The music symbols in JIS. Returns byte count. int slowsjIsKanjiUnit( unsigned char * mbc ) The kanji version of units, including also ten, hundred, thousand, ten-thousand, etc. Returns byte count. 
int slowsjIsRoman( unsigned char * mbc ) All the JIS Roman (two byte Latin) characters. Returns byte count. int slowsjIsGreek( unsigned char * mbc ) All the JIS Greek characters. Returns byte count. int slowsjIsRussian( unsigned char * mbc ) All the JIS Russian characters. Returns byte count. int slowsjIsLatin( unsigned char * mbc ) All the Latin characters, including the two byte Roman (Latin) and one byte Latin. Returns byte count. int slowsjToRoman( unsigned char * mbcin, unsigned char * mbcout ) Convert one byte Latin to two byte JIS Roman (Latin). Returns byte count converted or zero. int slowsjToLatin( unsigned char * mbcin, unsigned char * mbcout ) Convert two byte JIS Roman (Latin) to one byte Latin. Returns byte count converted or zero. */ | |
\ No newline at end of file | ||
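The comment in slowsjctype.c about signed char explains why every test in the file reads its bytes through a `(ubyte *)` cast, and why the functions return a byte count rather than a plain flag. A minimal sketch of both ideas follows. All names in it are hypothetical and are not functions from slowsjctype.c; only the lead-byte ranges are copied from `slowsjIsPHighByte`, and plain `unsigned char` stands in for the project's `ubyte` type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in slowsjctype.c. */

/* The pitfall from the comment: where plain char is signed, every char
   value is below 159, so this test is always true on such platforms. */
int naive_at_most_9f(char c)
{
    return c <= 0x9f;
}

/* The same test read through unsigned char, mirroring the (ubyte *) casts. */
int byte_at_most_9f(char c)
{
    return (unsigned char)c <= 0x9f;
}

/* Lead-byte ranges copied from slowsjIsPHighByte. */
int is_lead_byte(const char *chp)
{
    int b;

    assert(chp != NULL);
    b = *(const unsigned char *)chp;
    return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
}

/* Count the characters in NUL-terminated shift-JIS text: step two bytes
   after a lead byte (unless the text ends there), otherwise one byte.
   This sketch does not validate the trail byte. */
int count_sjis_chars(const char *s)
{
    int n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        s += (is_lead_byte(s) && s[1] != '\0') ? 2 : 1;
        n++;
    }
    return n;
}
```

For the bytes 'A', 0x82, 0xa0, 'B', `count_sjis_chars` steps 1, 2, 1 and returns 3. The result of `naive_at_most_9f` depends on whether the compiler treats plain char as signed, which is the reason the library avoids that form.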
1 | +/* slowsjctype.c v00.00.01.jmr // Near-ctype functions for shift-JIS characters, slow version. // Written by Joel Matthew Rees, Amagasaki, Hyogo, Japan, beginning April 2001. // joel_rees@sannet.ne.jp // // Shifting strategy for usability in current C environments: // pass char pointers instead of unsigned char pointers. // Also, adding P to names to emphasize pointer usage. // // Copyright 2000, 2001 Joel Matthew Rees. // All rights reserved. // // Assignment of Stewardship, or Terms of Use: // // The author grants permission to use and/or redistribute the code in this // file, in either source or translated form, under the following conditions: // 1. When redistributing the source code, the copyright notices and terms of // use must be neither removed nor modified. // 2. When redistributing in a form not generally read by humans, the // copyright notices and terms of use, with proper indication of elements // covered, must be reproduced in the accompanying documentation and/or // other materials provided with the redistribution. In addition, if the // source includes statements designed to compile a copyright notice // into the output object code, the redistributor is required to take // such steps as necessary to preserve the notice in the translated // object code. // 3. Modifications must be annotated, with attribution, including the name(s) // of the author(s) and the contributor(s) thereof, the conditions for // distribution of the modification, and full indication of the date(s) // and scope of the modification. Rights to the modification itself // shall necessarily be retained by the author(s) thereof. // 4. These grants shall not be construed as an assignment or assumption of // liability of any sort or to any degree. Neither shall these grants be // construed as endorsement or represented as such. 
Any party using this // code in any way does so under the agreement to entirely indemnify the // author and any contributors concerning the code and any use thereof. // Specifically, THIS SOFTWARE IS PROVIDED AT NO COST, AS IT IS, WITHOUT // ANY EXPRESS OR IMPLIED WARRANTY OF ANY SORT, INCLUDING, BUT NOT LIMITED // TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. // UNDER NO CIRCUMSTANCES SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR // ANY DAMAGES WHATSOEVER ARISING FROM ITS USE OR MISUSE, EVEN IF ADVISED // OF THE EXISTENCE OF THE POSSIBILITY OF SUCH DAMAGE. // 5. This code should not be used for any illegal or immoral purpose, // including, but not limited to, the theft of property or services, // deliberate communication of false information, the distribution of drugs // for purposes other than medical, the distribution of pornography, the // provision of illicit sexual services, the maintenance of oppressive // governments or organizations, or the imposture of false religion and // false science. // Any illegal or immoral use incurs natural and legal penalties, which the // author invokes in full force upon the heads of those who so use it. // 6. Alternative redistribution arrangements: // a. If the above conditions are unacceptable, redistribution under the // following commonly used public licenses is expressly permitted: // i. The GNU General Public License (GPL) of the Free Software // Foundation. // ii. The Perl Artistic License, only as a part of Perl. // iii. The Apple Public Source License, only as a part of Darwin or // a Macintosh Operating System using Darwin. // b. No other alternative redistribution arrangement is permitted. // (The original author reserves the right to add to this list.) // c. When redistributing this code under an alternative license, the // specific license being invoked shall be noted immediately beneath // the body of the terms of use. 
The terms of the license so specified // shall apply only to the redistribution of the source so noted. // 7. In no case shall the rights of the original author to the original work // be impaired by any distribution or redistribution arrangement. // // End of the Assignment of Stewardship, or terms of use. // // License invoked: Assignment of Stewardship. // Notes concerning license: // Compiler directives are strongly encouraged as a means of meeting // the attribution requirements in the Assignment of Stewardship. */ /* Primary references for the ranges chosen below: // // Character palette from Apple's Kotoeri input method, systems 7/8/9. // Publisher: Apple, included with Apple's Macintosh operating systems. // The character palettes since sys. 8.0 or 8.1 have included primary pronunciations, // as well as JIS, kuten, and UNICODE assignments, in a detailed view. // Since at least sys. 8.5 or 8.6, a flag appears when a non-standard character is selected. // Newer versions track the changes to the various standards. // // Pasokon/Waapuro Kanji Jiten, 1987 Edition // Compiler: Tsutomu Uegaki; Publisher: Natsume-sha (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Contains a nice rectanglular arrangement of Kanji on pages 588-599. // // Waapuro/Pasokon Saishin Kanji Jiten, 1st Edition (1994) // Compiler: Shougakukan Dictionary Editors Department; // Publisher: Shougakukan (Chiyouda-ku). // Lists and tables of Kanji and other JIS characters and character codes. // Includes a list of the proposed annex characters, with annex numbers. // The annex characters have been assigned actual codes since this edition was published. // // Pasokon Yougo Jiten, 1992-93 Edition // Authors: Shigeru Okamoto, Ichirou Senba, Yoshiaki Nakamura, Kazuko Takahashi; // Publisher: Gijutsu Hyouron-sha (Shinjuku-ku). // Dictionary of personal computer terminology, // particularly referenced the JIS/ISO/ANSI 8-bit character tables starting page 409. 
*/ #include "sjctypenv.h" #include "sj8bitChars.h" #include "sj16bitChars.h" #include "slowsjctype.h" /* Because char is probably signed, // using escaped char constant notation is liable to induce errors. // '\x80' may well be something like 0xffffff80, rather than 0x80. // Hopefully, I have been consistent about this. <erg/> // Note the problems when comparing a char variable with a character constant: // char scan; . . . while ( scan <= 0x9f ) // will produce an infinite loop, which is probably not the desired effect. // 0x9f is an integer equal to decimal 159. // '\x9f' is a char and promotes to integer with sign extension: // ( -( 256 - 159 ) ) == ( -97 ) // Two's complement. // . . . while ( scan <= '\x9f' ) // will probably produce the desired result, but by an unexpected calculation. // For instance, // scan = 0x9e; if ( scan < '\x9f' ) // yields true because -98 is less than -97, not because 158 is less than 159. // I tend to forget which is which in the middle of loops, // so I usually use long integers in loops (which is a good idea anyway) // and avoid comparing to integer constants. // This is also a reason I use symbolic constants instead of directly using characters. // // This shows one of the many reasons for having some means of dialect control, // instead of constraining the one-and-only standard in ways that turn out to be non-optimal. */ /* Cleared the unwanted dependency on sjctypenv.h (bool) -- JMR2001.05.31 // This required changing the bool typed functions to int typed functions, as noted below. // This mod by Joel Matthew Rees, released under original terms of use. 
*/ int slowsjIsPOneByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int b = * ( (ubyte *) chp ); return b < 0x80 || ( b >= 0xa1 && b <= 0xdf ); } int slowsjIsPHighByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bHi = ( (ubyte *) chp )[ 0 ]; return ( bHi >= 0x81 && bHi <= 0x9f ) || ( bHi >= 0xe0 && bHi <= 0xfc ); } int slowsjIsPLowByte( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo >= 0x40 && bLo <= 0xfc && bLo != 0x7f; } int slowsjIsP7bit( char * chp ) /* changed from bool to int JMR2001.05.31 */ { int bLo = * ( (ubyte *) chp ); return bLo < 0x80; } int slowsjPGuessCount( char * chp ) { return ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) ) ? 2 : slowsjIsPOneByte( chp ) ? 1 : 0; } int slowsjIsPCntrl( char * chp ) { int uch = (ubyte) chp[ 0 ]; return ( uch <= 0x1f || uch == 0x7f ) ? 1 : 0; /* DEL added JMR2001.05.23 */ /* The standard doesn't know for unit separator. */ } int slowsjIsPSpace( char * chp ) { ubyte * uchp = (ubyte *) chp; switch ( * uchp ) { case b7_HT: case b7_LF: case b7_VT: case b7_FF: case b7_CR: case b7_SP: return 1; default: return ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) ? 2 : 0; /* 0x8140 is sjis 2-byte space */ } } int slowsjIsPDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ZERO[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_ZERO[ 1 ] && b <= b16_NINE[ 1 ] ) ? 2 : 0; } else { return ( b >= b7_ZERO && b <= b7_NINE ) ? 1 : 0; } } int slowsjIsPXDigit( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( ( b >= b16_A[ 1 ] && b <= b16_F[ 1 ] ) || ( b >= b16_a[ 1 ] && b <= b16_f[ 1 ] ) ) ? 2 : slowsjIsPDigit( chp ); } else { return ( ( b >= b7_A && b <= b7_F ) || ( b >= b7_a && b <= b7_f ) ) ? 
1 : slowsjIsPDigit( chp ); } } int slowsjIsPRomanLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_a[ 1 ] && b <= b16_z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_a && b <= b7_z ) ? 1 : 0; } } int slowsjIsPRomanUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_A[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_A[ 1 ] && b <= b16_Z[ 1 ] && b != 0x7f ) ? 2 : 0; } else { return ( b >= b7_A && b <= b7_Z ) ? 1 : 0; } } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPRoman( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPRomanUpper( chp ); return result; } int slowsjIsPGreekLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_alpha[ 0 ] ) && ( b >= b16_alpha[ 1 ] && b <= b16_omega[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPGreekUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_ALPHA[ 0 ] ) && ( b >= b16_ALPHA[ 1 ] && b <= b16_OMEGA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. */ int slowsjIsPGreek( char * chp ) { int result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); /* Assignment was missing; the upper case result was discarded. */ return result; } int slowsjIsPRussianLower( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_a[ 0 ] ) && ( b >= b16_Russian_a[ 1 ] && b <= b16_Russian_ya[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPRussianUpper( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_Russian_A[ 0 ] ) && ( b >= b16_Russian_A[ 1 ] && b <= b16_Russian_YA[ 1 ] && b != 0x7f ) ) ? 2 : 0; } /* Time biased against upper case, but we don't care on the slow version. 
*/ int slowsjIsPRussian( char * chp ) { int result = slowsjIsPRussianLower( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); /* Assignment was missing; the upper case result was discarded. */ return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPUpper( char * chp ) { int result = slowsjIsPRomanUpper( chp ); if ( result == 0 ) result = slowsjIsPGreekUpper( chp ); if ( result == 0 ) result = slowsjIsPRussianUpper( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPLower( char * chp ) { int result = slowsjIsPRomanLower( chp ); if ( result == 0 ) result = slowsjIsPGreekLower( chp ); if ( result == 0 ) result = slowsjIsPRussianLower( chp ); return result; } /* Time biased against Greek and Russian, but we don't care on the slow version. */ int slowsjIsPEurAsianAlpha( char * chp ) { int result = slowsjIsPRoman( chp ); if ( result == 0 ) result = slowsjIsPGreek( chp ); if ( result == 0 ) result = slowsjIsPRussian( chp ); return result; } int slowsjIsPQuasiEurAsianAlpha( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_AccentAcute_Prime[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_AccentAcute_Prime[ 1 ] || b == b16_AccentGrave[ 1 ] || b == b16_Umlaut[ 1 ] || b == b16_AccentCircumflex[ 1 ] || b == b16_Overline_Negate[ 1 ] || b == b16_QuarterDash_Hyphen[ 1 ] || b == b16_WavyDash_Tilde[ 1 ] ) ? 2 : 0; } else { return ( b == b7_HYPHEN || b == b7_ACCENTGRAVE || b == b7_TILDE || b == b7_CARET ) ? 1 : 0; } } int slowsjIsPHiragana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_hiraganaSub_a[ 0 ] ) && ( b >= b16_hiraganaSub_a[ 1 ] && b <= b16_hiragana_ng[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPKatakana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_katakanaSub_a[ 0 ] ) { b = uchp[ 1 ]; return ( b >= b16_katakanaSub_a[ 1 ] && b <= b16_katakanaSub_ke[ 1 ] && b != 0x7f ) ? 
2 : 0; } else { return ( ( b >= b8_katakana_wo && b <= b8_katakanaSub_tu ) || ( b >= b8_katakana_a && b <= b8_katakana_ng ) ) ? 1 : 0; } } /* Time biased against katakana, but we don't care on the slow version. */ int slowsjIsPKana( char * chp ) { int result = slowsjIsPHiragana( chp ); if ( result == 0 ) result = slowsjIsPKatakana( chp ); return result; } int slowsjIsPQuasiKana( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_DakuTen[ 0 ] ) { b = uchp[ 1 ]; return ( b == b16_DakuTen[ 1 ] || b == b16_HanDakuTen[ 1 ] || b == b16_KatakanaRepeat[ 1 ] || b == b16_KatakanaRepeatVoiced[ 1 ] || b == b16_HiraganaRepeat[ 1 ] || b == b16_HiraganaRepeatVoiced[ 1 ] || b == b16_ChoOn[ 1 ] ) ? 2 : 0; } else { return ( b == b8_ChoOn || b == b8_DakuTen || b == b8_HandakuTen ) ? 1 : 0; } } /* This has even time-bias for JIS level 1. */ int slowsjIsPKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( slowsjIsPHighByte( chp ) && slowsjIsPLowByte( chp + 1 ) && ( ( bHi == b16_kanji1Low_a[ 0 ] && bLo >= b16_kanji1Low_a[ 1 ] ) || ( bHi > b16_kanji1Low_a[ 0 ] && bHi < b16_kanji1High_ude[ 0 ] ) || ( bHi == b16_kanji1High_ude[ 0 ] && bLo <= b16_kanji1High_ude[ 1 ] ) || ( bHi == b16_kanji2aLow_ichi[ 0 ] && bLo >= b16_kanji2aLow_ichi[ 1 ] ) || ( bHi > b16_kanji2aLow_ichi[ 0 ] && bHi <= b16_kanji2aHigh_jou[ 0 ] ) /* The rows at the end of 2a and beginning of 2b are complete. */ || ( bHi >= b16_kanji2bLow_you[ 0 ] && bHi <= b16_kanji2bHigh_hikaru[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru[ 0 ] && bLo <= b16_kanji2bHigh_hikaru[ 1 ] ) ) ) return 2; else return 0; } /* This is completely time-biased against kanji, and a little harder to mentally verify. 
{ ubyte * uchp = (ubyte *) chp; int bHi = uchp[ 0 ]; int bLo = uchp[ 1 ]; if ( !slowsjIsPHighByte( chp ) || !slowsjIsPLowByte( chp + 1 ) || bHi < b16_kanji1Low_a_sub[ 0 ] || ( bHi == b16_kanji1Low_a_sub[ 0 ] && bLo < b16_kanji1Low_a_sub[ 1 ] ) || ( bHi == b16_kanji1High_ude_arm[ 0 ] && bLo > b16_kanji1High_ude_arm[ 1 ] && bLo < b16_kanji2aLow_ichi_formalOne[ 1 ] ) || ( bHi > b16_kanji2aHigh_ude_arm[ 0 ] && bHi < b16_kanji2bLow_yo_e040[ 0 ] ) || ( bHi == b16_kanji2bHigh_hikaru_eaa4[ 0 ] && bLo > b16_kanji2bHigh_hikaru_eaa4[ 1 ] ) || bHi > b16_kanji2bHigh_hikaru_eaa4[ 0 ] ) return 0; else return 2; } */ /* The comparisons below were >= and <= joined by ||, which is true for nearly every low byte; // they are now equality tests, matching slowsjIsPQuasiKana(). */ int slowsjIsPQuasiKanji( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_KanjiIbid[ 0 ] ) && ( b == b16_KanjiIbid[ 1 ] /* This might be a proper Kanji? */ || b == b16_Ditto[ 1 ] /* Should this be only with European mods? */ || b == b16_Shime[ 1 ] /* Probably not Kanji? */ || b == b16_KanjiZero[ 1 ] /* Should this be Kanji? */ || b == b16_OpenCircle_Maru[ 1 ] /* Often used as fill-in-the-blank. */ || b == b16_KanjiRepeat[ 1 ] ) )? 2 : 0; } /* Run-time bias against everybody. // Should give fairly even timing in general use // and give best timing for generating tables. */ int slowsjIsPAlpha( char * chp ) { int result = slowsjIsPKanji( chp ); if ( result == 0 ) result = slowsjIsPKana( chp ); if ( result == 0 ) result = slowsjIsPEurAsianAlpha( chp ); return result; } /* Use the same bias as alpha, just to be obnoxious. */ int slowsjIsPQuasiAlpha( char * chp ) { int result = slowsjIsPQuasiKanji( chp ); if ( result == 0 ) result = slowsjIsPQuasiKana( chp ); if ( result == 0 ) result = slowsjIsPQuasiEurAsianAlpha( chp ); return result; } /* Bias? What bias? */ int slowsjIsPAlNum( char * chp ) { int result = slowsjIsPDigit( chp ); if ( result == 0 ) result = slowsjIsPAlpha( chp ); return result; } /* Bias? What bias? 
*/ int slowsjIsPAlNumQuasi( char * chp ) { int result = slowsjIsPQuasiAlpha( chp ); if ( result == 0 ) result = slowsjIsPAlNum( chp ); return result; } int slowsjIsPLineDraw( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = uchp[ 1 ]; return ( ( * uchp == b16_LineDraw_1H[ 0 ] ) && ( b >= b16_LineDraw_1H[ 1 ] && b <= b16_LineDraw_1H2V[ 1 ] && b != 0x7f ) )? 2 : 0; } int slowsjIsPPunct( char * chp ) { ubyte * uchp = (ubyte *) chp; int b = * uchp; if ( b == b16_ToTen[ 0 ] ) /* Nice of the JIS committee to put them all together. */ { b = uchp[ 1 ]; return ( b != 0x7f /* Check and excuse later */ && ( ( b >= b16_ToTen[ 1 ] && b <= b16_Geta[ 1 ] ) || ( b >= b16_Element[ 1 ] && b <= b16_Intersection[ 1 ] ) || ( b >= b16_Conjunction_And[ 1 ] && b <= b16_Exists[ 1 ] ) || ( b >= b16_Angle[ 1 ] && b <= b16_DoubleIntegral[ 1 ] ) || ( b >= b16_Angstrom[ 1 ] && b <= b16_Paragraph[ 1 ] ) || ( b == b16_CompositionCircle[ 1 ] ) ) ) ? 2 : 0; } else { return ( ( b >= b7_EXCLAIM && b <= b7_SLASH ) || ( b >= b7_COLON && b <= b7_ATEACH ) || ( b >= b7_LEFTBRACKET && b <= b7_ACCENTGRAVE ) || ( b >= b7_LEFTBRACE && b <= b7_TILDE ) || ( b >= b8_Kuten && b <= b8_ChuTen ) || ( b == b8_ChoOn ) || ( b >= b8_DakuTen && b <= b8_HandakuTen ) ) ? 1 : 0; } } int slowsjIsPGraph( char * chp ) { int result = slowsjIsPAlNum( chp ); if ( result == 0 ) result = slowsjIsPPunct( chp ); return result; } int slowsjIsPPrint( char * chp ) { ubyte * uchp = (ubyte *) chp; if ( * uchp == b7_SP ) return 1; else if ( uchp[ 0 ] == b16_SP[ 0 ] && uchp[ 1 ] == b16_SP[ 1 ] ) return 2; else return slowsjIsPGraph( chp ); } /* Macro to isprint() works just fine because there are no two-byte control characters. int slowsjIsP2Byte( char * chp ) {} */ /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. // Some are entirely above and some entirely below. // JIS Roman/Greek/Russian doesn't include any caseless characters in my materials. 
// But if they did I could test the converted character for validity before returning it. // Just for fun, I'll include the test anyway. */ int slowsjPToLowerRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanUpper( chpin ); char temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = chpin[ 0 ] + ( b7_a - b7_A ); break; case 2: temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_a[ 1 ] - b16_A[ 1 ] ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRoman( char * chpin, char * chpout ) { int count = slowsjIsPRomanLower( chpin ); char temp[ 4 ] = { 0 }; switch ( count ) { case 1: temp[ 0 ] = chpin[ 0 ] - ( b7_a - b7_A ); break; case 2: temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_a[ 1 ] - b16_A[ 1 ] ); /* No gap */ break; } if ( count > 0 && slowsjIsPRomanUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; if ( count > 1 ) chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekUpper( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ); /* No gap */ } if ( count == 2 && slowsjIsPGreekLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperGreek( char * chpin, char * chpout ) { int count = slowsjIsPGreekLower( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_alpha[ 1 ] - b16_ALPHA[ 1 ] ); /* No gap */ } if ( count == 2 && slowsjIsPGreekUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToLowerRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianUpper( chpin ); char temp[ 4 ] = { 0 }; if ( count == 2 ) { 
temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] + ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ); if ( (ubyte) temp[ 1 ] >= 0x7f ) /* Adjust for the gap. Cast because char is probably signed. */ temp[ 1 ] += 1; } if ( count == 2 && slowsjIsPRussianLower( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } int slowsjPToUpperRussian( char * chpin, char * chpout ) { int count = slowsjIsPRussianLower( chpin ); /* Checks the gap. */ char temp[ 4 ] = { 0 }; if ( count == 2 ) { temp[ 0 ] = chpin[ 0 ]; temp[ 1 ] = chpin[ 1 ] - ( b16_Russian_a[ 1 ] - b16_Russian_A[ 1 ] ); if ( (ubyte) chpin[ 1 ] > 0x7f ) /* Adjust for the gap (0x7f already filtered above). Cast because char is probably signed. */ temp[ 1 ] -= 1; } if ( count == 2 && slowsjIsPRussianUpper( temp ) == count ) { chpout[ 0 ] = temp[ 0 ]; chpout[ 1 ] = temp[ 1 ]; } else count = 0; return count; } /* Again, time-biased in favor of the most likely. (Russian and Greek are not as commonly used.) // Would be faster to test directly, but that increases logical coupling // (increases the chance for algorithmic errors). // Reducing errors is a higher priority than speed. */ int slowsjPToLower( char * chpin, char * chpout ) { int count = slowsjPToLowerRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToLowerRussian( chpin, chpout ); return count; } int slowsjPToUpper( char * chpin, char * chpout ) { int count = slowsjPToUpperRoman( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperGreek( chpin, chpout ); if ( count == 0 ) count = slowsjPToUpperRussian( chpin, chpout ); return count; } /* ToLower/Upper will have to test the 7f gap specifically for each range that suffers it. Some are entirely above and some entirely below. JIS Roman/Greek/Russian doesn't include caseless. For converting katakana to hiragana, I can test whether the result is valid before returning it. 
int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. */ /* So, the initial, standard function headers: int slowsjIsCntrl( unsigned char * mbc ) As near as I can tell, all one byte, between 0 and 0x1f, inclusive. Returns byte count. int slowsjIsSpace( unsigned char * mbc ) Adds one two byte version of the space character. Returns byte count. int slowsjIsPrint( unsigned char * mbc ) All graphic characters, including non-control space characters. Returns byte count. int slowsjIsGraph( unsigned char * mbc ) All graphic non-space characters. Returns byte count. int slowsjIsPunct( unsigned char * mbc ) All non-word-forming characters. Will later be subdivided for the richer JIS set. Returns byte count. int slowsjIsDigit( unsigned char * mbc ) The standard digits 0..9, as specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsXDigit( unsigned char * mbc ) The standard hexadecimal digits specified in ANSI/ISO ctype. Includes both one and two byte digits. Does not include kanji numbers. Returns byte count. int slowsjIsAlpha( unsigned char * mbc ) Characters used to form words, as used by non-programmers. Does not include the standard decimal digits, but does include the kanji numbers. Includes a lot of caseless characters, of course. Returns byte count. int slowsjIsAlNum( unsigned char * mbc ) Characters used to form words, as used by programmers, thus including digits. Returns byte count. int slowsjIsUpper( unsigned char * mbc ) Upper cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. 
int slowsjIsLower( unsigned char * mbc ) Lower cased characters, includes 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count. int slowsjToLower( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to lower case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjToUpper( unsigned char * mbcin, unsigned char * mbcout ) Converts cased word forming characters to upper case, including 8 bit JIS-Latin; 16 bit JIS-Roman, -Greek, and -Russian characters, but no kanji. Returns byte count converted or zero. int slowsjIs1Byte( unsigned char * mbc ) Valid one byte character. Returns byte count. int slowsjIs2Byte( unsigned char * mbc ) Valid two byte character? Returns byte count. int slowsjCouldBe2Byte( unsigned char * mbc ) A combination of valid lead byte and valid tail byte? Returns byte count. The second, or fast version slowsjIsXX() functions will use constants of the pattern slowsjIsXX_k. The constants and the general call will also be provided in the source header, as mentioned above, for optimization: int slowsjCType( unsigned long type, unsigned char * mbc ) Test the type formed by the bit-or of the type constants passed as the first parameter. Returns byte count on test true or zero on test false. The initial slow version functions will have names of the pattern slow_slowsjIsXX() so they can co-exist during debugging. slowsjrIsXX()? Now, some of the foreseeable necessary extensions: int slowsjIsMath( unsigned char * mbc ) The plethora of math and logic symbols in JIS. Returns byte count. int slowsjIsUnit( unsigned char * mbc ) The plethora of unit symbols in JIS, but not system specific extensions like m2. Does not include kanji. Returns byte count. int slowsjIsQuote( unsigned char * mbc ) The plethora of quoting and parenthetic characters in JIS. Returns byte count. 
int slowsjIsKanji( unsigned char * mbc ) All the proper kanji characters. Returns byte count. int isNumberKanji( unsigned char * mbc ) All the number kanji, including the special ones used, for example, on currency and bank notes. Returns byte count. int slowsjIsKana( unsigned char * mbc ) All the katakana and hiragana characters, including the one byte katakana. Also including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsKata( unsigned char * mbc ) All the katakana, including the SJIS one byte katakana, but not the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjIsHira( unsigned char * mbc ) All the hiragana, not including the free-standing voicing and plosive symbols, dakuten and handakuten. Returns byte count. int slowsjToKata( unsigned char * mbcin, unsigned char * mbcout ) Converts hiragana to katakana. Returns byte count converted or zero. int slowsjToHira( unsigned char * mbcin, unsigned char * mbcout ) Converts katakana to hiragana, where possible. Moves the unconvertable katakana as they are. Does not convert the one byte katakana. Returns byte count converted or zero. int slowsjTo16Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts the one byte katakana to two byte katakana. Round trip slowsjTo16Kata() -> slowsjTo8Kata() should be guaranteeable. Returns byte count converted or zero. int slowsjTo8Kata( unsigned char * mbcin, unsigned char * mbcout ) Converts two byte katakana to one byte katakana, where possible. Round trip slowsjTo8Kata() -> slowsjTo16Kata() may be guaranteeable, I'm not sure yet. Returns byte count converted or zero. Some of the hypothetical extensions: int slowsjIsMusic( unsigned char * mbc ) The music symbols in JIS. Returns byte count. int slowsjIsKanjiUnit( unsigned char * mbc ) The kanji version of units, including also ten, hundred, thousand, ten-thousand, etc. Returns byte count. 
int slowsjIsRoman( unsigned char * mbc ) All the JIS Roman (two byte Latin) characters. Returns byte count. int slowsjIsGreek( unsigned char * mbc ) All the JIS Greek characters. Returns byte count. int slowsjIsRussian( unsigned char * mbc ) All the JIS Russian characters. Returns byte count. int slowsjIsLatin( unsigned char * mbc ) All the Latin characters, including the two byte Roman (Latin) and one byte Latin. Returns byte count. int slowsjToRoman( unsigned char * mbcin, unsigned char * mbcout ) Convert one byte Latin to two byte JIS Roman (Latin). Returns byte count converted or zero. int slowsjToLatin( unsigned char * mbcin, unsigned char * mbcout ) Convert two byte JIS Roman (Latin) to one byte Latin. Returns byte count converted or zero. */ | |
\ No newline at end of file |
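
The byte-count convention used throughout the file above (each test returns 2, 1, or 0) lends itself to stepping through a string one character at a time. What follows is only a minimal sketch of that idea, not the library's own code: the `demo_` names are hypothetical, the byte ranges are copied from slowsjIsPOneByte(), slowsjIsPHighByte(), and slowsjIsPLowByte(), and every byte is read through `unsigned char` to avoid the signed-char comparison problem described in the source comments.

```c
#include <stddef.h>

/* Hypothetical helper names (demo_*); they are not part of the library.
 * Ranges mirror slowsjIsPOneByte/HighByte/LowByte in the listing above. */

/* One-byte character: 7-bit range or the one-byte katakana range. */
static int demo_is_one_byte(unsigned char b)
{
    return b < 0x80 || (b >= 0xa1 && b <= 0xdf);
}

/* Possible lead byte of a two-byte character. */
static int demo_is_high_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
}

/* Possible trail byte of a two-byte character (0x7f is a gap). */
static int demo_is_low_byte(unsigned char b)
{
    return b >= 0x40 && b <= 0xfc && b != 0x7f;
}

/* Returns 2 for a plausible two-byte character, 1 for a one-byte
 * character, 0 if the bytes at p fit neither pattern.
 * The second byte is only read when the first is a lead byte. */
static int demo_guess_count(const char *p)
{
    const unsigned char *u = (const unsigned char *)p;

    if (demo_is_high_byte(u[0]) && demo_is_low_byte(u[1]))
        return 2;
    return demo_is_one_byte(u[0]) ? 1 : 0;
}

/* Counts characters in a NUL-terminated string by advancing with the
 * returned byte count; returns -1 at the first unclassifiable byte. */
static long demo_count_chars(const char *s)
{
    long n = 0;

    while (*s != '\0') {
        int len = demo_guess_count(s);
        if (len == 0)
            return -1;
        s += len;
        n++;
    }
    return n;
}
```

Under these assumptions, a caller would advance a pointer by the value returned from `demo_guess_count()` and treat 0 as a signal to stop or resynchronize, which is the same pattern the slowsjIsP...() tests appear to be designed for.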