Code page 930
CCSID 930 (sometimes known as CP930 or code page 930) is one of several Japanese EBCDIC code pages created by IBM to represent Japanese text. It is commonly used on IBM z/OS and IBM System i systems.
It encodes halfwidth Katakana, fullwidth Katakana, Hiragana and Kanji.
Technical detail
CCSID 930 uses a stateful EBCDIC encoding scheme in which halfwidth Katakana take 1 byte and all other Japanese characters take 2 bytes. The single-byte portion is CCSID 290, which is also known as EBCDIK (Extended Binary Coded Decimal Interchange Kana). The double-byte portion is CCSID 300, which is shared with CCSID 939.[1][2] If only halfwidth Katakana mixed with Latin characters is used, which was the standard until the 1980s, CCSID 930 can be considered a pure 8-bit encoding. When other Japanese or fullwidth characters are used, it is a multibyte encoding in which the Shift-Out (0x0E) and Shift-In (0x0F) bytes mark the start and end of a double-byte sequence.
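The shift mechanism can be illustrated with a minimal Python sketch that splits a mixed byte string into single-byte and double-byte runs. It does not decode characters: the CCSID 290 and CCSID 300 mapping tables are not included, the function name split_runs is hypothetical, and the sample bytes are arbitrary values chosen only to show the structure. The sketch assumes that 0x0E (Shift-Out) opens a double-byte run, that 0x0F (Shift-In) closes it, and that neither value occurs inside a double-byte character.

```python
SO = 0x0E  # Shift-Out: switch to double-byte mode
SI = 0x0F  # Shift-In: switch back to single-byte mode


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split mixed data into ('sbcs', bytes) and ('dbcs', bytes) runs.

    Shift bytes are consumed and do not appear in the output.
    Assumption: shift bytes never occur inside a double-byte character.
    """
    runs = []
    mode = "sbcs"   # the stream starts in single-byte mode
    start = 0       # index of the first byte of the current run
    for i, b in enumerate(data):
        if mode == "sbcs" and b == SO:
            if i > start:
                runs.append(("sbcs", data[start:i]))
            mode, start = "dbcs", i + 1
        elif mode == "dbcs" and b == SI:
            run = data[start:i]
            if len(run) % 2:
                raise ValueError("odd number of bytes in double-byte run")
            if run:
                runs.append(("dbcs", run))
            mode, start = "sbcs", i + 1
    if mode == "dbcs":
        raise ValueError("double-byte run is not closed by Shift-In")
    if start < len(data):
        runs.append(("sbcs", data[start:]))
    return runs


# Arbitrary sample: one single-byte code, two double-byte codes, one single-byte code.
sample = bytes([0xC1, SO, 0x44, 0x81, 0x44, 0x82, SI, 0xC2])
for kind, chunk in split_runs(sample):
    print(kind, chunk.hex())
```

Each "sbcs" run would then be looked up byte by byte in a CCSID 290 table, and each "dbcs" run two bytes at a time in a CCSID 300 table.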
The most recent version of CCSID 930, CCSID 1390, supports JIS X 0213.
It was invented by Alan Lloyd Jones at IBM Hursley Laboratories, UK.
Practical considerations
CCSID 930 and its encoding scheme have a number of idiosyncrasies that make them hard to work with in practice (see also EBCDIC for idiosyncrasies of the EBCDIC standard in general):
- Because of the Shift-Out and Shift-In codes, parsing a byte sequence from the middle is hard. Interpreting the bytes requires backing up until one of the shift bytes is encountered.
- Although CCSID 930 allows text that mixes halfwidth and fullwidth characters, many database schemas strictly distinguish between columns containing only single-byte halfwidth Katakana and columns containing only double-byte fullwidth characters. This is a convenience for software developers: it makes it easier to predict the text length for a given column size in bytes, and vice versa.
- The downside is that, for consistency, Latin text in such a fullwidth column has to be entered as, or converted to, fullwidth alphabetic characters so that it is encoded as double-byte characters. This matters for database searches, where a halfwidth search term may not match its fullwidth form.
- When database columns are implicitly defined as pure fullwidth text, the shift codes are often omitted, which strictly speaking results in incorrect encoding. When the shift codes are missing, CCSID 290 or CCSID 300 usually has to be used directly for proper conversion to another character set, such as the more portable Unicode.
- The encoding of lowercase Latin letters a–z in CCSID 290/930 is different from their common encoding in EBCDIC. This means, for example, that a program that checks for the letter 'a' by its usual EBCDIC byte value would not recognize the letter 'a' in texts in this encoding.
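Two of these points can be sketched in Python. The first function backs up from an arbitrary position to the nearest shift byte to find out how the byte at that position should be read. The second converts Latin text to its fullwidth form before it is stored in a fullwidth-only column. All function names are hypothetical, the sample bytes are arbitrary, and the sketch assumes that 0x0E (Shift-Out) and 0x0F (Shift-In) never occur inside a double-byte character and that the fullwidth conversion is done on Unicode text before conversion to CCSID 930.

```python
import unicodedata

SO = 0x0E  # Shift-Out: start of a double-byte run
SI = 0x0F  # Shift-In: end of a double-byte run


def mode_at(data: bytes, pos: int) -> tuple[str, int]:
    """Return (mode, run_start) for the byte at index pos.

    mode is 'sbcs', 'dbcs', or 'shift' if pos is itself a shift byte.
    Assumption: shift bytes never occur inside a double-byte character.
    """
    if data[pos] in (SO, SI):
        return ("shift", pos)
    i = pos - 1
    while i >= 0:               # back up until a shift byte is found
        if data[i] == SO:
            return ("dbcs", i + 1)
        if data[i] == SI:
            return ("sbcs", i + 1)
        i -= 1
    return ("sbcs", 0)          # no shift byte: still in the initial mode


def char_start(data: bytes, pos: int) -> int:
    """Index of the first byte of the character that contains pos."""
    mode, run_start = mode_at(data, pos)
    if mode == "dbcs":
        return pos - ((pos - run_start) % 2)
    return pos


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to the Unicode fullwidth forms."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")                 # ideographic space
        elif "!" <= ch <= "~":
            out.append(chr(ord(ch) + 0xFEE0))    # U+FF01..U+FF5E
        else:
            out.append(ch)
    return "".join(out)


sample = bytes([0xC1, SO, 0x44, 0x81, 0x44, 0x82, SI, 0xC2])
print(mode_at(sample, 5), char_start(sample, 5))
print(to_fullwidth("IBM 930"))
# NFKC normalization folds the fullwidth forms back for comparison.
print(unicodedata.normalize("NFKC", to_fullwidth("IBM 930")))
```

The cost of the backward scan grows with the distance to the previous shift byte, which is one reason fixed single-byte-only or double-byte-only columns are attractive. Normalizing both the search term and the stored text to one width is one way to keep searches from missing matches.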
References
- Lunde, Ken. CJKV Information Processing. Sebastopol, Calif.: O'Reilly & Associates, 1998. ISBN 1-56592-224-7.
- [1] http://www.ibm.com/software/globalization/ccsid/ccsid930.html
- [2] http://www.ibm.com/software/globalization/ccsid/ccsid939.html