Code page is the traditional IBM term used for a specific character encoding table: a mapping in which a sequence of bits, usually a single octet representing integer values 0 through 255, is associated with a specific character. A few code pages use more than 8 bits per character and thus encode more than 256 characters. The term cmap (character map) is used in technical documentation on Macintosh platforms.
Although IBM created and maintained many code pages, the term came to be associated primarily with character maps used by the IBM PC and compatible platforms, especially prior to the advent of Unicode-capable programming languages and operating systems.
To this day, PC hardware typically supports a single 8-bit code page by default, chosen for a particular regional market, and provides mechanisms that let operating systems switch to other code pages. However, it is now commonplace for operating system vendors to provide their own character encoding and rendering systems that bypass the hardware code pages entirely. These alternative character encodings are sometimes called code pages as well.
1 Relationship to ASCII
The basis of many PC code pages is ASCII, a 7-bit code representing 128 characters and control codes. In the past, systems that stored ASCII in 8-bit bytes often either set the top bit to zero or used it as a parity bit in network data transmissions. When this bit was instead made available for representing character data, another 128 characters and control codes could be represented. IBM used this extended range to encode characters used by various languages. No formal standard existed for these ‘extended character sets’; IBM merely referred to the variants as code pages, as it had always done for variants of EBCDIC encodings.
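The two uses of the eighth bit described above can be illustrated with a short Python sketch. The helper name `with_even_parity` is hypothetical, and even parity is assumed purely for illustration; the second half relies on Python's built-in `cp437` codec.

```python
def with_even_parity(code7: int) -> int:
    """Pack a 7-bit ASCII code into a byte, using the top bit for even parity."""
    if not 0 <= code7 < 128:
        raise ValueError("ASCII codes are 7-bit values in the range 0..127")
    ones = bin(code7).count("1")   # number of 1 bits in the 7-bit code
    parity_bit = ones % 2          # 1 only when the count of 1 bits is odd
    return code7 | (parity_bit << 7)

# 'C' is 0x43 (binary 1000011, three 1 bits), so the parity bit gets set.
framed = with_even_parity(ord("C"))

# When the top bit carries data instead, values 128..255 name extra characters.
extended = bytes([0x82]).decode("cp437")

# The lower half of the code page still matches ASCII.
ascii_part = bytes(range(128)).decode("cp437")
```

Here `framed` is the byte that would go on the wire under the parity scheme, while `extended` is the character that code page 437 assigns to a byte with the top bit set.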
2 Partial List of IBM Code Pages
Since the original IBM PC code page (number 437) was not really designed for international use, several incompatible variants emerged. Examples include:
- 737 -- Greek characters
- 850 -- Multilingual, most European languages
- 858 -- Multilingual with euro symbol
- 860 -- Portuguese
- 863 -- French Canadian
- 865 -- Nordic
- 868 -- Urdu
- 899 -- Symbol
- 904 -- Taiwan
- 1088 -- Revised Korean
- 1114 -- Taiwan
- 1252 -- Superset of ISO 8859-1, used by Microsoft Windows
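The incompatibility between these variants is easy to observe: the same byte value decodes to different characters depending on the code page chosen. The following is a minimal sketch using Python's built-in codecs, which happen to cover several of the code pages listed above; the names `CODE_PAGES` and `decode_under` are hypothetical.

```python
# Python codec names for some of the code pages listed above.
CODE_PAGES = ["cp437", "cp737", "cp850", "cp858",
              "cp860", "cp863", "cp865", "cp1252"]

def decode_under(byte_value: int, code_pages=CODE_PAGES) -> dict:
    """Decode one byte under each code page and return {codec name: character}."""
    raw = bytes([byte_value])
    # errors="replace" yields U+FFFD where a code page leaves the byte undefined.
    return {name: raw.decode(name, errors="replace") for name in code_pages}

# One byte from the upper half, interpreted under each code page.
results = decode_under(0x9B)
for name, ch in results.items():
    print(f"{name}: U+{ord(ch):04X} {ch}")

# Code pages 850 and 858 differ at a single position, the euro symbol.
euro_slot = decode_under(0xD5, ["cp850", "cp858"])
```

Text written under one code page and read under another therefore shows the wrong characters for any byte above 127, even though the ASCII range is unaffected.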
Other code pages of note are:
- 10000 -- Macintosh Roman encoding (followed by several other Mac character sets)
- 12000 -- Unicode little-endian, 12001 big-endian
- 20000 -- CNS Taiwan, followed by other national character sets
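The little-endian and big-endian entries differ only in the order in which the bytes of each code unit are stored. The sketch below assumes that 12000 and 12001 correspond to UTF-32 in little-endian and big-endian byte order, as in Microsoft's code page identifier list, and uses Python's `utf-32-le` and `utf-32-be` codecs to show the difference.

```python
text = "A"  # U+0041

# Little-endian: least significant byte first.
little = text.encode("utf-32-le")

# Big-endian: most significant byte first.
big = text.encode("utf-32-be")

# Both byte sequences carry the same code point, stored in reverse order.
same_bytes_reversed = little == big[::-1]

# Each decodes correctly only under its own byte order.
round_trip = (little.decode("utf-32-le"), big.decode("utf-32-be"))
```

Reversing the bytes only works as a comparison here because the text is a single character; for longer text the reversal applies within each 4-byte unit, not across the whole sequence.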
In modern applications, operating systems and programming languages, the IBM code pages have been rendered obsolete by international standards, such as ISO 8859-1 and Unicode.