Encoding Utility Library 4.0.6

Delphi 4 to 2009 and Kylix 3 Implementation

Dieter Köhler

LICENSE

The contents of the Extended Document Object Model files are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use these files except in compliance with the License. You may obtain a copy of the License at "http://www.mozilla.org/MPL/".

Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

The Original Code is "dkEncodingUtils.pas".

The Initial Developer of the Original Code is Dieter Köhler (Heidelberg, Germany, "http://www.philo.de/"). Portions created by the Initial Developer are Copyright (C) 1999-2009 Dieter Köhler. All Rights Reserved.

Alternatively, the contents of these files may be used under the terms of the GNU General Public License Version 2 or later (the "GPL"), in which case the provisions of the GPL are applicable instead of those above. If you wish to allow use of your version of these files only under the terms of the GPL, and not to allow others to use your version of these files under the terms of the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the GPL. If you do not delete the provisions above, a recipient may use your version of these files under the terms of any one of the MPL or the GPL.

2004-2009


Table of Contents

Introduction
The TEncodingInfo Class
Public Class Functions
The TEncodingInfoClass Class Type
The Encodings Constant
Classes derived from TEncodingInfo
References

Introduction

The Encoding Utility Library defines the TEncodingInfo class for obtaining information about the names and aliases of character sets according to the Management Information Base (MIB) as specified in [RFC 3808] and [CSMIB]. (Parts of the documentation below are taken from these specifications without being explicitly marked as quotations.) The MIB contains the official names for character sets that may be used on the Internet.

In addition, some common non-standard aliases have been included. They are marked as such in the source code.

To use the Encoding Utility Library in your application just include a reference to dkEncodingUtils in the uses clause of the relevant units and make sure that the location of the file dkEncodingUtils.pas is included in the library path list of your Delphi IDE.
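
As a minimal sketch of the setup described above, the following console program references the unit and prints one MIB name. The program name EncodingDemo is arbitrary, and the sketch assumes that dkEncodingUtils.pas can be found via the library path.

```pascal
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  dkEncodingUtils;  // makes TEncodingInfo and its subclasses available

begin
  // TEncodingInfoUTF8 is one of the derived classes listed later in this
  // document. Class functions can be called without creating an object.
  WriteLn(TEncodingInfoUTF8.Name);
end.
```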

The latest version of this software may be obtained through the Open XML website at "http://www.philo.de/xml/". The preferred way to contact the author is by posting to the Open XML mailing list. Instructions on how to join the mailing list can also be found at "http://www.philo.de/xml/".

The TEncodingInfo Class

The abstract TEncodingInfo class defines a set of class functions for obtaining information about the names and aliases of a character set. Each derived class implements these functions for one specific character set.

Public Class Functions

class function Alias(I: Integer): string; virtual; abstract;

This function gives access to a list of official names for the character set. These names are expressed in ANSI_X3.4-1968, also known as US-ASCII or simply ASCII. The names are not case-sensitive. The aliases that start with "cs" have been added for use with the Printer MIB (see RFC 1759) and contain the standard numbers along with suggestive names in order to facilitate applications that want to display the names in user interfaces. The "cs" stands for character set and is provided for applications that need a lower case first letter but want to use mixed case thereafter that cannot contain any special characters, such as underbar ("_") and dash ("-").

The I parameter corresponds to the position of an alias in the list, where 0 is the first alias, 1 is the second alias, and so on. If there is no alias corresponding to the value of I, an exception is raised: an ArgumentOutOfRangeException on the .NET platform, an EAccessViolation on the Win32/Kylix platform.

The first alias, i.e. Alias(0), always contains the MIB name of the corresponding character set.
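
Because an invalid index raises an exception, callers may prefer to check the index before calling Alias. The following is a sketch for the Win32/Kylix platform only. TryGetAlias and the program name are hypothetical helpers written for this example, not part of the library. The sketch relies on the AliasCount function described below.

```pascal
program AliasLookupDemo;

{$APPTYPE CONSOLE}

uses
  dkEncodingUtils;

// Hypothetical helper: returns True and the alias at position I if I is a
// valid index; otherwise returns False and an empty string.
function TryGetAlias(Info: TEncodingInfoClass; I: Integer;
  var Value: string): Boolean;
begin
  Result := (I >= 0) and (I < Info.AliasCount);
  if Result then
    Value := Info.Alias(I)
  else
    Value := '';
end;

var
  S: string;
begin
  // Index 0 is always valid and holds the MIB name.
  if TryGetAlias(TEncodingInfoAscii, 0, S) then
    WriteLn('MIB name: ', S);

  // A negative index is rejected before Alias is ever called.
  if not TryGetAlias(TEncodingInfoAscii, -1, S) then
    WriteLn('Index -1 is out of range');
end.
```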

class function AliasCount: Integer; virtual; abstract;

This function returns the number of aliases of the character set. Use the AliasCount function when iterating over all the aliases in the list, or when trying to locate the position of an alias relative to the last alias in the list.
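
Iterating over all aliases of one character set might look as follows. This is a sketch for the Win32/Kylix platform. ListAliases and the program name are illustrative names, not library identifiers.

```pascal
program ListAliasesDemo;

{$APPTYPE CONSOLE}

uses
  dkEncodingUtils;

// Prints every alias of the given character set, one per line,
// prefixed with its position in the alias list.
procedure ListAliases(Info: TEncodingInfoClass);
var
  I: Integer;
begin
  for I := 0 to Info.AliasCount - 1 do
    WriteLn(I, ': ', Info.Alias(I));
end;

begin
  ListAliases(TEncodingInfoIsoLatin1);
end.
```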

class function Name: string; virtual; abstract;

This function returns the MIB name of the character set. It is equivalent to Alias(0).

class function MIBenum: Integer; virtual; abstract;

This function returns the MIBenum value of the character set. It is a unique number for each character set.

class function PreferredMIMEName: string; virtual; abstract;

This function returns the preferred MIME name of the character set. If no preferred MIME name has been specified for this character set, an empty string is returned.
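
Since PreferredMIMEName may be empty, code that needs a label in every case can fall back to the MIB name. The sketch below assumes the Win32/Kylix platform. CharsetLabel is a hypothetical helper, and falling back to Name is a design choice of this example rather than something the library prescribes.

```pascal
program CharsetLabelDemo;

{$APPTYPE CONSOLE}

uses
  dkEncodingUtils;

// Hypothetical helper: prefers the MIME name and falls back to the
// MIB name when no preferred MIME name has been specified.
function CharsetLabel(Info: TEncodingInfoClass): string;
begin
  Result := Info.PreferredMIMEName;
  if Result = '' then
    Result := Info.Name;
end;

begin
  WriteLn(CharsetLabel(TEncodingInfoIsoLatin1));
  WriteLn(CharsetLabel(TEncodingInfoUTF8));
end.
```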

The TEncodingInfoClass Class Type

TEncodingInfoClass is the class type (class reference) for TEncodingInfo classes. It is available on the Win32/Kylix platform only.

The Encodings Constant

On the .NET platform the Encodings constant is an array of type TEncodingInfo. It contains a specific TEncodingInfo object for each TEncodingInfo subclass defined in the dkEncodingUtils unit.

On the Win32/Kylix platform the Encodings constant is an array of type TEncodingInfoClass. It contains a reference to each TEncodingInfo subclass defined in the dkEncodingUtils unit.
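
A typical use of the Encodings constant is to look up a character set by one of its names or by its MIBenum value. The following sketch targets the Win32/Kylix platform. FindEncodingByAlias and FindEncodingByMIBenum are hypothetical helpers, not library functions. The name comparison uses CompareText because the names are not case-sensitive, and the array bounds are taken from Low and High so that no assumption about the index range is needed.

```pascal
program FindEncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils,  // CompareText
  dkEncodingUtils;

// Hypothetical helper: returns the first class that lists AliasName among
// its aliases (ignoring case), or nil if no class matches.
function FindEncodingByAlias(const AliasName: string): TEncodingInfoClass;
var
  I, J: Integer;
begin
  for I := Low(Encodings) to High(Encodings) do
    for J := 0 to Encodings[I].AliasCount - 1 do
      if CompareText(Encodings[I].Alias(J), AliasName) = 0 then
      begin
        Result := Encodings[I];
        Exit;
      end;
  Result := nil;
end;

// Hypothetical helper: returns the class with the given MIBenum value,
// or nil if there is none.
function FindEncodingByMIBenum(Value: Integer): TEncodingInfoClass;
var
  I: Integer;
begin
  for I := Low(Encodings) to High(Encodings) do
    if Encodings[I].MIBenum = Value then
    begin
      Result := Encodings[I];
      Exit;
    end;
  Result := nil;
end;

var
  Found: TEncodingInfoClass;
begin
  Found := FindEncodingByAlias('utf-8');
  if Found <> nil then
    WriteLn(Found.Name, ' has MIBenum ', Found.MIBenum)
  else
    WriteLn('No character set with that name');
end.
```

Both helpers scan the array linearly. An application that performs many lookups could build its own index once at startup.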

Classes derived from TEncodingInfo

The following classes have been derived from TEncodingInfo. Each class holds information about valid names for a particular character set:

TEncodingInfoAscii, TEncodingInfoIsoLatin1, TEncodingInfoIsoLatin2, TEncodingInfoIsoLatin3, TEncodingInfoIsoLatin4, TEncodingInfoIsoLatinCyrillic, TEncodingInfoIsoLatinArabic, TEncodingInfoIsoLatinGreek, TEncodingInfoIsoLatinHebrew, TEncodingInfoIsoLatin5, TEncodingInfoIsoLatin6, TEncodingInfoIsoTextComm, TEncodingInfoHalfWidthKatakana, TEncodingInfoJISEncoding, TEncodingInfoShiftJIS, TEncodingInfoEUCPPkdFmtJapanese, TEncodingInfoEUCFixWidJapanese, TEncodingInfoISO4UnitedKingdom, TEncodingInfoISO11SwedishForNames, TEncodingInfoISO15Italian, TEncodingInfoISO17Spanish, TEncodingInfoISO21German, TEncodingInfoISO60DanishNorwegian, TEncodingInfoISO69French, TEncodingInfoISO10646UTF1, TEncodingInfoISO646basic1983, TEncodingInfoInvariant, TEncodingInfoISO2Int1RefVersion, TEncodingInfoNATSSEFI, TEncodingInfoNATSSEFIADD, TEncodingInfoNATSDANO, TEncodingInfoNATSDANOADD, TEncodingInfoISO10Swedish, TEncodingInfoKSC56011987, TEncodingInfoISO2022KR, TEncodingInfoEUCKR, TEncodingInfoISO2022JP, TEncodingInfoISO2022JP2, TEncodingInfoISO13JISC6220jp, TEncodingInfoISO14JISC6220ro, TEncodingInfoISO16Portuguese, TEncodingInfoISO18Greek7Old, TEncodingInfoISO19LatinGreek, TEncodingInfoISO25French, TEncodingInfoISO27LatinGreek1, TEncodingInfoISO5427Cyrillic, TEncodingInfoISO42JISC62261978, TEncodingInfoISO47BSViewdata, TEncodingInfoISO49INIS, TEncodingInfoISO50INIS8, TEncodingInfoISO51INISCyrillic, TEncodingInfoISO5427Cyrillic1981, TEncodingInfoISO5428Greek, TEncodingInfoISO57GB1988, TEncodingInfoISO58GB231280, TEncodingInfoISO61Norwegian2, TEncodingInfoISO70VideotexSupp1, TEncodingInfoISO84Portuguese2, TEncodingInfoISO85Spanish2, TEncodingInfoISO86Hungarian, TEncodingInfoISO87JISX0208, TEncodingInfoISO88Greek7, TEncodingInfoISO89ASMO449, TEncodingInfoISO90, TEncodingInfoISO91JISC62291984a, TEncodingInfoISO92JISC62991984b, TEncodingInfoISO93JIS62291984badd, TEncodingInfoISO94JIS62291984hand, TEncodingInfoISO95JIS62291984handadd, TEncodingInfoISO96JISC62291984kana, TEncodingInfoISO2033, 
TEncodingInfoISO99NAPLPS, TEncodingInfoISO102T617bit, TEncodingInfoISO103T618bit, TEncodingInfoISO111ECMACyrillic, TEncodingInfoISO121Canadian1, TEncodingInfoISO122Canadian2, TEncodingInfoISO123CSAZ24341985gr, TEncodingInfoISO88596E, TEncodingInfoISO88596I, TEncodingInfoISO128T101G2, TEncodingInfoISO88598E, TEncodingInfoISO88598I, TEncodingInfoISO139CSN369103, TEncodingInfoISO141JUSIB1002, TEncodingInfoISO143IECP271, TEncodingInfoISO146Serbian, TEncodingInfoISO147Macedonian, TEncodingInfoISO150GreekCCITT, TEncodingInfoISO151Cuba, TEncodingInfoISO6937Add, TEncodingInfoISO153GOST1976874, TEncodingInfoISO8859Supp, TEncodingInfoISO10367Box, TEncodingInfoISO158Lap, TEncodingInfoISO159JISX02121990, TEncodingInfoISO646Danish, TEncodingInfoUSDK, TEncodingInfoDKUS, TEncodingInfoKSC5636, TEncodingInfoUnicode11UTF7, TEncodingInfoISO2022CN, TEncodingInfoISO2022CNEXT, TEncodingInfoUTF8, TEncodingInfoISO885913, TEncodingInfoIsoLatin8, TEncodingInfoIsoLatin9, TEncodingInfoIsoLatin10, TEncodingInfoGBK, TEncodingInfoGB18030, TEncodingInfoOSD_EBCDIC_DF04_15, TEncodingInfoOSD_EBCDIC_DF03_IRV, TEncodingInfoOSD_EBCDIC_DF04_1, TEncodingInfoISO115481, TEncodingInfoKZ1048, TEncodingInfoUCS2, TEncodingInfoUCS4, TEncodingInfoUnicodeASCII, TEncodingInfoUnicodeLatin1, TEncodingInfoISO10646J1, TEncodingInfoUnicodeIBM1261, TEncodingInfoUnicodeIBM1268, TEncodingInfoUnicodeIBM1276, TEncodingInfoUnicodeIBM1264, TEncodingInfoUnicodeIBM1265, TEncodingInfoUnicode11, TEncodingInfoSCSU, TEncodingInfoUTF7, TEncodingInfoUTF16BE, TEncodingInfoUTF16LE, TEncodingInfoUTF16, TEncodingInfoCESU8, TEncodingInfoUTF32, TEncodingInfoUTF32BE, TEncodingInfoUTF32LE, TEncodingInfoBOCU1, TEncodingInfoWindows30Latin1, TEncodingInfoWindows31Latin1, TEncodingInfoWindows31Latin2, TEncodingInfoWindows31Latin5, TEncodingInfoHPRoman8, TEncodingInfoAdobeStandardEncoding, TEncodingInfoVenturaUS, TEncodingInfoVenturaInternational, TEncodingInfoDECMCS, TEncodingInfoPC850Multilingual, TEncodingInfoPCp852, 
TEncodingInfoPC8CodePage437, TEncodingInfoPC8DanishNorwegian, TEncodingInfoPC862LatinHebrew, TEncodingInfoPC8Turkish, TEncodingInfoIBMSymbols, TEncodingInfoIBMThai, TEncodingInfoHPLegal, TEncodingInfoHPPiFont, TEncodingInfoHPMath8, TEncodingInfoHPPSMath, TEncodingInfoHPDesktop, TEncodingInfoVenturaMath, TEncodingInfoMicrosoftPublishing, TEncodingInfoWindows31J, TEncodingInfoGB2312, TEncodingInfoBig5, TEncodingInfoMacintosh, TEncodingInfoIBM037, TEncodingInfoIBM038, TEncodingInfoIBM273, TEncodingInfoIBM274, TEncodingInfoIBM275, TEncodingInfoIBM277, TEncodingInfoIBM278, TEncodingInfoIBM280, TEncodingInfoIBM281, TEncodingInfoIBM284, TEncodingInfoIBM285, TEncodingInfoIBM290, TEncodingInfoIBM297, TEncodingInfoIBM420, TEncodingInfoIBM423, TEncodingInfoIBM424, TEncodingInfoIBM500, TEncodingInfoIBM851, TEncodingInfoIBM855, TEncodingInfoIBM857, TEncodingInfoIBM860, TEncodingInfoIBM861, TEncodingInfoIBM863, TEncodingInfoIBM864, TEncodingInfoIBM865, TEncodingInfoIBM868, TEncodingInfoIBM869, TEncodingInfoIBM870, TEncodingInfoIBM871, TEncodingInfoIBM880, TEncodingInfoIBM891, TEncodingInfoIBM903, TEncodingInfoIBM904, TEncodingInfoIBM905, TEncodingInfoIBM918, TEncodingInfoIBM1026, TEncodingInfoIBMEBCDICATDE, TEncodingInfoEBCDICATDEA, TEncodingInfoEBCDICCAFR, TEncodingInfoEBCDICDKNO, TEncodingInfoEBCDICDKNOA, TEncodingInfoEBCDICFISE, TEncodingInfoEBCDICFISEA, TEncodingInfoEBCDICFR, TEncodingInfoEBCDICIT, TEncodingInfoEBCDICPT, TEncodingInfoEBCDICES, TEncodingInfoEBCDICESA, TEncodingInfoEBCDICESS, TEncodingInfoEBCDICUK, TEncodingInfoEBCDICUS, TEncodingInfoUnknown8Bit, TEncodingInfoMnemonic, TEncodingInfoMnem, TEncodingInfoVISCII, TEncodingInfoVIQR, TEncodingInfoKOI8R, TEncodingInfoHZGB2312, TEncodingInfoIBM866, TEncodingInfoPC775Baltic, TEncodingInfoKOI8U, TEncodingInfoIBM00858, TEncodingInfoIBM00924, TEncodingInfoIBM01140, TEncodingInfoIBM01141, TEncodingInfoIBM01142, TEncodingInfoIBM01143, TEncodingInfoIBM01144, TEncodingInfoIBM01145, TEncodingInfoIBM01146, TEncodingInfoIBM01147, 
TEncodingInfoIBM01148, TEncodingInfoIBM01149, TEncodingInfoBig5HKSCS, TEncodingInfoIBM1047, TEncodingInfoPTCP154, TEncodingInfoAmiga1251, TEncodingInfoKOI7switched, TEncodingInfoBRF, TEncodingInfoTSCII, TEncodingInfoWindows1250, TEncodingInfoWindows1251, TEncodingInfoWindows1252, TEncodingInfoWindows1253, TEncodingInfoWindows1254, TEncodingInfoWindows1255, TEncodingInfoWindows1256, TEncodingInfoWindows1257, TEncodingInfoWindows1258, TEncodingInfoTIS620

References

[CSMIB] IANA: "Character Sets", 2001-08-23, see "http://www.iana.org/assignments/character-sets".

[RFC 3808] McDonald, I.: "IANA Charset MIB", RFC 3808, 2004, see "http://www.ietf.org/rfc/rfc3808.txt".