Parser Utilities Library 1.0.0

Delphi 5, 6, 7 and Kylix Implementation

by Dieter Köhler


LICENSE

The contents of this file are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at "http://www.mozilla.org/MPL/"

Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

The Original Code is "ParserUtils.pas".

The Initial Developer of the Original Code is Dieter Köhler (Heidelberg, Germany, "http://www.philo.de/"). Portions created by the Initial Developer are Copyright (C) 2003 Dieter Köhler. All Rights Reserved.

Alternatively, the contents of this file may be used under the terms of the GNU General Public License Version 2 or later (the "GPL"), in which case the provisions of the GPL are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the GPL, and not to allow others to use your version of this file under the terms of the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the GPL. If you do not delete the provisions above, a recipient may use your version of this file under the terms of any one of the MPL or the GPL.


Acknowledgment

The TUtilsCustomReader and TUtilsCustomWriter classes were written by Robert Marquardt.


Introduction

The Parser Utilities Library contains general classes for parsing a byte stream. The latest version of this software is available at <http://www.philo.de/xml/>.


Using the unit

The Parser Utilities Library does not contain any components that need to be registered, so using it in your own projects is simple: add "ParserUtils" to the uses clause of your unit and make sure that the directory containing the ParserUtils.pas file is included in Delphi's list of library paths. To add it, go to the Library section of Delphi's Environment Options dialog (see the menu item: "Tools/Environment Options ...").
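As a minimal sketch of the step described above, a unit that uses the library might start as follows. The unit name MyParser is a placeholder chosen for illustration only.

```pascal
unit MyParser;  // hypothetical unit name, not part of the library

interface

uses
  Classes,      // TStream, which the library's constructors expect
  ParserUtils;  // the unit documented here

implementation

end.
```

If the compiler reports that ParserUtils cannot be found, the library path setting mentioned above is the first thing to check.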


Defined Resourcestrings

These strings are used for the error messages of exceptions.


Typed Constants


TUtilsCustomReader = class

Use TUtilsCustomReader as a base class when defining a class for buffered input of stream data.

Protected Properties
property BufSize: Integer (readonly)

Returns the size of the buffer as specified in the constructor.

property Position: Longint

Position is used to track the reader's position within the stream. The value of Position will always be inside the most recent buffer block read. Thus, for reading, Position will always be less than the stream's Position.

Protected Methods
procedure FlushBuffer; virtual;

FlushBuffer synchronizes the reader's buffer with the associated stream by setting the stream's Position to match the reader's Position.

function GetPosition: Longint;

This function is called by the Position property to get the reader's position within the stream.

Return Value:

function Read( var Buf; const Count: Longint): Boolean; virtual;

Attempts to read up to Count bytes from the associated stream into Buf.

Parameters:

Return Value:

procedure SetPosition(Value: Integer);

This procedure is called by the Position property to specify the reader's position within the stream.

Parameters:

Public Methods
constructor Create(const Stream: TStream; const BufSize: Integer);

Creates a new TUtilsCustomReader object. Create allocates memory for a TUtilsCustomReader object, and associates it with the stream passed in the Stream parameter, with a buffer of size BufSize.

Parameters:

destructor Destroy; override;

Destroys the TUtilsCustomReader instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
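Because Read is protected, a descendant has to expose whatever reading operations it needs. The following sketch shows one possible shape of such a descendant. TByteReader and ReadByte are hypothetical names, and the code assumes that Read returns 'True' when the requested bytes were delivered, which this document does not state explicitly.

```pascal
unit ByteReaderSketch;  // hypothetical unit name

interface

uses
  Classes, ParserUtils;

type
  // Hypothetical descendant; not part of ParserUtils.
  TByteReader = class(TUtilsCustomReader)
  public
    function ReadByte(var B: Byte): Boolean;
  end;

implementation

function TByteReader.ReadByte(var B: Byte): Boolean;
begin
  // Delegates to the protected, buffered Read of the base class.
  // Assumption: a False result means that no byte could be read.
  Result := Read(B, 1);
end;

end.
```

An instance would be created with the inherited constructor, for example TByteReader.Create(Stream, 4096), and released with Free.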


TUtilsCustomWriter = class

Use TUtilsCustomWriter as a base class when defining a class for buffered output of stream data.

Protected Properties
property BufSize: Integer (readonly)

Returns the size of the buffer as specified in the constructor.

property Position: Longint

Position is used to track the writer's position within the stream. The value of Position will always be inside the most recent buffer block written. Thus, for writing, Position will always be greater than the stream's Position.

Protected Methods
procedure FlushBuffer; virtual;

FlushBuffer synchronizes the writer's buffer with the associated stream by setting the stream's Position to match the writer's Position.

function GetPosition: Longint;

This function is called by the Position property to get the writer's position within the stream.

Return Value:

procedure SetPosition(Value: Integer);

This procedure is called by the Position property to specify the writer's position within the stream.

Parameters:

procedure Write(const Buf; const Count: Longint); virtual;

Writes Count bytes from Buf to the associated stream.

Parameters:

Exceptions:

Public Methods
constructor Create(const Stream: TStream; const BufSize: Integer);

Creates a new TUtilsCustomWriter object. Create allocates memory for a TUtilsCustomWriter object, and associates it with the stream passed in the Stream parameter, with a buffer of size BufSize.

Parameters:

destructor Destroy; override;

Destroys the TUtilsCustomWriter instance and frees its memory. Do not call Destroy directly in an application. Call Free instead, which checks for a nil reference before calling Destroy.
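A writer descendant can be sketched in the same way. TByteWriter, WriteByte and Flush are hypothetical names introduced for illustration. Whether the destructor flushes pending data on its own is not stated in this document, so the sketch offers an explicit Flush.

```pascal
unit ByteWriterSketch;  // hypothetical unit name

interface

uses
  Classes, ParserUtils;

type
  // Hypothetical descendant; not part of ParserUtils.
  TByteWriter = class(TUtilsCustomWriter)
  public
    procedure WriteByte(const B: Byte);
    procedure Flush;
  end;

implementation

procedure TByteWriter.WriteByte(const B: Byte);
begin
  // Hands one byte to the protected, buffered Write of the base class.
  Write(B, 1);
end;

procedure TByteWriter.Flush;
begin
  // Makes the protected FlushBuffer reachable for callers.
  FlushBuffer;
end;

end.
```

Calling Flush before reading the underlying stream's Position keeps the stream and the writer in step.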


TUtilsInputSource = class(TUtilsCustomReader)

TUtilsInputSource encapsulates information about a character stream input source in a single object.

Protected Properties
Column: Integer (readonly)

The column number of the current character. If the current character is a string terminator ($9C) the column number of the previous character is returned.

Line: Integer (readonly)

The line number of the current character. The line number is automatically incremented for each LINE FEED ($0A) detected. If the current character is a string terminator ($9C) the line number of the previous character is returned.

NormalizeLineFeed: Boolean (default True)

If 'True' (the default), line breaks are normalized to Linux-style breaks consisting of a single LINE FEED ($0A): a sequence of CARRIAGE RETURN ($0D) + LINE FEED ($0A) or a single CARRIAGE RETURN is replaced by a single LINE FEED. If 'False', no normalization takes place.
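The normalization rule can be illustrated with a small stand-alone program. This is only a sketch of the rule as described here, applied to a WideString, and not the library's own implementation. NormalizeLineBreaks is a hypothetical name.

```pascal
program NormalizeSketch;

{$APPTYPE CONSOLE}

// Replaces CR+LF and lone CR by a single LF.
function NormalizeLineBreaks(const S: WideString): WideString;
var
  I, N: Integer;
begin
  SetLength(Result, Length(S));
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    Inc(N);
    if S[I] = #13 then
    begin
      Result[N] := #10;
      // Swallow the LINE FEED of a CR+LF pair.
      if (I < Length(S)) and (S[I + 1] = #10) then
        Inc(I);
    end
    else
      Result[N] := S[I];
    Inc(I);
  end;
  SetLength(Result, N);
end;

begin
  // 'a' CR LF 'b' CR 'c' becomes 'a' LF 'b' LF 'c'.
  WriteLn(Length(NormalizeLineBreaks('a'#13#10'b'#13'c')));  // writes 5
end.
```

With normalization switched on, a parser built on TUtilsInputSource only ever has to test for $0A when looking for line ends.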

TabWidth: Integer

Specifies the width of TAB characters ($09) when calculating the Column number. This is especially useful for adjusting the TUtilsInputSource object to the settings of some editor. The default value of TabWidth is 4.

Public Properties
Bof: Boolean (readonly)

'True' if the input source is at its start position, i.e. the value of the current code point is $98 (START OF STRING); 'False' otherwise.

CurrentCodePoint: Longint (readonly)

The UCS-4 code point of the current character. Immediately after creating a TUtilsInputSource object the value $98 (START OF STRING) is returned. When the end of the input source is reached the value $9C (STRING TERMINATOR) is returned.

Encoding: TdomEncodingType (readonly)

Returns the character encoding scheme used by the input stream.

Eof: Boolean (readonly)

'True' if the end of the input stream was reached, i.e. the value of the current code point is $9C (STRING TERMINATOR); 'False' otherwise.

NextCodePoint: Longint (readonly)

The UCS-4 code point of the character following the current character. If the current character is of code point $9C (STRING TERMINATOR) or is the last character of the input source, the value $9C (STRING TERMINATOR) is returned.
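CurrentCodePoint and NextCodePoint together give one character of lookahead without moving the input source. The following sketch tests for two consecutive HYPHEN-MINUS characters ($2D). AtDoubleHyphen is a hypothetical helper and not part of the library.

```pascal
unit LookaheadSketch;  // hypothetical unit name

interface

uses
  ParserUtils;

// Hypothetical helper, not part of ParserUtils.
function AtDoubleHyphen(const Source: TUtilsInputSource): Boolean;

implementation

function AtDoubleHyphen(const Source: TUtilsInputSource): Boolean;
begin
  // Both properties are read-only, so the position is left untouched.
  Result := (Source.CurrentCodePoint = $2D) and
            (Source.NextCodePoint = $2D);
end;

end.
```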

PreviousCodePoint: Longint (readonly)

The UCS-4 code point of the character preceding the current character. Immediately after creating a TUtilsInputSource object, or if the current character is positioned at the first character of the input source, or if the input source is empty, the value $98 (START OF STRING) is returned.

Public Methods
constructor Create(const Stream: TStream; const LineOffset, ColumnOffset, BufSize: Integer; const AEncoding: TdomEncodingType);

Constructs and initializes an instance of TUtilsInputSource with the specified Stream.

Parameters:

Exceptions:

function Match(Ucs2Str: WideString): Boolean; virtual;

Advances the current code point as far as the following content of the input stream matches the specified WideString. After calling Match, if the specified WideString completely matched the following content of the input stream, the position of the current code point is that of the last matched character. If the following content of the input stream did not completely match the specified WideString, the position of the current code point after calling Match is that of the first mismatched character.

Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR) the TUtilsInputSource object cannot advance the current character beyond this character. The Match function may nevertheless test for STRING TERMINATOR, but a positive result is only possible if it appears at the end of the specified WideString.

Parameters:

Return Value:

Exceptions:
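A typical use of Match is to test for a fixed prefix at the very start of the input. The sketch below follows one reading of the description above: at START OF STRING the "following content" is taken to begin with the first real character. StartsWith is a hypothetical helper and not part of the library.

```pascal
unit MatchSketch;  // hypothetical unit name

interface

uses
  ParserUtils;

// Hypothetical helper, not part of ParserUtils.
function StartsWith(const Source: TUtilsInputSource;
  const Prefix: WideString): Boolean;

implementation

function StartsWith(const Source: TUtilsInputSource;
  const Prefix: WideString): Boolean;
begin
  // Match is only called while the source is still at START OF STRING.
  // On success the current character is the last character of Prefix;
  // on failure it is the first character that did not match.
  Result := Source.Bof and Source.Match(Prefix);
end;

end.
```

Since a failed Match leaves the source at the mismatched character, a caller that wants to try a different prefix would have to call Reset first.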

procedure Next; virtual;

Advances the current character to the next character (if any) of the input stream. If the code point of the current character is $9C (STRING TERMINATOR) calling Next has no effect. If the end of the input stream is reached the code point of the current character is set to $9C (STRING TERMINATOR).

Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR) the TUtilsInputSource object cannot advance the current character beyond this character. Note also that if the value of the current character is $9C the code point returned by the NextCodePoint property is always $9C no matter whether the end of the input stream was reached or not.

Exceptions:
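Next and Eof combine into the usual read loop. The following sketch counts the characters of an input source that was created elsewhere. CountCodePoints is a hypothetical helper and not part of the library.

```pascal
unit IterateSketch;  // hypothetical unit name

interface

uses
  ParserUtils;

// Hypothetical helper, not part of ParserUtils.
function CountCodePoints(const Source: TUtilsInputSource): Integer;

implementation

function CountCodePoints(const Source: TUtilsInputSource): Integer;
begin
  Result := 0;
  // Step off the START OF STRING ($98) marker if the source is still there.
  if Source.Bof then
    Source.Next;
  // Eof is True once the current code point is STRING TERMINATOR ($9C).
  while not Source.Eof do
  begin
    Inc(Result);
    Source.Next;
  end;
end;

end.
```

As the hint above implies, a $9C character embedded in the stream ends this loop early, because the input source cannot advance beyond it.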

procedure Reset; virtual;

Resets the input source to its initial position and state.

Exceptions:

function SkipNext(Ucs2Str: WideString): Integer; virtual;

Advances the current character to the next character (if any) of the input stream while skipping any UCS-2 character contained in Ucs2Str. If the code point of the current character is $9C (STRING TERMINATOR) calling SkipNext has no effect. If the end of the input stream is reached the code point of the current character is set to $9C (STRING TERMINATOR).

Hint: If the input stream contains a character of code point $9C (STRING TERMINATOR) the TUtilsInputSource object cannot advance the current character beyond this character. Including $9C in the Ucs2Str parameter has no effect. Note also that if the value of the current character is $9C the code point returned by the NextCodePoint property is always $9C no matter whether the end of the input stream was reached or not.

Parameters:

Return Value:

Exceptions:


TUtilsAutoDetectInputSource = class(TUtilsInputSource)

TUtilsAutoDetectInputSource is a TUtilsInputSource descendant which can autodetect UTF-8 or UTF-16 encodings when initialized with a stream starting with a byte order mark.

Public Properties
ByteOrderMarkType: TUtilsBOMType (readonly)

The type of the Byte Order Mark (if any) of the input stream. This is one of the following values:

Public Methods
constructor Create(const Stream: TStream; const LineOffset, ColumnOffset, BufSize: Integer; const AEncoding: TdomEncodingType);

Constructs and initializes an instance of TUtilsAutoDetectInputSource with the specified Stream. If the specified Stream starts with a UTF-8 or UTF-16 byte order mark, the character encoding scheme is set as indicated by the byte order mark and the byte order mark is skipped, i.e. it is not accessible via CurrentCodePoint and the related properties. If no byte order mark is found, the specified default character encoding scheme is used.

Parameters:

Exceptions:
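The detection step can be illustrated independently of the library. The following stand-alone program is a sketch of byte order mark detection as commonly done, not the library's own code. TSketchBom and DetectBom are hypothetical names, and the enumeration values are not those of TUtilsBOMType.

```pascal
program BomSketch;

{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical type for this sketch only.
  TSketchBom = (sbNone, sbUtf8, sbUtf16BE, sbUtf16LE);

const
  // UTF-8 byte order mark followed by the letter 'A'.
  Sample: array[0..3] of Byte = ($EF, $BB, $BF, $41);

// Inspects the bytes at the current stream position and leaves the
// stream positioned directly behind the byte order mark, if any.
function DetectBom(const Stream: TStream): TSketchBom;
var
  Buf: array[0..2] of Byte;
  Start, N: Longint;
begin
  Result := sbNone;
  Start := Stream.Position;
  N := Stream.Read(Buf, SizeOf(Buf));
  if (N >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF) then
    Result := sbUtf8
  else if (N >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
    Result := sbUtf16BE
  else if (N >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
    Result := sbUtf16LE;
  case Result of
    sbUtf8: Stream.Position := Start + 3;
    sbUtf16BE, sbUtf16LE: Stream.Position := Start + 2;
  else
    Stream.Position := Start;  // no mark: rewind to where we started
  end;
end;

var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    Stream.WriteBuffer(Sample, SizeOf(Sample));
    Stream.Position := 0;
    if DetectBom(Stream) = sbUtf8 then
      WriteLn('UTF-8 BOM, content starts at offset ', Stream.Position);
  finally
    Stream.Free;
  end;
end.
```

With TUtilsAutoDetectInputSource none of this has to be written by hand; the sketch only makes visible why the byte order mark never shows up in CurrentCodePoint.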