RDF Parser 1.0.6

Delphi 2009 Implementation

by Dieter Köhler


LICENSE

The contents of this file are subject to the Mozilla Public License Version 1.1 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at "http://www.mozilla.org/MPL/"

Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

The Original Code is "RdfParser_1_1.pas".

The Initial Developer of the Original Code is Dieter Köhler (Heidelberg, Germany, "http://www.philo.de/"). Portions created by the Initial Developer are Copyright (C) 2003-2010 Dieter Köhler. All Rights Reserved.

Alternatively, the contents of this file may be used under the terms of the GNU General Public License Version 2 or later (the "GPL"), in which case the provisions of the GPL are applicable instead of those above. If you wish to allow use of your version of this file only under the terms of the GPL, and not to allow others to use your version of this file under the terms of the MPL, indicate your decision by deleting the provisions above and replace them with the notice and other provisions required by the GPL. If you do not delete the provisions above, a recipient may use your version of this file under the terms of any one of the MPL or the GPL.


Acknowledgment

The NTripleStrUnescape() helper function uses code provided by Ernst van der Pols.


Introduction

The RdfParser_1_1 unit implements classes for parsing RDF graphs. It requires the Rdf_1_1 unit, which defines basic classes to maintain RDF graphs, and the ParserUtils unit, which defines general classes for parsing character streams. The latest versions of all units are available at <http://www.philo.de/rdf/>.

The current version of RdfParser_1_1 supports parsing RDF graphs stored in NTriple files or strings into a TRdfGraph object. (The TRdfGraph class is defined in the required Rdf_1_1 unit.) I am planning to add support for parsing other data formats as well as for serializing TRdfGraph objects in future versions.

The software pattern chosen for the parser is a chain of streaming objects. When parsing a string, the Parser (TRdfNTripleToRdfGraphParser) builds the RDF graph from a series of analyzed RDF statements that it requests from a Token Analyzer (TRdfTokenAnalyzer). To satisfy such a request, the Token Analyzer in turn requests partly analyzed RDF statements from a Triple Tokenizer (TRdfNTripleTokenizer). To satisfy the request of the Token Analyzer, the Triple Tokenizer requests the individual lines of the NTriple string from a Triple Line Tokenizer (TRdfNTripleLineTokenizer).

I am well aware that this strategy is not optimal with respect to performance. A monolithic procedure would, for example, require far fewer if-then statements. But using a series of specialized streaming objects makes it comparatively easy to implement new kinds of RDF parsers on the basis of the existing classes.
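
The chain described above can be wired up by hand with the constructors documented in the class reference. The following is only a sketch under stated assumptions: the procedure name CollectStatements is hypothetical, the TRdfInputSource is assumed to be created and filled elsewhere (its constructor is defined in the ParserUtils unit and is not covered here), Read is assumed to return False once no further statement is available, and no stage is assumed to free the object passed to its constructor.

```pascal
uses
  Classes, Rdf_1_1, RdfParser_1_1;

// Hypothetical helper: appends one "subject predicate object" line to Output
// for every statement that can be read from Source.
procedure CollectStatements(const Source: TRdfInputSource;
  const Output: TStrings);
var
  LineTokenizer: TRdfNTripleLineTokenizer;
  Tokenizer: TRdfNTripleTokenizer;
  Analyzer: TRdfTokenAnalyzer;
  S, P, O, Lexical, Language, Datatype: string;
begin
  // Stage 1: splits the source into lines.
  LineTokenizer := TRdfNTripleLineTokenizer.Create(Source);
  try
    // Stage 2: turns lines into semi-analyzed statements.
    Tokenizer := TRdfNTripleTokenizer.Create(LineTokenizer);
    try
      // Stage 3: turns semi-analyzed statements into analyzed statements.
      Analyzer := TRdfTokenAnalyzer.Create(Tokenizer);
      try
        // Assumption: Read returns False when the source is exhausted.
        while Analyzer.Read(S, P, O, Lexical, Language, Datatype) do
          Output.Add(S + ' ' + P + ' ' + O);
      finally
        Analyzer.Free;
      end;
    finally
      // Assumption: the analyzer does not own the tokenizer.
      Tokenizer.Free;
    end;
  finally
    // Assumption: the tokenizer does not own the line tokenizer.
    LineTokenizer.Free;
  end;
end;
```

If the units turn out to manage ownership differently, the nested Free calls need to be adjusted accordingly.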


Installation

Before you install RdfParser_1_1 you must first install the required Rdf_1_1 and ParserUtils units. Then, to install RdfParser_1_1, proceed as follows:
  1. Create a new directory and extract the RdfParser_1_1 zip archive into it.
  2. Make sure that the path to the location of the RdfParser_1_1.pas file is included in Delphi's list of library paths. To include it, go to the Library section of Delphi's Environment Options dialog (see the menu item: "Tools/Environment Options ...").
  3. Select the option "Install Component" from the "Component" menu.
  4. Add the file ending with ".pas" to a new or to an already existing package.
  5. Click on OK, and next confirm that the components should be compiled and installed.
  6. Close the package window, and next confirm that the modifications should be saved.
  7. The new components are now available in the component palette on a page entitled "RDF".

Helper Functions

function NTripleStrUnescape(S: string): UTF8String;

Replaces all N-Triples \-escape sequences in S with their UTF-8 equivalents.

Parameters:

Return Value:

Exceptions:
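
As an illustration, a caller might wrap the function to get an ordinary Delphi string back. This is a minimal sketch: the wrapper name UnescapeToString is hypothetical, and the exact set of escape sequences handled is not specified in this document.

```pascal
uses
  RdfParser_1_1;

// Hypothetical wrapper: unescapes an N-Triples fragment and converts the
// UTF-8 result to a native (Unicode) string.
function UnescapeToString(const Escaped: string): string;
var
  Utf8: UTF8String;
begin
  Utf8 := NTripleStrUnescape(Escaped);
  // In Delphi 2009 a cast from UTF8String to string converts the encoding.
  Result := string(Utf8);
end;
```

A call such as UnescapeToString('caf\u00E9') passes the backslash sequence through literally, because Delphi string literals do not interpret backslashes; the unescaping is left entirely to NTripleStrUnescape.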

procedure RdfParserError(Msg: string);

Raises an ERdfParserException with the specified message string.

Parameters:

Exceptions:
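
Since the procedure always raises, callers would typically guard it with an exception handler. The sketch below uses a hypothetical function name, RaiseAndCapture, and relies only on the behavior described above.

```pascal
uses
  SysUtils, RdfParser_1_1;

// Hypothetical helper: raises a parser error and returns the message that
// was carried by the exception.
function RaiseAndCapture(const Msg: string): string;
begin
  Result := '';
  try
    RdfParserError(Msg);
  except
    on E: ERdfParserException do
      Result := E.Message;
  end;
end;
```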


Typed Constants

TRdf_N_Triple_Line_Token_Type

TRdf_N_Triple_Line_Token_Type contains the following constants, which the TRdfNTripleLineTokenizer.Read() procedure uses to indicate the content type of a returned line. Valid values are:

NTL_TRIPLE_TOKEN
Indicates a line containing an RDF statement.
NTL_COMMENT_TOKEN
Indicates a line containing a comment.
NTL_EMPTY_LINE_TOKEN
Indicates an empty line.
NTL_END_OF_TEXT_TOKEN
Indicates that the end of the text has been reached.
NTL_INVALID_LINE_TOKEN
Indicates an invalid line.
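
A caller of TRdfNTripleLineTokenizer.Read() would typically branch on the returned token type. The sketch below assumes that TRdf_N_Triple_Line_Token_Type is an ordinal (enumerated) type, so that it can be used in a case statement, and that every call to Read advances to the next line. The procedure name CountLineKinds is hypothetical.

```pascal
uses
  RdfParser_1_1;

// Hypothetical helper: reads all lines and counts them by kind.
procedure CountLineKinds(const LineTokenizer: TRdfNTripleLineTokenizer;
  out Triples, Comments, EmptyLines, InvalidLines: Integer);
var
  Symbol: TRdf_N_Triple_Line_Token_Type;
  Line: string;
begin
  Triples := 0;
  Comments := 0;
  EmptyLines := 0;
  InvalidLines := 0;
  repeat
    LineTokenizer.Read(Symbol, Line);
    case Symbol of
      NTL_TRIPLE_TOKEN: Inc(Triples);
      NTL_COMMENT_TOKEN: Inc(Comments);
      NTL_EMPTY_LINE_TOKEN: Inc(EmptyLines);
      NTL_INVALID_LINE_TOKEN: Inc(InvalidLines);
    end;
  // NTL_END_OF_TEXT_TOKEN is not counted; it only ends the loop.
  until Symbol = NTL_END_OF_TEXT_TOKEN;
end;
```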

Exception classes

ERdfParserException = class(ERdfException);

The Class Hierarchy of the RDF Parser Classes


TRdfNTripleToRdfGraphParser = class(TComponent)

TRdfNTripleToRdfGraphParser is a class used to parse RDF graphs stored in NTriple files, streams or strings into a TRdfGraph object. (The TRdfGraph class is defined in the required Rdf_1_1 unit.)

Published Properties
RdfGraph: TRdfGraph

The target TRdfGraph object to which the new statements are added.

Public Methods
function ParseFile(const Filename: string; var Handle: Integer): Boolean; virtual;

Parses an NTriple file.

Parameters:

Var Parameters:

Return Value:

function ParseStream(const Stream: TStream; var Handle: Integer): Boolean; virtual;

Parses an NTriple stream.

Parameters:

Var Parameters:

Return Value:

function ParseString(const Expression: string; var Handle: Integer): Boolean; virtual;

Parses an NTriple string.

Parameters:

Var Parameters:

Return Value:
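
A typical use of the component might look like the sketch below. Several points are assumptions, not documented behavior: the function name LoadNTriples is hypothetical, the TRdfGraph is assumed to be created by the caller, the return value is assumed to signal success, and the meaning of the Handle parameter is not described here, so it is merely initialized to 0.

```pascal
uses
  Classes, Rdf_1_1, RdfParser_1_1;

// Hypothetical helper: parses NTriple text into an existing graph.
function LoadNTriples(const Graph: TRdfGraph; const Text: string): Boolean;
var
  Parser: TRdfNTripleToRdfGraphParser;
  Handle: Integer;
begin
  // The parser is a TComponent; nil means it has no owner and is freed here.
  Parser := TRdfNTripleToRdfGraphParser.Create(nil);
  try
    Parser.RdfGraph := Graph; // target graph for the new statements
    Handle := 0;              // placeholder; semantics not documented here
    Result := Parser.ParseString(Text, Handle);
  finally
    Parser.Free;
  end;
end;
```

ParseFile and ParseStream have the same shape and could be substituted by passing a file name or a TStream instead of the text.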


TRdfCustomAnalyzer = class

TRdfCustomAnalyzer is an abstract base class for classes offering sequential access to detailed information about an RDF source's individual statements.

Public Methods
function Read(out RdfSubject, RdfPredicate, RdfObject, RdfObjectLexical, RdfObjectLanguage, RdfObjectDatatype: string): Boolean; virtual; abstract;

Reads the next statement (if any) from the RDF source.

Parameters:

Return Value:

Exceptions:

  • ERdfException
    Raised if a well-formedness error was detected in the RDF source.

  • TRdfTokenAnalyzer = class

    TRdfTokenAnalyzer is used to sequentially access the analyzed statements from a TRdfCustomTokenizer object.

    Public Methods
    constructor Create(const ATokenizer: TRdfCustomTokenizer);

    Constructs and initializes an instance of TRdfTokenAnalyzer with the specified TRdfCustomTokenizer object.

    Parameters:

    function Read(out RdfSubject, RdfPredicate, RdfObject, RdfObjectLexical, RdfObjectLanguage, RdfObjectDatatype: string): Boolean; override;

    Retrieves the next statement (if any).

    Parameters:

    Return Value:

    Exceptions:

      • ERdfException
        Raised if a well-formedness error was detected in the RDF source.

  • TRdfCustomTokenizer = class

    TRdfCustomTokenizer is an abstract base class for classes offering sequential access to semi-analyzed statements contained in an RDF source.

    Public Methods
    function Read(out RdfSubject, RdfPredicate, RdfObject: string): Boolean; virtual; abstract;

    Reads the next statement (if any) from the RDF source.

    Parameters:

    Return Value:

    Exceptions:

      • ERdfException
        Raised if a well-formedness error was detected in the statement.

  • TRdfNTripleTokenizer = class(TRdfCustomTokenizer)

    TRdfNTripleTokenizer is used to sequentially access the semi-analyzed statements contained in an NTriple source.

    Public Methods
    constructor Create(const ALineTokenizer: TRdfNTripleLineTokenizer);

    Constructs and initializes an instance of TRdfNTripleTokenizer with the specified TRdfNTripleLineTokenizer object.

    Parameters:

    function Read(out RdfSubject, RdfPredicate, RdfObject: string): Boolean; override;

    Reads the next statement (if any) from the NTriple source.

    Parameters:

    Return Value:

    Exceptions:

      • ERdfException
        Raised if a well-formedness error was detected in the statement.

  • TRdfNTripleLineTokenizer = class

    TRdfNTripleLineTokenizer is used to sequentially access the lines of an NTriple source.

    Public Methods
    constructor Create(const AInputSource: TRdfInputSource);

    Constructs and initializes an instance of TRdfNTripleLineTokenizer with the specified TRdfInputSource object.

    Parameters:

    procedure Read(out Symbol: TRdf_N_Triple_Line_Token_Type; out NextLine: string); virtual;

    Reads the next line (if any) from the NTriple source.

    Parameters:

    Exceptions:

      • EConvertError
        Raised if one of the characters of the next line cannot be converted to a UCS4 code point.

  • TRdfInputSource = class(TUtilsInputSource)

    TRdfInputSource inherits from TUtilsInputSource (specified in the required ParserUtils unit). It adds no new features and only publishes the 'Line' property, which is protected in TUtilsInputSource.