RDF Parser 1.0.6
Delphi 2009 Implementation
by Dieter Köhler
LICENSE
The contents of this file are subject to the Mozilla Public License Version
1.1 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
"http://www.mozilla.org/MPL/"
Software distributed under the License is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for
the specific language governing rights and limitations under the License.
The Original Code is "RdfParser_1_1.pas".
The Initial Developer of the Original Code is Dieter Köhler (Heidelberg,
Germany, "http://www.philo.de/"). Portions created by the Initial Developer
are Copyright (C) 2003-2010 Dieter Köhler. All Rights Reserved.
Alternatively, the contents of this file may be used under the terms of the
GNU General Public License Version 2 or later (the "GPL"), in which case the
provisions of the GPL are applicable instead of those above. If you wish to
allow use of your version of this file only under the terms of the GPL, and
not to allow others to use your version of this file under the terms of the
MPL, indicate your decision by deleting the provisions above and replace them
with the notice and other provisions required by the GPL. If you do not delete
the provisions above, a recipient may use your version of this file under the
terms of any one of the MPL or the GPL.
Acknowledgment
The NTripleStrUnescape() helper function uses code provided by Ernst van der Pols.
Introduction
The RdfParser_1_1 unit implements classes for parsing RDF graphs. It requires the Rdf_1_1 unit, which defines basic classes for maintaining RDF graphs, and the ParserUtils unit, which defines general classes for parsing character streams. The latest version of all units is available at <http://www.philo.de/rdf/>.
The current version of RdfParser_1_1 supports parsing RDF graphs stored in NTriple files or strings into a TRdfGraph object. (The TRdfGraph class is defined in the required Rdf_1_1 unit.) I am planning to add support for parsing other data formats as well as for serializing TRdfGraph objects in future versions.
The parser is designed as a chain of streaming objects. When parsing a string, the Parser (TRdfNTripleToRdfGraphParser) builds the RDF graph from a series of analyzed RDF statements that it requests from a Token Analyzer (TRdfTokenAnalyzer). To satisfy such a request, the Token Analyzer in turn requests partly analyzed RDF statements from a Triple Tokenizer (TRdfNTripleTokenizer). To satisfy the Token Analyzer's request, the Triple Tokenizer requests the individual lines of the NTriple string from a Triple Line Tokenizer (TRdfNTripleLineTokenizer).
I am well aware that this strategy is not optimal with respect to performance; a monolithic procedure, for example, would require far fewer if-then statements. But using a series of specialized streaming objects makes it comparatively easy to implement new kinds of RDF parsers on the basis of the existing classes.
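The chain of requests described above is a pull pipeline: each stage answers a Read() call by pulling just enough data from the stage before it. The following self-contained sketch illustrates the pattern with two hypothetical stages, TLineStage and TStatementStage. They are not classes of RdfParser_1_1, and the real stages do considerably more work (tokenizing, analyzing, unescaping).

```pascal
program PullPipelineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical first stage: hands out the raw lines of the source one by one.
  TLineStage = class
  private
    FLines: TStringList;
    FIndex: Integer;
  public
    constructor Create(const Text: string);
    destructor Destroy; override;
    function Read(out Line: string): Boolean;
  end;

  // Hypothetical second stage: pulls lines on demand and passes on only
  // non-empty, non-comment lines.
  TStatementStage = class
  private
    FSource: TLineStage;
  public
    constructor Create(ASource: TLineStage);
    function Read(out Statement: string): Boolean;
  end;

constructor TLineStage.Create(const Text: string);
begin
  inherited Create;
  FLines := TStringList.Create;
  FLines.Text := Text;
end;

destructor TLineStage.Destroy;
begin
  FLines.Free;
  inherited Destroy;
end;

function TLineStage.Read(out Line: string): Boolean;
begin
  Result := FIndex < FLines.Count;
  if Result then
  begin
    Line := FLines[FIndex];
    Inc(FIndex);
  end
  else
    Line := '';
end;

constructor TStatementStage.Create(ASource: TLineStage);
begin
  inherited Create;
  FSource := ASource;
end;

function TStatementStage.Read(out Statement: string): Boolean;
var
  Line: string;
begin
  // Each request is answered by pulling as many lines as needed from the
  // previous stage; nothing is read ahead.
  while FSource.Read(Line) do
  begin
    Line := Trim(Line);
    if (Line <> '') and (Line[1] <> '#') then
    begin
      Statement := Line;
      Result := True;
      Exit;
    end;
  end;
  Statement := '';
  Result := False;
end;

var
  Lines: TLineStage;
  Statements: TStatementStage;
  S: string;
begin
  Lines := TLineStage.Create('# a comment' + sLineBreak + sLineBreak +
    '<http://example.org/s> <http://example.org/p> "o" .');
  Statements := TStatementStage.Create(Lines);
  try
    while Statements.Read(S) do
      WriteLn(S);
  finally
    Statements.Free;
    Lines.Free;
  end;
end.
```

Because each stage only knows the Read() interface of its predecessor, supporting a different input would mean replacing the first stage while the later stages stay untouched.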
Installation
Before you install RdfParser_1_1, you must first install the required Rdf_1_1 and ParserUtils units. Then install RdfParser_1_1 as follows:
- Create a new directory and extract the RdfParser_1_1 zip archive into it.
- Make sure that the directory containing the RdfParser_1_1.pas file is included in Delphi's library path. To add it, go to the Library section of Delphi's Environment Options dialog (menu item "Tools/Environment Options ...").
- Select the option "Install Component" from the "Component" menu.
- Add the file ending with ".pas" to a new or an existing package.
- Click OK, and then confirm that the components should be compiled and installed.
- Close the package window, and then confirm that the modifications should be saved.
- The new components are now available on the Component Palette, on a page entitled "RDF".
Helper Functions
function NTripleStrUnescape(S: string): UTF8String;
Replaces all N-Triples \-escape sequences in S by their UTF-8 equivalents.
Parameters:
- S
The N-Triples string to unescape.
Return Value:
- The content of S as an unescaped UTF-8 string.
Exceptions:
- EConvertError
Raised if S contains invalid characters or escape sequences.
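The kind of decoding NTripleStrUnescape() performs can be sketched as follows. This is not the unit's actual implementation; it assumes the N-Triples definition of the W3C RDF Test Cases recommendation, which restricts N-Triples text to 7-bit US-ASCII and defines the escapes \t, \n, \r, \", \\, \uHHHH and \UHHHHHHHH (uppercase hex digits). The names UnescapeNTriplesSketch, HexToCodePoint and AppendUtf8 are hypothetical. The result is built as an AnsiString holding raw UTF-8 bytes, which avoids implicit code-page conversions in the sketch.

```pascal
program NTriplesUnescapeSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Appends the UTF-8 byte sequence for code point CP to Dest.
procedure AppendUtf8(var Dest: AnsiString; CP: Cardinal);
begin
  if (CP > $10FFFF) or ((CP >= $D800) and (CP <= $DFFF)) then
    raise EConvertError.CreateFmt('Invalid code point U+%x', [CP]);
  if CP < $80 then
    Dest := Dest + AnsiChar(CP)
  else if CP < $800 then
    Dest := Dest + AnsiChar($C0 or (CP shr 6)) + AnsiChar($80 or (CP and $3F))
  else if CP < $10000 then
    Dest := Dest + AnsiChar($E0 or (CP shr 12)) +
      AnsiChar($80 or ((CP shr 6) and $3F)) + AnsiChar($80 or (CP and $3F))
  else
    Dest := Dest + AnsiChar($F0 or (CP shr 18)) +
      AnsiChar($80 or ((CP shr 12) and $3F)) +
      AnsiChar($80 or ((CP shr 6) and $3F)) + AnsiChar($80 or (CP and $3F));
end;

// Reads Count uppercase hex digits of S starting at position Start.
function HexToCodePoint(const S: string; Start, Count: Integer): Cardinal;
var
  I: Integer;
  C: Char;
begin
  if Start + Count - 1 > Length(S) then
    raise EConvertError.Create('Truncated escape sequence');
  Result := 0;
  for I := Start to Start + Count - 1 do
  begin
    C := S[I];
    case C of
      '0'..'9': Result := Result * 16 + Cardinal(Ord(C) - Ord('0'));
      'A'..'F': Result := Result * 16 + Cardinal(Ord(C) - Ord('A') + 10);
    else
      raise EConvertError.CreateFmt('Invalid hex digit "%s"', [C]);
    end;
  end;
end;

function UnescapeNTriplesSketch(const S: string): AnsiString;
var
  I: Integer;
begin
  Result := '';
  I := 1;
  while I <= Length(S) do
  begin
    if Ord(S[I]) > $7F then
      raise EConvertError.Create('Non-ASCII character in N-Triples text');
    if S[I] <> '\' then
    begin
      AppendUtf8(Result, Ord(S[I]));
      Inc(I);
      Continue;
    end;
    if I = Length(S) then
      raise EConvertError.Create('Backslash at end of text');
    case S[I + 1] of
      't': AppendUtf8(Result, 9);
      'n': AppendUtf8(Result, 10);
      'r': AppendUtf8(Result, 13);
      '"': AppendUtf8(Result, Ord('"'));
      '\': AppendUtf8(Result, Ord('\'));
      'u':
        begin
          AppendUtf8(Result, HexToCodePoint(S, I + 2, 4));
          Inc(I, 4);
        end;
      'U':
        begin
          AppendUtf8(Result, HexToCodePoint(S, I + 2, 8));
          Inc(I, 8);
        end;
    else
      raise EConvertError.CreateFmt('Unknown escape sequence \%s', [S[I + 1]]);
    end;
    Inc(I, 2);  // skip the backslash and the escape letter
  end;
end;

begin
  // 'caf' (3 bytes) + U+00E9 (2 bytes in UTF-8) + LF (1 byte)
  WriteLn(Length(UnescapeNTriplesSketch('caf\u00E9\n')));  // 6
end.
```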
procedure RdfParserError(Msg: string);
Raises an ERdfParserException with the specified message string.
Parameters:
- Msg
The message string of the ERdfParserException to raise.
Exceptions:
- ERdfParserException
Always raised.
Typed Constants
TRdf_N_Triple_Line_Token_Type
TRdf_N_Triple_Line_Token_Type defines the following constants, which the TRdfNTripleLineTokenizer.Read() procedure uses to indicate the content type of a returned line. Valid values are:
- NTL_TRIPLE_TOKEN
- Indicates a line containing an RDF statement.
- NTL_COMMENT_TOKEN
- Indicates a line containing a comment.
- NTL_EMPTY_LINE_TOKEN
- Indicates an empty line.
- NTL_END_OF_TEXT_TOKEN
- Indicates that the end of the text is reached.
- NTL_INVALID_LINE_TOKEN
- Indicates an invalid line.
Exception classes
ERdfParserException = class(ERdfException);
The Class Hierarchy of the RDF Parser Classes
- TObject
  - TComponent
    - TRdfNTripleToRdfGraphParser
  - TRdfCustomAnalyzer
    - TRdfTokenAnalyzer
  - TRdfCustomTokenizer
    - TRdfNTripleTokenizer
  - TRdfNTripleLineTokenizer
TRdfNTripleToRdfGraphParser = class(TComponent)
TRdfNTripleToRdfGraphParser is a class used to parse RDF graphs stored in NTriple files, streams or strings into a TRdfGraph object. (The TRdfGraph class is defined in the required Rdf_1_1 unit.)
Published Properties
RdfGraph: TRdfGraph
The target TRdfGraph object to which the new statements are added.
Public Methods
function ParseFile(const Filename: string;
var Handle: Integer): Boolean; virtual;
Parses an NTriple file.
Parameters:
- Filename
The path to the file to be parsed.
Var Parameters:
- Handle
Specifies the handle used to add statements to the graph. Handles matter if you want to append the contents of the file to a graph that already contains statements. In the NTriple format, blank nodes are represented by local nodeIDs, so a means is needed to distinguish between equal nodeIDs that occur in different graphs being merged. The Handle parameter plays exactly this role: only nodes that have both the same nodeID and the same handle are treated as equivalent. So, if the information about one RDF graph is distributed among several files, use the same handle when parsing each of them; if you want to merge different RDF graphs, use a different handle for each file. To acquire an unused handle from the graph automatically, pass zero or a negative value; the function then finds a new handle for the current parsing run and returns its value in Handle for further use.
Return Value:
- 'True' if the parsing was successful, 'False' otherwise.
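The handle rule can be modeled in isolation. The sketch below does not show how Rdf_1_1 actually stores blank nodes (that is internal to TRdfGraph); TBlankNodeRegistry, AcquireHandle and NodeFor are hypothetical names that only reproduce the documented behavior: equal nodeIDs denote the same node only under the same handle, and a handle of zero or less requests a new one.

```pascal
program BlankNodeHandleSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical registry: one entry per distinct (handle, nodeID) pair.
  TBlankNodeRegistry = class
  private
    FKeys: TStringList;
    FLastHandle: Integer;
  public
    constructor Create;
    destructor Destroy; override;
    procedure AcquireHandle(var Handle: Integer);
    function NodeFor(Handle: Integer; const NodeID: string): Integer;
    function NodeCount: Integer;
  end;

constructor TBlankNodeRegistry.Create;
begin
  inherited Create;
  FKeys := TStringList.Create;
  FKeys.CaseSensitive := True;  // nodeIDs "a" and "A" are different
end;

destructor TBlankNodeRegistry.Destroy;
begin
  FKeys.Free;
  inherited Destroy;
end;

procedure TBlankNodeRegistry.AcquireHandle(var Handle: Integer);
begin
  // Zero or a negative value asks for a fresh, unused handle.
  if Handle <= 0 then
  begin
    Inc(FLastHandle);
    Handle := FLastHandle;
  end
  else if Handle > FLastHandle then
    FLastHandle := Handle;
end;

function TBlankNodeRegistry.NodeFor(Handle: Integer; const NodeID: string): Integer;
var
  Key: string;
begin
  // The key combines both parts, so equal nodeIDs under different handles
  // never collide.
  Key := IntToStr(Handle) + ':' + NodeID;
  Result := FKeys.IndexOf(Key);
  if Result < 0 then
    Result := FKeys.Add(Key);
end;

function TBlankNodeRegistry.NodeCount: Integer;
begin
  Result := FKeys.Count;
end;

var
  Reg: TBlankNodeRegistry;
  H1, H2: Integer;
begin
  Reg := TBlankNodeRegistry.Create;
  try
    H1 := 0;
    Reg.AcquireHandle(H1);  // H1 = 1
    H2 := 0;
    Reg.AcquireHandle(H2);  // H2 = 2
    WriteLn(Reg.NodeFor(H1, 'a') = Reg.NodeFor(H1, 'a'));  // TRUE: same graph
    WriteLn(Reg.NodeFor(H1, 'a') = Reg.NodeFor(H2, 'a'));  // FALSE: merged graphs
    WriteLn(Reg.NodeCount);                                // 2
  finally
    Reg.Free;
  end;
end.
```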
function ParseStream(const Stream: TStream;
var Handle: Integer): Boolean; virtual;
Parses an NTriple stream.
Parameters:
- Stream
The stream to be parsed.
Var Parameters:
- Handle
Specifies the handle used to add statements to the graph. -- For details see the description of the ParseFile() function.
Return Value:
- 'True' if the parsing was successful, 'False' otherwise.
function ParseString(const Expression: string;
var Handle: Integer): Boolean; virtual;
Parses an NTriple string.
Parameters:
- Expression
The string to be parsed.
Var Parameters:
- Handle
Specifies the handle used to add statements to the graph. -- For details see the description of the ParseFile() function.
Return Value:
- 'True' if the parsing was successful, 'False' otherwise.
TRdfCustomAnalyzer = class
TRdfCustomAnalyzer is an abstract base class for classes offering sequential access to detailed information about an RDF source's individual statements.
Public Methods
function Read(out RdfSubject,
RdfPredicate,
RdfObject,
RdfObjectLexical,
RdfObjectLanguage,
RdfObjectDatatype: string): Boolean; virtual; abstract;
Reads the next statement (if any) from the RDF source.
Parameters:
- RdfSubject
Returns the subject of the next statement or an empty string if there was no next statement.
- RdfPredicate
Returns the predicate of the next statement or an empty string if there was no next statement.
- RdfObject
Returns the object of the next statement, unless the object is a literal, in which case the string '#lexical' is returned. If there was no next statement, an empty string is returned.
- RdfObjectLexical
Returns the lexical form of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string. All \-escape sequences are resolved into their UTF-8 equivalents.
- RdfObjectLanguage
Returns the language tag of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string.
- RdfObjectDatatype
Returns the datatype of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string.
Return Value:
- 'True' if reading the next statement was successful, 'False' otherwise.
Exceptions:
- ERdfException
Raised if a well-formedness error was detected in the RDF source.
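A consumer of a TRdfCustomAnalyzer descendant calls Read() until it returns False and uses the '#lexical' marker to tell literal objects apart. The self-contained sketch below shows that loop against a hypothetical in-memory analyzer; TAnalyzerSketch and TFixedAnalyzer are illustrations only, not classes of the unit. How the real classes format URIs and blank nodes in RdfSubject and RdfObject is an assumption here (plain URIs are used).

```pascal
program AnalyzerLoopSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in with the same Read() signature as TRdfCustomAnalyzer.
  TAnalyzerSketch = class
  public
    function Read(out RdfSubject, RdfPredicate, RdfObject, RdfObjectLexical,
      RdfObjectLanguage, RdfObjectDatatype: string): Boolean; virtual; abstract;
  end;

  // Hypothetical in-memory analyzer returning two canned statements.
  TFixedAnalyzer = class(TAnalyzerSketch)
  private
    FCount: Integer;
  public
    function Read(out RdfSubject, RdfPredicate, RdfObject, RdfObjectLexical,
      RdfObjectLanguage, RdfObjectDatatype: string): Boolean; override;
  end;

function TFixedAnalyzer.Read(out RdfSubject, RdfPredicate, RdfObject, RdfObjectLexical,
  RdfObjectLanguage, RdfObjectDatatype: string): Boolean;
begin
  // All out parameters are empty unless a statement is returned.
  RdfSubject := ''; RdfPredicate := ''; RdfObject := '';
  RdfObjectLexical := ''; RdfObjectLanguage := ''; RdfObjectDatatype := '';
  Inc(FCount);
  Result := FCount <= 2;
  if not Result then
    Exit;
  RdfSubject := 'http://example.org/book';
  if FCount = 1 then
  begin
    RdfPredicate := 'http://purl.org/dc/elements/1.1/creator';
    RdfObject := 'http://example.org/author';
  end
  else
  begin
    RdfPredicate := 'http://purl.org/dc/elements/1.1/title';
    RdfObject := '#lexical';  // marker: the object is a literal
    RdfObjectLexical := 'Faust';
    RdfObjectLanguage := 'de';
  end;
end;

var
  Analyzer: TAnalyzerSketch;
  S, P, O, Lex, Lang, DT: string;
begin
  Analyzer := TFixedAnalyzer.Create;
  try
    // Pull statements until Read() returns False.
    while Analyzer.Read(S, P, O, Lex, Lang, DT) do
      if O = '#lexical' then
        WriteLn(S, ' ', P, ' "', Lex, '" lang=', Lang, ' datatype=', DT)
      else
        WriteLn(S, ' ', P, ' ', O);
  finally
    Analyzer.Free;
  end;
end.
```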
TRdfTokenAnalyzer = class(TRdfCustomAnalyzer)
TRdfTokenAnalyzer is used to sequentially access the analyzed statements from a TRdfCustomTokenizer object.
Public Methods
constructor Create(const ATokenizer: TRdfCustomTokenizer);
Constructs and initializes an instance of TRdfTokenAnalyzer with the specified TRdfCustomTokenizer object.
Parameters:
- ATokenizer
The TRdfCustomTokenizer object which gives access to the statements to be processed.
function Read(out RdfSubject,
RdfPredicate,
RdfObject,
RdfObjectLexical,
RdfObjectLanguage,
RdfObjectDatatype: string): Boolean; override;
Retrieves the next statement (if any).
Parameters:
- RdfSubject
Returns the subject of the next statement or an empty string if there was no next statement.
- RdfPredicate
Returns the predicate of the next statement or an empty string if there was no next statement.
- RdfObject
Returns the object of the next statement, unless the object is a literal, in which case the string '#lexical' is returned. If there was no next statement, an empty string is returned.
- RdfObjectLexical
Returns the lexical form of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string. All \-escape sequences are resolved into their UTF-8 equivalents.
- RdfObjectLanguage
Returns the language tag of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string.
- RdfObjectDatatype
Returns the datatype of the object literal if the statement has one; otherwise, or if there was no next statement, an empty string.
Return Value:
- 'True' if reading the next statement was successful, 'False' otherwise.
Exceptions:
- ERdfException
Raised if a well-formedness error was detected in the RDF source.
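Since TRdfCustomTokenizer.Read() returns the object as one string, TRdfTokenAnalyzer presumably has to break that term into the RdfObject, RdfObjectLexical, RdfObjectLanguage and RdfObjectDatatype values. The following self-contained sketch shows one way to do the split, assuming the object term arrives in N-Triples surface form such as "chat"@fr or "42"^^<...#integer>, and assuming the datatype is reported without its angle brackets. SplitObjectTerm is a hypothetical helper: unlike the documented Read(), it leaves \-escapes unresolved (NTripleStrUnescape() covers that step) and raises a plain Exception instead of ERdfException.

```pascal
program ObjectTermSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

procedure SplitObjectTerm(const Term: string;
  out Obj, Lexical, Language, Datatype: string);
var
  I: Integer;
  Rest: string;
begin
  Obj := Term; Lexical := ''; Language := ''; Datatype := '';
  if (Term = '') or (Term[1] <> '"') then
    Exit;  // a URI reference or blank node: passed through unchanged
  // Find the closing quote, skipping \-escaped characters such as \" and \\.
  I := 2;
  while (I <= Length(Term)) and (Term[I] <> '"') do
    if Term[I] = '\' then Inc(I, 2) else Inc(I);
  if I > Length(Term) then
    raise Exception.Create('Unterminated literal: ' + Term);
  Obj := '#lexical';
  Lexical := Copy(Term, 2, I - 2);  // still \-escaped in this sketch
  Rest := Copy(Term, I + 1, MaxInt);
  if (Rest <> '') and (Rest[1] = '@') then
    Language := Copy(Rest, 2, MaxInt)
  else if (Length(Rest) > 4) and (Copy(Rest, 1, 3) = '^^<') and
    (Rest[Length(Rest)] = '>') then
    Datatype := Copy(Rest, 4, Length(Rest) - 4)
  else if Rest <> '' then
    raise Exception.Create('Unexpected text after literal: ' + Rest);
end;

var
  O, Lex, Lang, DT: string;
begin
  SplitObjectTerm('"chat"@fr', O, Lex, Lang, DT);
  WriteLn(O, ' | ', Lex, ' | ', Lang, ' | ', DT);
  SplitObjectTerm('"42"^^<http://www.w3.org/2001/XMLSchema#integer>', O, Lex, Lang, DT);
  WriteLn(O, ' | ', Lex, ' | ', Lang, ' | ', DT);
  SplitObjectTerm('<http://example.org/o>', O, Lex, Lang, DT);
  WriteLn(O, ' | ', Lex, ' | ', Lang, ' | ', DT);
end.
```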
TRdfCustomTokenizer = class
TRdfCustomTokenizer is an abstract base class for classes offering sequential access to semi-analyzed statements contained in an RDF source.
Public Methods
function Read(out RdfSubject,
RdfPredicate,
RdfObject: string): Boolean; virtual; abstract;
Reads the next statement (if any) from the RDF source.
Parameters:
- RdfSubject
Returns the subject of the next statement or an empty string if there was no next statement.
- RdfPredicate
Returns the predicate of the next statement or an empty string if there was no next statement.
- RdfObject
Returns the object of the next statement or an empty string if there was no next statement.
Return Value:
- 'True' if reading the next statement was successful, 'False' otherwise.
Exceptions:
- ERdfException
Raised if a well-formedness error was detected in the statement.
TRdfNTripleTokenizer = class(TRdfCustomTokenizer)
TRdfNTripleTokenizer is used to sequentially access the semi-analyzed statements contained in an NTriple source.
Public Methods
constructor Create(const ALineTokenizer: TRdfNTripleLineTokenizer);
Constructs and initializes an instance of TRdfNTripleTokenizer with the specified TRdfNTripleLineTokenizer object.
Parameters:
- ALineTokenizer
The TRdfNTripleLineTokenizer object which gives access to the NTriple source to be processed.
function Read(out RdfSubject,
RdfPredicate,
RdfObject: string): Boolean; override;
Reads the next statement (if any) from the NTriple source.
Parameters:
- RdfSubject
Returns the subject of the next statement or an empty string if there was no next statement.
- RdfPredicate
Returns the predicate of the next statement or an empty string if there was no next statement.
- RdfObject
Returns the object of the next statement or an empty string if there was no next statement.
Return Value:
- 'True' if reading the next statement was successful, 'False' otherwise.
Exceptions:
- ERdfException
Raised if a well-formedness error was detected in the statement.
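What "semi-analyzed" means can be made concrete: a statement line, already stripped of its full stop by the line tokenizer, is split into subject, predicate and object, with the object left as one unparsed term. The sketch below is a minimal illustration under that assumption; SplitStatement is a hypothetical name. The whitespace splitting relies on subjects and predicates (URI references or blank node labels) never containing blanks, while the object, which may be a literal with blanks, is taken as the rest of the line.

```pascal
program SplitStatementSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

procedure SplitStatement(const Line: string; out Subj, Pred, Obj: string);
var
  P: Integer;

  function IsBlank(C: Char): Boolean;
  begin
    Result := (C = ' ') or (C = #9);
  end;

  // Skips blanks at P, then returns the characters up to the next blank.
  function NextToken: string;
  var
    Start: Integer;
  begin
    while (P <= Length(Line)) and IsBlank(Line[P]) do
      Inc(P);
    Start := P;
    while (P <= Length(Line)) and not IsBlank(Line[P]) do
      Inc(P);
    Result := Copy(Line, Start, P - Start);
  end;

begin
  P := 1;
  Subj := NextToken;
  Pred := NextToken;
  // The object may contain blanks (e.g. "two words"), so it is the remainder.
  Obj := Trim(Copy(Line, P, MaxInt));
  if (Subj = '') or (Pred = '') or (Obj = '') then
    raise Exception.Create('Malformed statement: ' + Line);
end;

var
  Subj, Pred, Obj: string;
begin
  // A line as handed over by the line tokenizer: full stop removed, trimmed.
  SplitStatement('<http://example.org/s> <http://example.org/p> "two words"@en',
    Subj, Pred, Obj);
  WriteLn(Subj);
  WriteLn(Pred);
  WriteLn(Obj);
end.
```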
TRdfNTripleLineTokenizer = class
TRdfNTripleLineTokenizer is used to sequentially access the lines of an NTriple source.
Public Methods
constructor Create(const AInputSource: TRdfInputSource);
Constructs and initializes an instance of TRdfNTripleLineTokenizer with the specified TRdfInputSource object.
Parameters:
- AInputSource
The TRdfInputSource object which gives access to the RDF source to be processed.
procedure Read(out Symbol: TRdf_N_Triple_Line_Token_Type;
out NextLine: string); virtual;
Reads the next line (if any) from the NTriple source.
Parameters:
- Symbol
Returns the type of the line returned in NextLine. This is one of the following constants: NTL_TRIPLE_TOKEN, NTL_COMMENT_TOKEN, NTL_EMPTY_LINE_TOKEN, NTL_END_OF_TEXT_TOKEN, NTL_INVALID_LINE_TOKEN. For details see the description of the TRdf_N_Triple_Line_Token_Type typed constants.
- NextLine
Returns the next line, or an empty string if there was no next line or a well-formedness error was detected. If the next line is of type NTL_TRIPLE_TOKEN, the terminating full stop is removed and the resulting string is trimmed before the line is returned.
Exceptions:
- EConvertError
Raised if one of the characters of the next line cannot be converted to a UCS4 code point.
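The classification performed by Read() can be approximated as follows. This is a simplified, self-contained sketch: TLineKind and ClassifyLine are hypothetical names mirroring the NTL_* constants, a line counts as a statement merely because it ends with a full stop, and no character conversion (hence no EConvertError) takes place. Returning the trimmed comment text for comment lines is also an assumption.

```pascal
program LineClassifierSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical counterpart of the TRdf_N_Triple_Line_Token_Type values.
  TLineKind = (lkTriple, lkComment, lkEmptyLine, lkEndOfText, lkInvalidLine);

const
  KindNames: array[TLineKind] of string =
    ('triple', 'comment', 'empty', 'end of text', 'invalid');

procedure ClassifyLine(HaveLine: Boolean; const RawLine: string;
  out Kind: TLineKind; out NextLine: string);
var
  S: string;
begin
  NextLine := '';
  if not HaveLine then
  begin
    Kind := lkEndOfText;
    Exit;
  end;
  S := Trim(RawLine);
  if S = '' then
    Kind := lkEmptyLine
  else if S[1] = '#' then
  begin
    Kind := lkComment;
    NextLine := S;
  end
  else if S[Length(S)] = '.' then
  begin
    // Drop the terminating full stop, then trim what is left.
    Kind := lkTriple;
    NextLine := Trim(Copy(S, 1, Length(S) - 1));
  end
  else
    Kind := lkInvalidLine;  // simplified: only the missing full stop is checked
end;

var
  Lines: TStringList;
  I: Integer;
  Kind: TLineKind;
  Line: string;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('# a comment');
    Lines.Add('');
    Lines.Add('<http://example.org/s> <http://example.org/p> "o" . ');
    Lines.Add('no full stop here');
    // One extra iteration models the call that reports the end of the text.
    for I := 0 to Lines.Count do
    begin
      if I < Lines.Count then
        ClassifyLine(True, Lines[I], Kind, Line)
      else
        ClassifyLine(False, '', Kind, Line);
      WriteLn(KindNames[Kind], ': [', Line, ']');
    end;
  finally
    Lines.Free;
  end;
end.
```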
TRdfInputSource = class(TUtilsInputSource)
TRdfInputSource inherits from TUtilsInputSource (defined in the required ParserUtils unit). It adds no new features; it only publishes the Line property, which is protected in TUtilsInputSource.
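In Delphi, a descendant class can make an inherited property more visible by redeclaring it with just its name. The sketch below shows that idiom with the hypothetical classes TBaseSource and TPublicSource, standing in for TUtilsInputSource and TRdfInputSource. The real TUtilsInputSource has more members, and whether TRdfInputSource redeclares Line in a public or a published section is an assumption left open (published additionally relies on RTTI, e.g. a TPersistent descendant or {$M+}).

```pascal
program RepublishPropertySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TUtilsInputSource: Line is not public here.
  TBaseSource = class
  strict private
    FLine: Integer;
  strict protected
    property Line: Integer read FLine;
  public
    constructor Create(ALine: Integer);
  end;

  // Hypothetical stand-in for TRdfInputSource: redeclaring the property by
  // name only, without a type, raises its visibility and adds nothing else.
  TPublicSource = class(TBaseSource)
  public
    property Line;
  end;

constructor TBaseSource.Create(ALine: Integer);
begin
  inherited Create;
  FLine := ALine;
end;

var
  Src: TPublicSource;
begin
  Src := TPublicSource.Create(7);
  try
    WriteLn(Src.Line);  // 7
  finally
    Src.Free;
  end;
end.
```

The strict visibility specifiers are used only so that the effect shows up inside a single program file; with plain protected, Delphi would already grant access to Line anywhere in the same unit.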