This package contains some simple routines for manipulating XML
documents stored in PostgreSQL. It is a work in progress and still
fairly basic (see the file TODO for an outline of what remains to be
done).

At present, two modules (based on different XML handling libraries)
are provided.

Prerequisites:

pgxml.c:
expat parser 1.95.0 or newer (http://expat.sourceforge.net)

or

pgxml_dom.c:
libxml2 (http://xmlsoft.org)

The libxml2 version provides more complete XPath functionality, and
seems like a good way to go. I've left the older expat version in
place for comparison.

||
|
Compiling and loading:
----------------------

The Makefile only builds the libxml2 version.

To compile, just type make.

Then you can use psql to load the two function definitions:
\i pgxml_dom.sql


Function documentation and usage:
---------------------------------

pgxml_parse(text) returns bool
parses the provided text and returns true if it is well-formed and
false if it is not. It returns NULL if the parser couldn't be
created for any reason.

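As a rough illustration of the true/false contract described for
pgxml_parse, here is a minimal Python sketch built on the standard
library's expat binding. It is not part of this package: the name
is_well_formed is hypothetical, and the NULL case (parser could not be
created) is not modelled.

```python
import xml.parsers.expat


def is_well_formed(text):
    """Return True if text parses as well-formed XML, else False.

    Hypothetical helper for illustration only; it mimics the
    true/false result described for pgxml_parse, not its NULL case.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True
```

A document with a mismatched closing tag, for instance, is reported as
not well-formed rather than raising an error to the caller.
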
pgxml_xpath (XPath query functions) - differs between the versions:

pgxml.c (expat version) has:

pgxml_xpath(text doc, text xpath, int n) returns text
parses doc and returns the cdata of the nth occurrence of
the "simple path" entry.

However, the remainder of this document will cover the pgxml_dom.c version.

||
|
pgxml_xpath(text doc, text xpath, text toptag, text septag) returns text
evaluates xpath on doc and returns the whole result wrapped in
<toptag>...</toptag>, with each result node wrapped in
<septag>...</septag>. toptag and septag may be empty strings, in which
case the respective tag will be omitted.

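The wrapping rule can be sketched in Python with the standard library.
This is only an approximation under stated assumptions: wrap_results is
a hypothetical name, ElementTree supports just a subset of XPath (no
text() step, so the path selects elements and their text is used), and
the <find>/<type> layout in the usage note is assumed for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_results(doc, path, toptag, septag):
    """Wrap the text of each element matched by path.

    Hypothetical sketch of the toptag/septag behaviour described
    above. path uses ElementTree's limited XPath subset, evaluated
    relative to the document's root element.
    """
    root = ET.fromstring(doc)
    parts = []
    for node in root.findall(path):
        text = escape(node.text or "")
        # An empty septag means the per-node tag is omitted.
        parts.append(f"<{septag}>{text}</{septag}>" if septag else text)
    body = "".join(parts)
    # An empty toptag means the outer tag is omitted.
    return f"<{toptag}>{body}</{toptag}>" if toptag else body
```

Calling wrap_results(doc, './/find/type', 'set', 'findtype') on a
document with no matching elements yields an empty <set></set>
wrapper, which is still a well-formed fragment.
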
Example:

Given a table docstore:

 Attribute |  Type   | Modifier
-----------+---------+----------
 docid     | integer |
 document  | text    |

containing documents such as (these are archaeological site
descriptions, in case anyone is wondering):

<?xml version="1.0"?>
<site provider="Foundations" sitecode="ak97" version="1">
  <name>Church Farm, Ashton Keynes</name>
  <invtype>watching brief</invtype>
  <location scheme="osgb">SU04209424</location>
</site>

one can type:

select docid,
       pgxml_xpath(document,'//site/name/text()','','') as sitename,
       pgxml_xpath(document,'//site/location/text()','','') as location
from docstore;

and get as output:

 docid |               sitename               |  location
-------+--------------------------------------+------------
     1 | Church Farm, Ashton Keynes           | SU04209424
     2 | Glebe Farm, Long Itchington          | SP41506500
     3 | The Bungalow, Thames Lane, Cricklade | SU10229362
(3 rows)

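For readers who want to check what the two paths select without a
database at hand, the same extraction can be approximated in Python
against the sample document. This is a sketch, not the package's code:
site_summary and SAMPLE_DOC are hypothetical names, and ElementTree's
findtext stands in for the //site/name/text() and
//site/location/text() queries.

```python
import xml.etree.ElementTree as ET

# The sample site description from this document.
SAMPLE_DOC = """<?xml version="1.0"?>
<site provider="Foundations" sitecode="ak97" version="1">
  <name>Church Farm, Ashton Keynes</name>
  <invtype>watching brief</invtype>
  <location scheme="osgb">SU04209424</location>
</site>"""


def site_summary(doc):
    """Return (sitename, location) for one site document.

    Hypothetical helper: the root element is <site>, so its direct
    children <name> and <location> are looked up relative to it.
    A missing element yields None for that position.
    """
    root = ET.fromstring(doc)
    return root.findtext("name"), root.findtext("location")


if __name__ == "__main__":
    print(site_summary(SAMPLE_DOC))
```
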
or, to illustrate the use of the extra tags:

select docid as id,
       pgxml_xpath(document,'//find/type/text()','set','findtype')
from docstore;

 id |                               pgxml_xpath
----+-------------------------------------------------------------------------
  1 | <set></set>
  2 | <set><findtype>Urn</findtype></set>
  3 | <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>
(3 rows)

Each row of the result is a new, well-formed document. Note that
document 1 had no matching instances, so the set returned contains no
elements. Document 2 has 1 matching element and document 3 has 2.

This is just scratching the surface, because XPath allows all sorts of
operations.

Note: I've only implemented the return of nodeset and string values so
far. This covers (I think) many types of queries, however.

John Gray <jgray@azuli.co.uk> 16 August 2001