CETEIcean extension: XPath selection

Selecting nodes and creating excerpts from a document.

Example 1

For the purpose of demonstration, we have taken part of a document from the CELT website (an invaluable resource!) containing the Annals of Ulster. This partial document is saved on a wiki page. To retrieve a section, for instance one of the entries under the year 1123, we need to select it first; and to select this passage, we need to know what XPath (short for XML Path) expression to use.

The markup of such an entry looks like this (shown here for the first entry of the year, U1123.1):

<div2 n="U1123.1">
<p> <on type="people/lineage">Gailenga</on> do ghab<ex>ail</ex> tighi i n-<placeName>Daim Liac <personName><foreName type="saint">Ciannan</foreName></personName></placeName> for <personName><foreName>Murch<ex>adh</ex></foreName> <sn>H. Mael Sechl<ex>ainn</ex></sn></personName> for <term type="king">righ</term> <placeName type="kingdom">Temhrach</placeName> co ro loiscset in tech &ampersir; <num value="80"><ex>ocht</ex>mogha</num> taighi ime &ampersir; co ro marbsat soch<ex>aid</ex>i dia muinnter. Ternai <ex>imorro</ex> <personName><foreName>Murch<ex>adh</ex></foreName></personName> do ainiuch <personName><foreName type="saint">Ciannan</foreName></personName> cen marb<ex>adh</ex> cen loscadh.</p>
</div2>
  • In this example, we identify our XML node by specifying the tag name (div2), the attribute name (n) and the relevant attribute value (U1123.2, the second entry for that year). For security reasons, each tag name (div2 in the example) must be preceded by a namespace prefix (ctc:).
{{#cetei:doc=Cetei:AU-test
|sel=//ctc:div2[@n='U1123.2']
}}
Result
Ammus anaithnigh do thabairt for comarba Ailbe .i. Mael Mordha m. m. Clothna .i. tech do ghabail fair for lar Imlecha fein ⁊ for m. Cerbaill H. Ciarmaic .i. ri Aine co ro marbadh morseiser and. Ternatur imorro na doene maithi ass tria rath Ailbhe ⁊ na h-ecailsi. Ro loiscedh imorro ann Bernan Ailbhe. Ro marbadh imorro ria cind mis inti ro gabh in tech .i. in Gilla Caech H. Ciarmaic, ⁊ deochain eisidhe iar n-ainmniughudh ⁊ ro beanadh a cenn de i sarughudh Ailbhe ⁊ in Comdhegh.

Minor comment. Because there is no other use of this pair of attribute name and value to be found in the document, we could have left the tagname unspecified, using a wildcard (*) instead:

{{#cetei:doc=Cetei:AU-test
|sel=//ctc:*[@n='U1123.2']
}}
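
To try out such selectors outside the wiki, the same two expressions can be run against a small sample with Python's standard library. This is only a sketch under assumptions: the miniature document is made up, and the namespace URI behind the ctc: prefix is not stated on this page, so the TEI namespace serves as a stand-in. ElementTree also requires a relative path (.//) where the wiki example uses //.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the ctc: prefix is not given in the text;
# the TEI namespace is used here purely as a stand-in.
NS = {"ctc": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature document with two entries for the year 1123.
SAMPLE = """
<div1 xmlns="http://www.tei-c.org/ns/1.0" n="U1123">
  <div2 n="U1123.1"><p>first entry</p></div2>
  <div2 n="U1123.2"><p>second entry</p></div2>
</div1>
"""

root = ET.fromstring(SAMPLE)

# Tag name, attribute name and attribute value, as in the first selector.
by_tag = root.findall(".//ctc:div2[@n='U1123.2']", NS)

# Same selection with the tag name replaced by a wildcard (Python 3.8+).
by_wildcard = root.findall(".//ctc:*[@n='U1123.2']", NS)


def text_of(elements):
    """Return the concatenated text content of each matched element."""
    return ["".join(el.itertext()).strip() for el in elements]


print(text_of(by_tag))
print(text_of(by_wildcard))
```

Both calls match the same single element, because no other element in the sample carries that attribute value.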

Example 2: combining expressions

To join two or more expressions, XPath offers the pipe symbol, i.e. a vertical bar (|). Because MediaWiki already gives this symbol a special meaning, using it to separate the parameters of parser functions and templates, we need to escape it as {{!}}.

{{#cetei:doc=Cetei:Documents/64162-8515
|sel=//ctc:p[@xml:id='I.i.17'] {{!}} //ctc:p[@xml:id='I.i.18']
}}
Result
Ispania land is þryscyte ⁊ eall mid fleote utan ymbhæfd, ge eac binnan ymbhæfd ofer ða land ægþer ge of þæm garsecge ge of ðam Wendelsæ. An ðæra garena lið suðwest ongean þæt igland þe Gades hatte, ⁊ oþer east ongean þæt land Narbonense, ⁊ se ðridda norðwest ongean Brigantia Gallia burh ⁊ ongean Scotland ofer ðone sæs earm, on geryhte ongean þæne muðan þe mon hæt Scene. Seo us fyrre Ispania, hyre is be westan garsecg ⁊ be norðan, Wendelsæ be suðan, ⁊ be eastan seo us nearre Ispania; be norðan þære synt Equitania, ⁊ be norðaneastan is se weald Pireni, ⁊ be eastan Narbonense, ⁊ be suðan Wendelsæ. Brittannia þæt igland, hit in norðeastlang, ⁊ hit is eahta hund mila lang ⁊ twa hund mila brad. Þonne is be suðan him on oðre healfe þæs sæs earmes Gallia Bellica, ⁊ on westhealfe on oþre healfe þæs sæs earmes is Ibærnia þæt iglnd, ⁊ on norðhealfe Orcadus þæt igland. Igbernia, þæt we Scotland hatað, hit is on ælce healfe ymbfangen mid garsecge, ⁊ for ðon þe sio sunne þær gæð near on setl þonne on oðrum lande, þær syndon lyðran wedera þonne on Brettannia. þonne be westannorðan Ibernia is þæt ytemeste land þæt man hæt Thila, ⁊ him is feawum mannum cuð for ðære offerfyrre.

These excerpts are from J. Bately's edition of the Old English Orosius.
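
What the pipe does can be sketched in Python. Note the swapped-in technique: the standard-library ElementTree module does not support the | operator, so the hypothetical helper `union` below emulates it by evaluating each expression separately, dropping duplicates and returning the matches in document order. The sample document and the namespace URI for ctc: are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "ctc": "http://www.tei-c.org/ns/1.0",           # assumed stand-in URI
    "xml": "http://www.w3.org/XML/1998/namespace",  # fixed by the XML spec
}

# Hypothetical miniature document; only the xml:id values mirror the example.
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <p xml:id="I.i.16">sixteen</p>
  <p xml:id="I.i.17">seventeen</p>
  <p xml:id="I.i.18">eighteen</p>
</body>
"""


def union(root, expressions, namespaces):
    """Emulate XPath's | operator: merge matches, drop duplicates,
    and return them in document order."""
    wanted = set()
    for expr in expressions:
        wanted.update(id(el) for el in root.findall(expr, namespaces))
    return [el for el in root.iter() if id(el) in wanted]


root = ET.fromstring(SAMPLE)

# The expressions are listed in reverse on purpose: the result still
# follows the order of the document, not the order of the expressions.
selected = union(
    root,
    [".//ctc:p[@xml:id='I.i.18']", ".//ctc:p[@xml:id='I.i.17']"],
    NS,
)
print([el.text for el in selected])
```

In the wiki itself none of this is needed: the escaped pipe {{!}} is passed on to the XPath engine as a plain |.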

Example 3

From the same Annals of Ulster document, let's fetch every use of the tag term where type="church". Note that results appear in the order in which they occur in the document. XPath itself cannot sort results, alphabetically or otherwise; that would be a job for XSLT, which might be worth looking into if anyone needs it.

{{#cetei:doc=Cetei:AU-test
|sel=//ctc:term[@type='church']
}}
Result
ecclesiaeclaistempullceallatempollceallacheallaceallecailsitempullęcluischellthempluibhtempallthempallchilltempailltempluibhthempluibhtempallcillTempulltempaillchellatemplaibhReclescellaceallceallaeclustempalltempaillreiclesatempaillCeallacellacellceallchillchellaibhtempulreiclesthemplaibhcelltempalltempoillceall
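
If sorted output is needed and the XML can be processed outside the wiki, sorting can also be done as a post-processing step. The sketch below uses Python rather than XSLT, which is a substitute for the approach mentioned above and not a feature of the extension. The sample document and the namespace URI for ctc: are assumptions; the three word forms are taken from the result list.

```python
import xml.etree.ElementTree as ET

# Assumption: stand-in URI for the ctc: prefix.
NS = {"ctc": "http://www.tei-c.org/ns/1.0"}

# Hypothetical miniature document with three church terms and one other term.
SAMPLE = """
<div1 xmlns="http://www.tei-c.org/ns/1.0">
  <p><term type="church">tempull</term> / <term type="king">righ</term></p>
  <p><term type="church">ceall</term> / <term type="church">eclais</term></p>
</div1>
"""

root = ET.fromstring(SAMPLE)

# The selector returns matches in document order.
terms = [el.text for el in root.findall(".//ctc:term[@type='church']", NS)]

# Sorting happens afterwards, outside XPath; casefold ignores letter case.
sorted_terms = sorted(terms, key=str.casefold)

print(", ".join(terms))
print(", ".join(sorted_terms))
```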