
CETEIcean extension: extracting a page fragment

Demonstration of an experimental feature that lets you 'break out' TEI XML content between two self-closing breaks. A self-closing break is typically used to mark the beginning of something new, such as a page, folio or column. Because isolating text between such breaks may result in malformed XML, this parser function attempts to repair the XML and render the selection.
This example selects page 251 from a transcription of an edition of the Welsh version of the birth of Arthur.
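
To make the idea concrete, here is a minimal Python sketch of one way such a 'break out and repair' step could work. It is not the extension's actual code: the function names, the regular expression and the sample XML are all assumptions made for illustration. The sketch re-opens every element that was still open at the first break and closes every element left open at the second.

```python
import re

# Matches an opening, closing or self-closing tag. A simplification: it does not
# cope with '>' inside attribute values, nor with tags inside comments or CDATA.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)([^<>]*?)(/?)>")


def _track(stack, xml, start, end):
    """Update `stack` with the elements opened and closed in xml[start:end]."""
    for m in TAG_RE.finditer(xml, start, end):
        closing, name, _attrs, self_closing = m.groups()
        if self_closing:
            continue  # self-closing tags such as <pb/> or <lb/> never change nesting
        if closing:
            if stack and stack[-1][0] == name:
                stack.pop()
        else:
            stack.append((name, m.group(0)))  # keep the raw tag to preserve attributes


def extract_between(xml, start_break, end_break):
    """Hypothetical helper: return the text between two literal break tags, repaired."""
    start = xml.index(start_break)
    end = xml.index(end_break, start + len(start_break))

    open_before = []                 # elements still open where the selection starts
    _track(open_before, xml, 0, start)
    still_open = list(open_before)   # elements still open where the selection ends
    _track(still_open, xml, start, end)

    prefix = "".join(raw for _, raw in open_before)
    suffix = "".join(f"</{name}>" for name, _ in reversed(still_open))
    return prefix + xml[start:end] + suffix


# Invented sample data, not taken from the transcription.
SAMPLE = (
    '<text><body><p>one <pb n="251"/>two <lb/>three</p>'
    '<p>four <pb n="252"/>five</p></body></text>'
)
page = extract_between(SAMPLE, '<pb n="251"/>', '<pb n="252"/>')
print(page)
```

The repaired fragment keeps the opening break but stops just before the closing one, so the selection covers exactly one page.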

Why should we want to do so?

The need for such a feature arose when I was trying to show an image of a manuscript page alongside the corresponding diplomatic edition, in this case that of the Lebor na hUidre as transcribed for CELT. In one of our examples elsewhere, JavaScript is used to scroll to the intended page. That approach has a number of practical benefits but also one noticeable disadvantage: with so much text, it is easy to lose your place (imagine if we had squeezed all of the Book of Leinster, a substantially larger manuscript, into that space).



Here break1 and break2 stand for the two self-closing tags. Each is identified by three elements delimited by ---: the tag name (here pb), the attribute name (here n) and the attribute value (here 251 and 252).
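
As an illustration of the three-part format, the following Python sketch splits a value such as pb---n---251 into its parts and builds a regular expression that finds the matching self-closing tag. The names BreakSpec, parse_break and break_pattern are hypothetical and not part of the extension; only the --- delimiter and the tag/attribute/value order come from the description above.

```python
import re
from typing import NamedTuple


class BreakSpec(NamedTuple):
    tag: str
    attribute: str
    value: str


def parse_break(spec: str) -> BreakSpec:
    """Split 'tag---attribute---value' into its three elements."""
    parts = [part.strip() for part in spec.split("---")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tag---attribute---value, got {spec!r}")
    return BreakSpec(*parts)


def break_pattern(spec: BreakSpec) -> re.Pattern:
    """Build a regex matching a self-closing tag that carries the given attribute value."""
    tag, attr, value = (re.escape(part) for part in spec)
    return re.compile(
        "<" + tag + r"\b[^<>]*?\s" + attr + r"\s*=\s*(['\"])" + value + r"\1[^<>]*?/>"
    )


break1 = parse_break("pb---n---251")
match = break_pattern(break1).search('<p>text <pb n="251"/>more text</p>')
print(break1, match.group(0))
```

Matching on the whole quoted value means that a break for page 251 is not confused with one for page 2510, and other attributes on the same tag are tolerated.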


yn vn [Note: There is evidently some mistake here, some words having been omitted.] ffuryf a Gwrleis a hwnnw a gysgod y gyt a mi y nos honno a phann aeth ef ymaith drannoeth y gedewis ef vyvi yn veichioc. Gorev kynghor a wnn i Tewi heb yr Uthyr hyt pan aner ac yna mi ai hanvonaf yr lle y caffo i veithrin yn anwyl; ac ar hynny y trigassant yni anet mab tec, a chenhat a gyrchod ac ef hyt yn llys Kynyr Varvoc Arglwydd Penllyn a llythyrev Uthyr a rai Myrddin gantho a phan ddoeth ir llys y dodes y mab ger bron Kynyr ar llythyrev yn i law a Chynyr a agores ac a darlleodd y geiriav hynn. Y mae Uthyr Bendragon benn y Brytaniet yn anvon annerch a gwir arglwyddiaeth y Gynyr Varuoc atnabyddet dy gynddrycholder di erchi ynn drwy vy hun mynet allan oddieithyr drws yr ystauell a pha eneidiawl bynnac a welwn yno peri i veithrin yn annwyl wrth hynny mi a orchymynnaf ytti peri meithrin y mab a gowsom ni yno ac ydd ym yn i anuot attat ti a hynny ar laeth bronneu dy wraic dy hun a pheri mamaeth arall ith vab ditheu. A gwedy darvot i Gynyr darllen y llythyreu kymryt y mab a oruc a pheri i vedyddio ai henwi Arthur ai veithrin yni yttoedd bederblwyd ar ddec yn y mod ydd erchyssit iddaw o gwbyl. Uthyr Bendragon a wledychodd yn yr ynys honn hynny o vlynydded o gwbyl a merch a vu iddaw o Eygyr a elwitt Anna ac yny bedwaredd vlwyddyn ar ddec yn wythnos wyl Marthin y tervynwyt ar Uthyr Bendragon yny mod ytreithir yn Ystoria y Brytaniett. A gwedy marw Uthyr yd ymgynullasant ygyt wyrda yr ynys hyt Ynghaer Vuddai rac bronn Dyffric Archescob i ymgynhori ac ef pwy a wnelynt yn vrenin i lywio y dyrnas ac i edrych pwy a allai vott o voned a moessau a chedernyt yn vrenin canys dieu oed ganthunt varw Uthyr yn ddietiued oi gorff onyt merch, ac angen a oed yn eu kymell i hynny nit amgen nor Saesson a gyfleassai Gwrtheyrn Gwrtheneu yn yr ynys honn a pha glowssant varwolaeth Uthyr yddanvonassant gennadeu i Germania ynol eu kenedyl ac y gorysgynnyssant or ynys o Aber Hvmyr hyt mor Katneif. 
A gwedy gwarandaw o Ddyffrig ar berigl y deyrnas ae govit kyt ddoluriaw a oruc ar bobyl a galw attaw esgyb y deyrnas ae phenadurieit ac erchi vddunt mynet i ddethol brenin arnunt yn enw Duw. Ac yna dechreu


This feature is experimental. For reasons I won't go into here, other self-closing tags are a potential source of errors: most of them are safely ignored, but some may not be on my radar. The issue is compounded by the fact that some tags can occur both as tag pairs and as single self-closing tags.

Also, since the parser function expects a second parameter (break2) to mark the end of the selection, there is currently no way to fetch the final unit if it is not followed by another break. You are, of course, free to add one to the source to make it work.