Bibliography

Meelen, Marieke, and David Willis, “Towards a historical treebank of Middle and Early Modern Welsh, part I: workflow and POS tagging”, Journal of Celtic Linguistics 22 (2021): 125–154.

journal article

Citation details

Contributors

Meelen (Marieke)

Willis (David)

Article

“Towards a historical treebank of Middle and Early Modern Welsh, part I: workflow and POS tagging”

Periodical

Journal of Celtic Linguistics 22 (2021), University of Wales Press.

Volume

22

Pages

125–154

Description

Abstract (cited)

This article introduces the working methods of the Parsed Historical Corpus of the Welsh Language (PARSHCWL). The corpus is designed to provide researchers with a tool for automatic exhaustive extraction of instances of grammatical structures from Middle and Modern Welsh texts in a way comparable to similar tools that already exist for various European languages. The major features of the corpus are outlined, along with the overall architecture of the workflow needed for a team of researchers to produce it. In this paper, the two first stages of the process, namely pre-processing of texts and automated part-of-speech (POS) tagging are discussed in some detail, focusing in particular on major issues involved in defining word boundaries and in defining a robust and useful tagset.

Web page identifiers

page name: Meelen and Willis 2021 jceltl22ore

page url: https://codecs.vanhamel.nl/Meelen_and_Willis_2021_jceltl22ore
redirect: https://codecs.vanhamel.nl/Special:Redirect/page/58585
numerical alternative: https://codecs.vanhamel.nl/index.php?curid=58585
page ID: 58585
page ID tracker: https://codecs.vanhamel.nl/index.php?title=Show:ID&id=58585

Page created

April 2022