Tagging

Tagging of Danish
The Danish transcriptions are lemmatised and POS tagged by a Constraint Grammar tagger developed for written Danish, see Bick (2003).
Tagging performed by: Eckhard Bick, University of Southern Denmark, Odense.

(Bick, Eckhard. 2003. PaNoLa - The Danish Connection. In: Henrik Holmboe (ed.), Nordic Language Technology, Årbog for Nordisk Sprogteknologisk Forskningsprogram 2000-2004 (Yearbook 2002), pp. 75-88. Copenhagen: Museum Tusculanum.)

Tagging of Faroese
The Faroese transcriptions are first tagged with a Constraint Grammar tagger developed by Trond Trosterud for written Faroese, see Trosterud (2009). Because spoken Faroese contains many words that are not part of standard written Faroese, about half of the material is manually corrected after the Constraint Grammar tagging. Finally, a TreeTagger is trained on the corrected material and used to tag the rest of the transcriptions again.
Tagging performed by Anders Nøklestad. Remco Knooihuizen corrected the Constraint Grammar tagging.
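The Faroese procedure is a bootstrapping workflow: rule-based pre-tagging, manual correction of part of the output, and training a statistical tagger on the corrected part to retag the rest. The Python sketch below illustrates only that workflow, under heavy simplifying assumptions. `rule_based_tag` is a hypothetical lexicon lookup standing in for the Constraint Grammar tagger, `MostFrequentTagTagger` is a toy most-frequent-tag model standing in for TreeTagger (the real TreeTagger uses decision trees over tag contexts), and the lexicon, tags, sentences and `GOLD` corrections are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for the rule-based CG tagger.
# It covers only a few "written" forms; any other token is tagged "UNK".
WRITTEN_LEXICON = {"eg": "PRON", "eri": "VERB", "heima": "ADV"}


def rule_based_tag(tokens):
    """Hypothetical stand-in for the Constraint Grammar tagger: lexicon lookup only."""
    return [(tok, WRITTEN_LEXICON.get(tok, "UNK")) for tok in tokens]


class MostFrequentTagTagger:
    """Toy statistical tagger standing in for TreeTagger: each known word
    gets the tag it received most often in the corrected training data."""

    def __init__(self, tagged_sentences):
        per_word = defaultdict(Counter)
        overall = Counter()
        for sentence in tagged_sentences:
            for word, tag in sentence:
                per_word[word][tag] += 1
                overall[tag] += 1
        if not overall:
            raise ValueError("need at least one corrected token to train on")
        self.best_tag = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
        # Words never seen in training fall back to the most frequent tag overall.
        self.fallback = overall.most_common(1)[0][0]

    def tag(self, tokens):
        return [(tok, self.best_tag.get(tok, self.fallback)) for tok in tokens]


def bootstrap(transcriptions, correct, fraction=0.5):
    """Rule-based pass, manual correction of a fraction, train, retag the rest."""
    # 1. Tag every transcription with the rule-based tagger.
    pretagged = [rule_based_tag(sent) for sent in transcriptions]
    # 2. A human corrects the first `fraction` of the material.
    cutoff = int(len(pretagged) * fraction)
    corrected = [correct(sent) for sent in pretagged[:cutoff]]
    # 3. Train the statistical tagger on the corrected part only.
    tagger = MostFrequentTagTagger(corrected)
    # 4. Tag the remaining transcriptions again with the trained tagger.
    retagged = [tagger.tag(sent) for sent in transcriptions[cutoff:]]
    return corrected + retagged


# Invented gold tags simulating the human corrector (ignores the proposed tag).
GOLD = {"eg": "PRON", "eri": "VERB", "heima": "ADV", "her": "ADV"}
transcriptions = [
    ["eg", "eri", "her"],
    ["eg", "eri", "heima"],
    ["her", "eri", "eg"],
    ["eri", "eg", "heima"],
]
result = bootstrap(transcriptions, lambda sent: [(w, GOLD[w]) for w, _ in sent])
print(result[2])  # [('her', 'ADV'), ('eri', 'VERB'), ('eg', 'PRON')]
```

The point of the toy run is that `her`, which the rule-based pass can only mark as `UNK`, is still tagged in the retagged half, because the statistical tagger learns it from the manually corrected half.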

Financed by: NordForsk through the project Scandinavian Dialect Infrastructure: Corpus, Database and Dialect Maps (situated at the Text Laboratory).

(Trosterud, Trond. 2009. A constraint grammar for Faroese. NEALT Proceedings Series.)

Tagging of Icelandic
The Icelandic transcriptions are first tagged with a tagger for written Icelandic, see Loftsson (2008). Some of the transcriptions are manually corrected afterwards.
Tagging performed by Anders Nøklestad. Gísli Rúnar Harðarson corrected the tagging.
Financed by: NordForsk through the project Scandinavian Dialect Infrastructure: Corpus, Database and Dialect Maps (situated at the Text Laboratory).

(Loftsson, Hrafn. 2008. Tagging Icelandic text: A linguistic rule-based approach. Nordic Journal of Linguistics 31.1.)

Tagging of Norwegian
The orthographic version of the corpus is lemmatised and POS tagged by a TreeTagger originally developed for Oslo speech. The Oslo speech tagger (the NoTa tagger) was trained on manually corrected output from the written-language Oslo-Bergen tagger, see Nøklestad and Søfteland (2007).
Tagging performed by: The Text Laboratory, UiO

(Nøklestad, Anders and Åshild Søfteland. 2007. Tagging a Norwegian Speech Corpus. NODALIDA 2007 Conference Proceedings.)

Tagging of Swedish and Övdalian
The Swedish tagger is a TnT tagger, see Kokkinakis (2003). The tagger is trained on the Swedish PAROLE corpus and manually tagged orthographic Övdalian transcriptions. The tagger is applied to both the Swedish transcriptions and the orthographic versions of the Övdalian transcriptions. Tagging is performed at the Text Laboratory (UiO) by André Lynum. Piotr Garbacz has manually tagged the Övdalian transcriptions.
Financed by: ILN (UiO)

(Kokkinakis, Sofie Johansson. 2003. En studie över påverkande faktorer i ordklasstaggning. Baserad på taggning av svensk text med EPOS. Göteborg University.)