NorDiaSyn - Tools

The technical solutions for the Nordic Dialect Corpus and the Nordic Syntax Database are developed by the Text Laboratory.

Corpus Tools
The corpus can be searched with Glossa, a search interface developed by the Text Laboratory. Glossa is a user interface for searching and for processing results, built on top of the IMS Corpus Workbench (CWB) query system. Search results are shown as concordances linked to the corresponding audio and video recordings.
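To make the term concrete, a concordance (keyword-in-context, KWIC) view lists every hit of a search together with a few words of context on each side. The sketch below only illustrates that layout on an invented example sentence; it is not how Glossa works internally (Glossa sends its queries to CWB), and the function name `kwic` and its `width` parameter are hypothetical.

```python
from typing import List, Tuple


def kwic(tokens: List[str], target: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every hit of `target`.

    Matching is case-insensitive; `width` is the number of context tokens
    shown on each side of the keyword.
    """
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    # Invented example utterance, not taken from the corpus.
    utterance = "eg veit ikkje om han kjem i dag eller om han blir heime".split()
    for left, kw, right in kwic(utterance, "han"):
        print(f"{left:>25} [{kw}] {right}")
```

In Glossa each concordance line is additionally tied to its place in the recording, which is what lets you play the audio or video for a hit.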

Glossa also lets you process the search results: you can export them to other file formats, display them as frequency counts, or show them on maps.
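As a rough illustration of what these result views are built from, the sketch below counts invented search hits per word form, counts them per recording place (the kind of table a map could be coloured from), and exports a table as CSV. The data, the place labels and all function names are hypothetical; this is not Glossa's code or its export format.

```python
import csv
import io
from collections import Counter

# Invented example hits: (matched word form, informant's recording place).
hits = [
    ("ikkje", "place_A"),
    ("ikke", "place_B"),
    ("ikkje", "place_C"),
    ("ikkje", "place_A"),
    ("ikke", "place_B"),
    ("itte", "place_B"),
]


def frequency_table(hits):
    """Count hits per word form, most frequent first."""
    return Counter(form for form, _ in hits).most_common()


def counts_per_place(hits):
    """Count each (place, form) pair, e.g. as input for a map view."""
    return Counter((place, form) for form, place in hits)


def to_csv(rows, header):
    """Serialise rows to CSV text, as one possible export format."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(frequency_table(hits), ["form", "count"]))
```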

Glossa: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/glossa/index.html

IMS Corpus Workbench (CWB) query system: https://cwb.sourceforge.io/

Transcription Tools and Transliterator
The phonetic transcription was made with the free software Transcriber. The orthographic transcription was then produced from the phonetic transcription using the semi-automatic Oslo Transliterator, also developed by the Text Laboratory. The phonetic and orthographic transcriptions are linked in Glossa, so you can search in either one of them or in both simultaneously.
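"Semi-automatic" means that the tool proposes orthographic forms and a human corrects them. The toy sketch below shows that general idea with a simple lookup table: known phonetic forms are converted, unknown ones are flagged for review, and a manual correction is stored so the next occurrence is handled automatically. This is an assumed, simplified illustration, not a description of how the Oslo Transliterator actually works; the lexicon entries and the functions `transliterate` and `learn` are hypothetical.

```python
from typing import Dict, List, Tuple


def transliterate(phonetic_tokens: List[str], lexicon: Dict[str, str]) -> Tuple[List[str], List[int]]:
    """Map phonetic tokens to orthographic forms via a lookup table.

    Unknown tokens are copied through unchanged, and their positions are
    returned so that an annotator can correct them by hand.
    """
    output, needs_review = [], []
    for i, tok in enumerate(phonetic_tokens):
        if tok in lexicon:
            output.append(lexicon[tok])
        else:
            output.append(tok)
            needs_review.append(i)
    return output, needs_review


def learn(lexicon: Dict[str, str], phonetic: str, corrected: str) -> None:
    """Store a manually corrected pair for reuse on later occurrences."""
    lexicon[phonetic] = corrected


if __name__ == "__main__":
    # Invented lexicon and utterance, for illustration only.
    lexicon = {"æ": "jeg", "veit": "vet", "itte": "ikke"}
    words = ["æ", "veit", "itte", "no"]
    print(transliterate(words, lexicon))  # "no" is flagged for review
    learn(lexicon, "no", "nå")             # the annotator's correction
    print(transliterate(words, lexicon))  # now fully converted
```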

Transcriber: https://trans.sourceforge.net/en/presentation.php

The Oslo Transliterator: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
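Because each phonetic token is linked to its orthographic counterpart, a search can put a condition on one tier or on both at once. The sketch below models this with token pairs from invented utterances; the data structure and the `search` function are hypothetical and only illustrate the idea, not Glossa's query language.

```python
from typing import List, Optional, Tuple

# Each token pairs its phonetic form with its orthographic form.
Token = Tuple[str, str]

# Invented tokens from two example utterances.
tokens: List[Token] = [("æ", "jeg"), ("veit", "vet"), ("itte", "ikke"), ("ikkje", "ikke")]


def search(tokens: List[Token], phon: Optional[str] = None, orth: Optional[str] = None) -> List[int]:
    """Return positions of tokens matching the given conditions.

    Give only `phon` or only `orth` to search one tier; give both to
    require a match on both tiers at the same token.
    """
    if phon is None and orth is None:
        raise ValueError("give at least one of phon or orth")
    return [
        i for i, (p, o) in enumerate(tokens)
        if (phon is None or p == phon) and (orth is None or o == orth)
    ]


if __name__ == "__main__":
    print(search(tokens, orth="ikke"))               # both pronunciations
    print(search(tokens, phon="itte", orth="ikke"))  # one specific pronunciation
```

Searching the orthographic tier collects all pronunciations of a word, while adding a phonetic condition narrows the hits to one pronunciation.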

Taggers
To tag the Norwegian Dialect Corpus, we used TreeTagger, trained on a manually corrected version of the Oslo-Bergen Tagger's output for the NoTa project. The tagger's performance, measured by 10-fold cross-validation, is 96.9%.
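In 10-fold cross-validation the tagged data is split into ten parts; the tagger is trained on nine parts and tested on the remaining one, ten times over, and the ten scores are averaged. The sketch below shows that procedure, assuming the reported figure is token accuracy. The tagger here is a deliberately simple stand-in (most frequent tag per word), not TreeTagger, and the fold assignment (every k-th sentence) and all names are hypothetical choices.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Sentence = List[Tuple[str, str]]  # [(word, tag), ...]


def train_most_frequent_tag(training: Sequence[Sentence]) -> Callable[[List[str]], List[str]]:
    """Stand-in tagger (NOT TreeTagger): most frequent tag per word,
    falling back to the overall most frequent tag for unseen words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sent in training:
        for word, tag in sent:
            per_word[word][tag] += 1
            overall[tag] += 1
    default = overall.most_common(1)[0][0]

    def tag(words: List[str]) -> List[str]:
        return [per_word[w].most_common(1)[0][0] if w in per_word else default for w in words]

    return tag


def k_fold_accuracy(sentences: Sequence[Sentence], train: Callable, k: int = 10) -> float:
    """Mean of the per-fold token accuracies over k folds."""
    if k < 2 or len(sentences) < k:
        raise ValueError("need k >= 2 and at least k sentences")
    scores = []
    for fold in range(k):
        # Sentence i goes to the test set of fold i % k.
        test = [s for i, s in enumerate(sentences) if i % k == fold]
        training = [s for i, s in enumerate(sentences) if i % k != fold]
        tag = train(training)
        correct = total = 0
        for sent in test:
            predicted = tag([w for w, _ in sent])
            correct += sum(p == gold for p, (_, gold) in zip(predicted, sent))
            total += len(sent)
        scores.append(correct / total)
    return sum(scores) / k
```

Some evaluations pool all test tokens instead of averaging per-fold scores; with folds of similar size the two give nearly the same figure.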

TreeTagger: https://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/

Oslo-Bergen Tagger: https://tekstlab.uio.no/obt-ny/english/index.html

The technical solutions have been financed by NorDiaSyn and Northern Research Council (NordForsk).