NorDiaSyn - Tools
 
The technical solutions for the Nordic Dialect Corpus and the Nordic Syntax Database have been developed by the Text Laboratory at the University of Oslo.

Corpus Tools
The corpus can be searched with Glossa, a search interface developed by the Text Laboratory. Glossa is a user interface for searching and for processing results, built on top of the query system of the IMS Corpus Workbench (CWB). Search results are displayed as concordances linked to the corresponding audio and video recordings.


Glossa also lets you process the search results: you can export them to other file formats, or display them as frequency counts or on maps.
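To make the terms "concordance" and "frequency count" concrete, here is a minimal Python sketch over a toy corpus. It is not Glossa's implementation or API: the utterances, the place labels "A", "B" and "C", and the functions `kwic` and `frequency_by_place` are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each utterance carries a (made-up) place label
# and its list of word tokens. Not taken from the Nordic Dialect Corpus.
utterances = [
    {"place": "A", "words": ["eg", "veit", "ikkje", "det"]},
    {"place": "B", "words": ["jeg", "vet", "ikke", "det"]},
    {"place": "C", "words": ["han", "kjem", "ikkje", "no"]},
]


def kwic(utterances, target, width=2):
    """Keyword-in-context lines: (place, left context, hit, right context).

    Context is taken only from within the same utterance.
    """
    lines = []
    for utt in utterances:
        words = utt["words"]
        for i, word in enumerate(words):
            if word == target:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append((utt["place"], left, word, right))
    return lines


def frequency_by_place(utterances, target):
    """Count the hits for `target`, grouped by the place label."""
    return Counter(
        utt["place"] for utt in utterances for word in utt["words"] if word == target
    )


if __name__ == "__main__":
    for place, left, hit, right in kwic(utterances, "ikkje"):
        print(f"{place}: {left:>12} [{hit}] {right}")
    print(frequency_by_place(utterances, "ikkje"))
```

Grouping the hits by a metadata field is what turns a concordance into a frequency table; plotting the same counts per place is, in spirit, what a map view shows.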


Glossa: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/glossa/index.html


IMS Corpus Workbench (CWB): http://www.ims.uni-stuttgart.de/forschung/projekte/CorpusWorkbench.html


Transcription Tools and Transliterator
The phonetic transcription was done with the free software Transcriber. The orthographic transcription was then produced from the phonetic transcription using the semi-automatic Oslo Transliterator, also developed by the Text Laboratory. The phonetic and orthographic transcriptions are linked in Glossa, so you can search in only one of them or in both simultaneously.
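One way to picture two linked transcription tiers is as a sequence of tokens, each holding both a phonetic and an orthographic form. The Python sketch below assumes exactly that. It does not reproduce Glossa's internal data model or the corpus's transcription conventions. The `Token` class, the `search` function and the example word forms are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    phon: str  # phonetic (dialectal) form, as on the phonetic tier
    orth: str  # standardized orthographic form, as on the orthographic tier


# Invented example tokens; the forms are illustrative only.
tokens = [
    Token("æ", "jeg"),
    Token("veit", "vet"),
    Token("itte", "ikke"),
    Token("ho", "hun"),
    Token("kjem", "kommer"),
    Token("ikkje", "ikke"),
]


def search(tokens, phon=None, orth=None):
    """Return the positions of tokens that match the given tier(s).

    Pass only `phon` or only `orth` to search one tier; pass both to
    require a match on both tiers at the same position.
    """
    if phon is None and orth is None:
        raise ValueError("give at least one of phon or orth")
    return [
        i
        for i, t in enumerate(tokens)
        if (phon is None or t.phon == phon) and (orth is None or t.orth == orth)
    ]


if __name__ == "__main__":
    print(search(tokens, orth="ikke"))               # orthographic tier only
    print(search(tokens, phon="itte"))               # phonetic tier only
    print(search(tokens, phon="ikkje", orth="ikke")) # both tiers
```

Searching the orthographic tier gathers different dialectal pronunciations of one word. Adding a phonetic constraint then narrows the hits to a single pronunciation.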


Transcriber: http://trans.sourceforge.net/en/presentation.php


The Oslo Transliterator: http://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html


Taggers
To tag the Norwegian Dialect Corpus, we used TreeTagger, trained on a manually corrected version of the Oslo-Bergen Tagger's output for the NoTa project. The performance of the tagger, measured by 10-fold cross-validation, is 96.9%.
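The Python sketch below shows how a 10-fold cross-validation score of this kind is computed. TreeTagger itself is not reproduced. A trivial most-frequent-tag baseline (`train_unigram`, hypothetical) stands in for it, and the interleaved fold split is an assumption: the page does not say how the folds were formed. The page also does not say whether the reported figure is the mean of the per-fold accuracies (as computed here) or an accuracy pooled over all folds.

```python
from collections import Counter, defaultdict


def train_unigram(tagged):
    """Toy stand-in tagger (NOT TreeTagger): the most frequent tag for each
    known word, otherwise the most frequent tag in the training data."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in tagged:
        per_word[word][tag] += 1
        overall[tag] += 1
    default = overall.most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    return lambda word: lexicon.get(word, default)


def cross_validate(tagged, k=10):
    """Mean tagging accuracy over k folds.

    Each fold is held out once for testing while the tagger is trained on
    the other k-1 folds. Folds are formed by interleaving (token i goes to
    fold i % k), so `tagged` must contain at least k tokens.
    """
    if len(tagged) < k:
        raise ValueError("need at least k tagged tokens")
    folds = [tagged[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [pair for j, fold in enumerate(folds) if j != i for pair in fold]
        tag = train_unigram(train)
        correct = sum(tag(word) == gold for word, gold in test)
        scores.append(correct / len(test))
    return sum(scores) / k


if __name__ == "__main__":
    # Hypothetical (word, tag) pairs, for illustration only.
    data = [("det", "PRON"), ("er", "VERB"), ("ikkje", "ADV")] * 10
    print(f"{cross_validate(data):.1%}")
```

For a real corpus, the folds would more naturally be contiguous blocks of sentences or recordings, so that the training and test material are not interleaved word by word.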

TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/


Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html


The technical solutions have been financed by NorDiaSyn and NordForsk.
