NorDiaSyn - Transcription



Transcription of the Norwegian Dialect Corpus
We have decided to do two transcriptions of all recordings, one phonetic and one orthographic, despite the fact that transcription is expensive.

Phonetic transcription
It is natural to choose a phonetic transcription of the recordings. This way, dialect features are clearly presented in the written representation, whether they are phonological, morphological, syntactic or lexical. A written representation of speech is also a great help for linguists who want to get a quick overview of the material.

The phonetic transcription method is based on Papazian and Helleland's Norsk talemål. Lokal og sosial variasjon (2005), but we use no special characters, only the Norwegian alphabet. The transcription is also quite broad. There are two reasons for these choices. First, we want the transcriptions to be easy to read for most people. Second, many transcribers are involved in the work, and overly detailed transcriptions would lengthen training time and increase the risk of individual differences between the transcriptions.
However, phonetic transcriptions have some disadvantages. Each word varies so much between individuals and dialects that the transcriptions would be difficult to process with automatic methods such as tagging. General queries in the corpus would also be problematic for users who do not have a full overview of all the variant forms. For these reasons, we also provide an orthographic transcription.

Orthographic transcription
An orthographic transcription is important because it generalizes over all the variation. This makes it possible to do general searches and to use automated methods such as tagging. Doing the orthographic transcription is much faster than doing the phonetic one, because we use a semi-automatic dialect transliterator that converts the phonetic transcription to Bokmål orthography. This transliterator was developed by the Text Laboratory specifically for the NorDiaSyn Project.
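As a minimal sketch of why the orthographic layer makes general searches possible, the Python example below indexes phonetic tokens by the orthographic form they are aligned with. It assumes that the two transcriptions are aligned token by token, which this page does not state. The first two utterances are the examples given on this page; the third is invented for illustration. The names build_index and variants are hypothetical and do not describe the corpus search interface.

```python
from collections import defaultdict

# Each utterance is a list of (phonetic, orthographic) token pairs.
# Assumption: the two transcriptions are aligned token by token.
# Utterances 1 and 2 come from the examples on this page;
# utterance 3 is invented for illustration and is not corpus data.
UTTERANCES = [
    [("jæ", "jeg"), ("ha", "har"), ("fått", "fått"), ("n", "den")],
    [("jæ", "jeg"), ("ha", "har"), ("kke", "ikke"), ("pennger", "penger")],
    [("e", "jeg"), ("har", "har"), ("ittje", "ikke"), ("pænga", "penger")],
]


def build_index(utterances):
    """Map each orthographic form to the set of phonetic variants aligned with it."""
    index = defaultdict(set)
    for utterance in utterances:
        for phonetic, orthographic in utterance:
            index[orthographic.lower()].add(phonetic)
    return index


def variants(index, orthographic):
    """Return the phonetic variants recorded for one orthographic form, sorted."""
    return sorted(index.get(orthographic.lower(), set()))


index = build_index(UTTERANCES)
print(variants(index, "jeg"))   # all recorded pronunciations of 'jeg'
print(variants(index, "ikke"))  # all recorded pronunciations of 'ikke'
```

A single query on the orthographic form returns every variant, so the user does not need to know the variant forms in advance.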

Example of the two transcriptions:
Orthographic: 'Jeg har fått den'
Phonetic: jæ ha fått n

Orthographic: 'Jeg har ikke penger'
Phonetic: jæ ha kke pennger
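
The internals of the Text Laboratory transliterator are not described here. The Python sketch below only illustrates one way a semi-automatic workflow could work, under stated assumptions: a lookup table proposes a Bokmål form for each known phonetic token, unknown tokens are flagged for a human transcriber, and confirmed forms are added to the table. The lexicon is seeded from the two examples above. The names LEXICON, transliterate and confirm are hypothetical, and capitalization is ignored.

```python
# Hypothetical lexicon of phonetic -> Bokmål mappings, seeded from the
# examples above. "pennger" is left out on purpose to show the manual step.
LEXICON = {
    "jæ": "jeg",
    "ha": "har",
    "fått": "fått",
    "n": "den",
    "kke": "ikke",
}


def transliterate(phonetic_line, lexicon):
    """Propose an orthographic line and list the tokens that need manual review.

    Known tokens are replaced by their lexicon entry; unknown tokens are
    copied unchanged into the proposal and returned in the second value.
    """
    proposal = []
    unknown = []
    for token in phonetic_line.split():
        if token in lexicon:
            proposal.append(lexicon[token])
        else:
            proposal.append(token)
            unknown.append(token)
    return " ".join(proposal), unknown


def confirm(lexicon, phonetic, orthographic):
    """Store a mapping that a human transcriber has checked."""
    lexicon[phonetic] = orthographic


# First pass: one token is unknown and is flagged for the transcriber.
draft, todo = transliterate("jæ ha kke pennger", LEXICON)
print(draft, todo)

# The transcriber supplies the missing form; the second pass is fully automatic.
confirm(LEXICON, "pennger", "penger")
final, todo2 = transliterate("jæ ha kke pennger", LEXICON)
print(final, todo2)
```

In a workflow of this kind, the manual effort shrinks as the lexicon grows, which is consistent with the orthographic transcription being faster to produce than the phonetic one. A plain lookup table cannot resolve a phonetic token that corresponds to several orthographic words, so a real tool would need more than this sketch.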

Transcription Guidelines (Norwegian)

Translation to orthographic transcription - Guidelines (Norwegian)


Norwegian Speech Corpora

Nordic Dialect Corpus and Syntax Database