NorDiaSyn - Data Collection



The Norwegian Dialect Corpus is the Norwegian part of the Nordic Dialect Corpus. In version 3 it contains recordings from 111 selected locations in Norway. The recordings have been transcribed both phonetically and orthographically. They are also grammatically tagged. Through a user-friendly, web-based search interface, you can search the entire corpus via either transcription type, and you can choose to search through the grammatical tags. The results are displayed with the audio and video, and are available for further processing.

In version 3, the Norwegian Dialect Corpus contains 2,291,133 Norwegian words from 165 recording sites. This includes recordings from NORMS workshops and Målførearkivet; see the bottom of the page.

Recording Locations and Selection of Informants
The field work is done in collaboration between UiO (Text Laboratory), NTNU and UiT. See this map and this list to find which measuring points we have visited.

At each measuring point there are four informants:
- one man > 50 years
- one woman > 50 years
- one man < 30 years
- one woman < 30 years

Informants should ideally be born and raised at the measuring point in question, and should not have lived away for more than seven years. They should not have higher education: small places usually have no educational institutions, so higher education often means that a person has lived away from home for a long time.
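The selection criteria above can be sketched as a small check. This is only an illustration, not the project's own tooling: the `Informant` record, its field names and the helper functions are all hypothetical, and the criteria are treated as strict here even though the text describes them as an ideal.

```python
from dataclasses import dataclass


@dataclass
class Informant:
    # All field names are hypothetical, chosen for this illustration only.
    age: int
    gender: str            # "man" or "woman"
    raised_locally: bool   # born and raised at the measuring point
    years_away: int        # total years lived away from the measuring point
    higher_education: bool


def age_group(age):
    """Return "older" for > 50 years, "younger" for < 30 years, else None."""
    if age > 50:
        return "older"
    if age < 30:
        return "younger"
    return None


def meets_ideal_criteria(informant):
    """Check one informant against the ideal criteria described above."""
    return (
        age_group(informant.age) is not None
        and informant.raised_locally
        and informant.years_away <= 7
        and not informant.higher_education
    )


def missing_slots(informants):
    """List the (gender, age group) slots a measuring point still lacks."""
    wanted = {(g, a) for g in ("man", "woman") for a in ("older", "younger")}
    filled = {
        (i.gender, age_group(i.age))
        for i in informants
        if meets_ideal_criteria(i)
    }
    return sorted(wanted - filled)
```

For example, a measuring point with one eligible man over 50, one eligible woman under 30 and one 40-year-old woman would still lack a younger man and an older woman.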

The informants take part in three activities with the project assistants: interview, conversation and questionnaire.

The interview is done by a project assistant with one informant at a time. In the interviews we ask simple questions that should be straightforward to answer. The point of the interviews is to get informants to talk; they are not meant to reveal a lot of information about themselves. For example, we ask if they can tell us about the place where they grew up, about childhood memories and the games they played when they were young. When we visit a recording location, we also try to customize the questions. It may be appropriate to ask about local landmarks, roads or bridges, for example.

The conversation takes place between two informants, preferably of the same age group. The field workers usually leave the room and let the informants speak uninterrupted. The informants can choose from a list of given conversation topics to get started, but this is often not necessary. We try to create a relaxed atmosphere in the recording situation, and we believe the informants relax and enjoy it. Most informants are proud to be recorded.

In the questionnaire task, a project assistant plays recorded sentences with different word orders for the informants, and the informants are asked to give grammaticality judgements of the sentences with respect to their own dialect. The sentences in the questionnaire illustrate different grammatical structures that we know or assume vary between dialects. The sentences are presented orally (as a recording) to the informants, and have been prerecorded by a person from the relevant dialect area.

Storage of data
Conversations and interviews are transcribed and added to the searchable Nordic Dialect Corpus, and thus become part of the Norwegian Dialect Corpus, while the responses from the questionnaire are stored in the Nordic Syntax Database. Local media have shown great interest in the project, as can be seen in our press archive.

Since we store the recordings of the informants and make them available, the project is subject to the guidelines from the Norwegian Data Inspectorate, see NSD. This limits how much private information the informants may provide about themselves. We tell the informants during the fieldwork not to disclose sensitive personal information about themselves and others.

Some deviations from the selection criteria
Some recordings come from measuring points in addition to those selected according to the criteria above. There are two types of such recordings in the Norwegian Dialect Corpus:


- Recordings from the Dialect Archive at the University of Oslo (Search in Målførearkivet or listen to sound files from Målførearkivet)

- Recordings made during excursions at workshops arranged by the Nordic Center of Excellence in Microcomparative Syntax (NORMS).


The Dialect Archive recordings are 40-50 years old, and have been selected from the same measuring points as the current ones, to the extent that this has been possible. With the NORMS recordings we have had less control over the informant selection, and there will typically be deviations from the norm for age, education, etc.

The field work has been done by different people connected to the project. See under the tab Project Info.

