Norwegian Speech Corpora


The Norwegian Speech Corpora listed below are a collection of several subcorpora, hosted and partly or fully developed at the Text Laboratory, UiO, in some cases in cooperation with other institutions. Some of the corpora are still under development but can already be used.


Norsk talespråkskorpus-Oslo-delen
Talemålsundersøkelsen i Oslo
Big Brother
Nordisk dialektkorpus - Scandinavian Dialect Corpus
Utviklingsprosesser i urbane språkmiljøer

NoTa-Oslo [Norsk talespråkskorpus-Oslo-delen] Oslo speech 2005 [Homepage] [Search]
- A corpus of orthographically transcribed speech with linked audio and video files
- Informants carefully selected with respect to sociolinguistic variables
- Time of recording: 2005
- Place of recording: Oslo and Oslo area
- Number of informants: 166
- Number of words: approx. 900 000
- Type of material: Interviews and conversations
- Status: Finished

TAUS [Talemålsundersøkelsen i Oslo] Oslo speech from the 1970s [Homepage] [Search]
- Originally a corpus of phonologically transcribed speech with non-linked sound files
- Transcribed orthographically with linked audio files in 2006 - 2007
- Informants carefully selected with respect to sociolinguistic variables
- Time of recording: 1970-1975
- Place of recording: Oslo (Frogner and Vålerenga)
- Number of informants: 59
- Number of words: approx. 244 000
- Type of material: Interviews
- Status: Finished

Big Brother [TV show] Speech of young adults [Homepage] [Search]
- A corpus of orthographically transcribed speech with linked audio and video files
- The informants are 10 young adults from several places in Norway
- Time of recording: 2001
- Number of words: approx. 550 000
- Type of material: A wide range of situations in the Big Brother house
- Status: Finished

Nordisk dialektkorpus - Scandinavian Dialect Corpus [Homepage] [Search]
The Nordic Dialect Corpus is a corpus of spoken Norwegian, Swedish, Danish, Faroese and Övdalian (and soon Icelandic and Finland Swedish). It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus come from a variety of sources, both old and new. The corpus contains nearly 2 million words of conversation among dialect speakers. The speech is transcribed and linked to audio and video; the corpus has a map function and can be searched in many different ways.

The Nordic dialect corpus and database are being developed in cooperation with our partners in the Nordic network ScanDiaSyn and the Nordic Center of Excellence, NORMS. The corpus is already available for research.


UPUS [Utviklingsprosesser i urbane språkmiljøer] [Homepage] [Search]
- Corpus under development in the UPUS project. Project leader: Brit Mæhlum, INL, NTNU

Multimedia representation of corpora

Because all the speech is transcribed, it is searchable. Eventually, all the corpora will be linked to sound files and, in some cases, video files. The corpora are in the process of being grammatically tagged. Search results are presented as concordances in which each line can be clicked to play the corresponding sound and video. The files can also be downloaded and played individually.

Multiple search options

The speech corpora are or will be searchable via words, strings of words, parts of words, grammatical tags, and events.


Access requires permission; fill in the permission form to apply.

Research options

The high-quality transcriptions, tagging, other annotation, and linked video and sound files (existing or planned) make the corpora useful for many kinds of linguistic research: syntax, morphology, phonology, phonetics, semantics, lexicography, language technology and computational linguistics, discourse analysis, sociolinguistics, etc. Because the material is spoken, and because it was recorded in different situations and with different speakers, the corpora are also useful for studies of language in particular settings or of particular types, such as emotive situations. For the same reason, they are useful for studies in artificial intelligence, psychology and sociology.


In the press:


The NoTa-Oslo project and the Big Brother corpus

See the Norwegian home page here.


The UPUS project

See the UPUS home page here.


The ScanDiaSyn project

See the ScanDiaSyn home page here.


Contact tekstlab-post@iln.uio.no for more information.