NoTa-Oslo (Norwegian Speech Corpus - the Oslo part)

NoTa-Oslo is a speech corpus with interviews and conversations from 166 informants born and raised in Oslo and the Oslo area. The informants were carefully selected with respect to sociolinguistic variables and are therefore representative in terms of age, gender, place of residence and education. NoTa-Oslo consists of approximately 957,000 words that are orthographically transcribed and morphologically tagged. The corpus is searchable through the Glossa search interface, and the transcriptions are linked to audio and video files.

The NoTa-Oslo corpus was built between 2004 and 2006.

NoTa-Oslo now uses the new version of Glossa, a search and post-processing tool developed by the Text Laboratory.
Log in with Feide or CLARIN. Contact us if you need another login alternative.

Search in NoTa-Oslo

Download the transcriptions from GitHub:

License for downloading
Go to the transcriptions on GitHub