CLARINO is a Norwegian infrastructure project jointly funded by the Research Council of Norway and a consortium of Norwegian universities and research institutions. Its goal is to implement the Norwegian part of CLARIN. The ultimate aim is to make existing and future language resources easily accessible to researchers and to bring eScience to humanities disciplines. The CLARINO project is coordinated by the University of Bergen.
The CLARINO Text Laboratory Centre is a C-centre in the CLARIN infrastructure.
The table below lists Text Laboratory resources covered by a signed CLARIN agreement. More resources will be added. See the Text Laboratory homepage for all resources from the Text Laboratory.
Corpora:
| Resource | Description | Licence |
|---|---|---|
| The Big Brother Corpus | (2007) 440,300 tokens. Speech. Norwegian TV show from 2001. Accessible through Glossa. | |
| Corpus of American Nordic Speech v.3 | (2019) 746,000 tokens. Speech. American Norwegian and American Swedish. Accessible through a search interface. | |
| Corpus of Doctor-Patient Consultations from Ahus (Akershus University Hospital) | (2015) 950,000 tokens. Speech. Transcriptions without audio files. Accessible through a search interface. | |
| The Lexicographic Corpus for Norwegian Bokmål | (2013) 100 million tokens. Written text. Norwegian Bokmål. Accessible through a search interface. | |
| | (2018) 3.5 million tokens. Speech. Norwegian dialects from 1937 to 1996. Accessible through a search interface. Links: Download metadata; Search the corpus. | |
| LIA Sápmi - Sámegiela hállangiellakorpus (Sámi spoken language corpus) | (2018) 190,000 tokens. Speech. Sámi dialects. Accessible through a search interface. Links: Download metadata; Search the corpus. | |
| Nordic Dialect Corpus v. 4.0 | (2013) 2.75 million tokens. Speech. Nordic dialects. Accessible through a search interface. | |
| Nordic Syntax Database | (2013) 924 sentence judgments by speakers of Nordic dialects. Accessible through a search interface. | |
| The NORINT Corpus | (2017) Speech (110,000 tokens) and written text (53,000 tokens). Norwegian as a second language. Accessible through a search interface. | |
| The NORM Corpus | (2017) 1.17 million tokens. Texts written by pupils. Norwegian Bokmål and Nynorsk. Accessible through a search interface. | |
| Norwegian Words | (2013) Lexical database of 1,650 Norwegian Bokmål nouns, adjectives and verbs. Accessible through a search interface. | |
| NoTa-Oslo Norsk talespråkskorpus - Oslodelen (Norwegian Speech Corpus, Oslo part) | (2006) 957,000 tokens. Speech. Oslo sociolects. Accessible through a search interface. | |
| NoWaC - Norwegian Web as Corpus v1.0 | (2010) 700 million tokens. Written text. Norwegian Bokmål. Accessible through a search interface or by download. Links: Download metadata; Download the corpus; Search the corpus. | |
| Frequency lists from NoWaC | (2010) Frequency lists. Norwegian Bokmål. Links: Download metadata; Download frequency lists. | |
| The SKRIV Corpus | (2016) 112,000 tokens. Texts written by students in upper secondary vocational education programmes. Norwegian Bokmål. Accessible through a search interface. | |
| TAUS - Talemålsundersøkelsen i Oslo v.3 (Oslo speech study) | (2007, 2020) 388,000 tokens. Speech. Oslo sociolect from 1971 to 1973. Accessible through a search interface. | |
Downloadable transcriptions (and audio files) from corpora:
Tools:
| Tool | Description | Licence |
|---|---|---|
| Glossa | Search and post-processing tool for text and speech corpora. | |
| The Oslo-Bergen Tagger | Morphological tagger for Norwegian Bokmål and Nynorsk. | |
More language resources from the Text Laboratory.
Contact: tekstlab-post at iln.uio.no
CLARINO consortium partners: