


Welcome to CLARINO Text Laboratory Centre

CLARINO is a Norwegian infrastructure project jointly funded by the Research Council of Norway and a consortium of Norwegian universities and research institutions. Its goal is to implement the Norwegian part of CLARIN. The ultimate aim is to make existing and future language resources easily accessible to researchers and to bring eScience to the humanities. The CLARINO project is coordinated by the University of Bergen.

CLARINO Text Laboratory Centre is a C centre in the CLARIN infrastructure.
The resources listed below are Text Laboratory resources with a signed CLARIN agreement. More resources will be added. Go to the Text Laboratory homepage to view all resources from the Text Laboratory.

Corpora:

The Big Brother Corpus (2007) 550 000 words. Speech. Norwegian TV show from 2001 - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
Corpus of American Norwegian Speech (2015) 182 000 words. Speech. American Norwegian - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
Nordic Dialect Corpus (2013) 2.8 million words. Speech. Nordic dialects - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
Nordic Syntax Database (2013) 924 sentence judgments by Nordic dialect speakers - Download metadata - Search the database - Licence: Accessible through interface.
The Oslo Corpus of Tagged Norwegian Texts, Bokmål (1999) 18.5 million words. Written text. Norwegian Bokmål - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
The Oslo Corpus of Tagged Norwegian Texts, Nynorsk (1999) 3.8 million words. Written text. Norwegian Nynorsk - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
NoTa-Oslo Norsk talespråkskorpus - Oslodelen (2006) 900 000 words. Speech. Oslo sociolects - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
NoWaC - Norwegian Web as Corpus v1.0 (2010) 700 million tokens. Written text. Bokmål - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions
Download the corpus - Licence:
Frequency lists from NoWaC (2010) Frequency lists. Bokmål - Download metadata - Download Frequency lists - Licence:
TAUS - Talemålsundersøkelsen i Oslo (2007) 245 500 words. Speech. Oslo sociolect from 1971-1973 - Download metadata - Search the corpus - Licence: Accessible through interface. Licence conditions

Tools:

Glossa: Search and post-processing tool for text and speech corpora - Download metadata - Download Glossa - Licence: MIT Licence
The Oslo-Bergen Tagger: Morphological tagger for Norwegian Bokmål and Nynorsk - Download metadata - Download OBT - Licence: GPL


More language resources from the Text Laboratory.

Contact: tekstlab-post at iln.uio.no

 

 

CLARINO Consortium partners: