CMDI 1.1. Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2016-11-04
MdSelfLink:
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
JournalFileProxyList:
ResourceRelationList:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:
resourceName [xml:lang=‘nb’]: NORINT-korpuset
resourceName [xml:lang=‘en’]: The NORINT Corpus
description [xml:lang=‘en’]: The NORINT Corpus consists of speech from 51 and written texts from 116 adult learners of Norwegian as a second language, all of whom were taking advanced Norwegian courses (≈ CEFR level B2) at the University of Oslo during the summers of 2014 and 2015.

The NORINT Corpus is divided into three sub-parts:

- NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words altogether. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. There are both audio and video recordings of the interviews and conversations.
The recordings are transcribed orthographically with the transcription tool Elan.
- NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.
- NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants are partly the same as in NORINT Speech and NORINT Recited but, for privacy reasons, participants cannot be identified across the sub-parts of the corpus.
The texts are available in three formats: the handwritten original in PDF format, an exact typed digital copy of the original, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.

The corpus is searchable in the search interface Glossa, and the transcriptions are linked to audio and video files.
description [xml:lang=‘nb’]: NORINT-korpuset inneholder muntlig materiale fra 51 og skriftlig materiale fra 116 voksne internasjonale studenter som gikk på norskkurs på høyere nivå (≈CEFR-nivå B2) ved Universitetet i Oslo sommeren 2014 og 2015.

NORINT-korpuset består av tre deler:

- NORINT tale: Taledelen av korpuset består av intervjuer og samtaler, i alt 111 000 ord. Studentene ble intervjuet om bakgrunn, studier, arbeid og fremtidsplaner. I tillegg er det gjort video- og lydopptak der informantene samtaler to og to om emner som kultur, fritid, reiser eller livet i Norge. Det er 30 – 40 minutters opptak av hver student.
Opptakene er transkribert ortografisk med transkripsjonsprogrammet Elan.
- NORINT opplest: 57 informanter, 51 av dem de samme som bidro til NORINT tale, leser opp 60 utvalgte setninger og en liten historie. Det finnes bare lydopptak av opplesningene.
- NORINT tekst: Tekstdelen av korpuset består av 53 247 ord fra 116 eksamensoppgaver. Informantene er delvis de samme som i den muntlige delen av materialet. Av hensyn til personvern er det imidlertid ikke synlige koplinger i korpuset.
Tekstene i NORINT tekst foreligger i tre ulike formater: en håndskrevet originalversjon i pdf-format, en innskrevet nøyaktig kopi av originalversjonen og en versjon der alle ortografiske feil er rettet. Tekstversjonene og de korrigerte versjonene er lenket sammen.

Korpuset er søkbart i søkeverktøyet Glossa der transkripsjonene dessuten er koplet til lyd- og videofiler.
resourceShortName: NORINT
url: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/norint/index.html
url: https://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/norint/index.html
PID: http://hdl.handle.net/11538/0000-000B-C01E-B
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/norint/
executionLocation: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/projects/norint/index.html
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
conditionsOfUse: PRIV
nonStandardConditionsOfUse: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts returned by the search interface may not be shown in public without an agreement with the Text Laboratory.
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: l.a.harnas@iln.uio.no
email: annely.tomson@iln.uio.no
url: http://www.hf.uio.no/iln/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies, University of Oslo
organizationShortName [xml:lang=‘en’]: ILN
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
contact:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Harnæs
givenName: Liv Andlem
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: l.a.harnas@iln.uio.no
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Tomson
givenName: Annely
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: annely.tomson@iln.uio.no
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2017-03-21
metadataLastDateUpdated: 2017-09-18
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:
version: 1
lastDateUpdated: 2016-09-01
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:
role: documentation
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘nb’]: Brukerveiledning til Norint-korpuset
author: Kristin Hagen and Viktoria Holund in cooperation with Annely Tomson
year: 2017
url: http://tekstlab.uio.no/norint/index.html
documentLanguageName: Norwegian Bokmål
documentLanguageId: nb
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2014-01-01
creationEndDate: 2016-09-01
resourceCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Tomson
givenName: Annely
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: annely.tomson@iln.uio.no
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Harnæs
givenName: Liv Andlem
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: l.a.harnas@iln.uio.no
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName: The NORINT Corpus
fundingType: ownFunds
funder: Department of Linguistics and Scandinavian Studies, University of Oslo
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:
corpusType: Written Corpus
corpusType: Multimodal Corpus
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’]:
mimeType: txt
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:
characterEncoding: utf-8
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: audio
corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:
audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 57 participants x 3 audio files each for NORINT opplest (Recited)
sizeUnit: files
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: readSpeech
conversationalType: monologue
scenarioType: other
audience: no
interactivity: nonInteractive
audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:
mimeType: mp3 and wav
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: video
corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:
videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:
typeOfVideoContent: Adult foreign students learning Norwegian as a second language
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
interactivity: overlapping
interaction: Each informant participates in one conversation with another informant and an interview with a teacher.
videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:
mimeType: mp4
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’]:
sourceWorkInfo [ComponentId=‘clarin.eu:cr1:c_1407745712071’]:
workDescription: The NORINT Corpus is divided into three sub-parts:

- NORINT Speech: The speech part of the corpus consists of interviews and conversations, 111,000 words altogether. In the interviews, a teacher asks L2 learners general questions about their background, studies, work, and future plans. In addition, the same L2 learners converse in pairs about optional themes such as culture, leisure, travel, or life in Norway. There are both audio and video recordings of the interviews and conversations.
The recordings are transcribed orthographically with the transcription tool Elan.

- NORINT Recited: 57 L2 learners, 51 of whom contributed to the NORINT Speech sub-part, recite a short story, as well as 60 non-contextualized sentences. This part of the corpus has been audio-recorded.

- NORINT Text: The text part of the corpus consists of 53,247 words from 116 exam papers written by adult L2 learners taking their Norwegian exams. The informants are partly the same as in NORINT Speech and NORINT Recited but, for privacy reasons, participants cannot be identified across the sub-parts of the corpus.
The texts are available in three formats: the handwritten original in PDF format, an exact typed digital copy of the original, and a version in which all orthographic errors are corrected. The original text version and the corrected version are linked together.
personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’]:
numberOfPersons: 57
ageOfPersons: adult
sexOfPersons: mixed
originOfPersons: nonNative
dialectAccentOfPersons: Foreign students learning Norwegian.
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: monolingual
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: nb
languageName: Norwegian Bokmål
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: writtenLanguage
sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 53 247 in NORINT tekst (Text)
sizeUnit: words
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 110 979 in NORINT tale (Speech)
sizeUnit: words
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
modalityTypeDetails: recited text
sizePerModality [ComponentId=‘clarin.eu:cr1:c_1447674760351’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 36 895 in NORINT opplest (Recited)
sizeUnit: words
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: lemmatization
annotationType: morphosyntacticAnnotation-posTagging
segmentationLevel: word
tagset: The Oslo-Bergen Tagger tagset: http://tekstlab.uio.no/obt-ny/english/index.html
tagsetLanguageId: nb
tagsetLanguageName: Norwegian Bokmål
theoreticModel: Constraint Grammar
annotationMode: automatic
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: The Oslo-Bergen Tagger: http://tekstlab.uio.no/obt-ny/english/index.html
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: morphosyntacticAnnotation-posTagging
annotatedElements: other
segmentationLevel: word
tagset: POS tagset created for the statistical NoTa-tagger, based on the tagset of the Oslo-Bergen Tagger.
tagsetLanguageId: nb
tagsetLanguageName: Norwegian Bokmål
theoreticModel: TreeTagger
annotationMode: automatic
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘en’]: Tagging a Norwegian Speech Corpus
author: Anders Nøklestad and Åshild Søfteland
editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, Mare Koit
year: 2007
bookTitle: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
pages: 245–248
conference: Nodalida 2007
documentLanguageName: English
documentLanguageId: en
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘nb’]: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger
author: Åshild Søfteland og Anders Nøklestad
editor: Janne Bondi Johannessen og Kristin Hagen
year: 2008
publisher: Novus forlag
bookTitle: Språk i Oslo. Ny forskning omkring talespråk
pages: 226–234
ISBN: 978-82-7099-471-7
documentLanguageName: Norwegian
documentLanguageId: nb
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘nb’]: NoTa-taggeren: TAGGEVEILEDNING
author: Åshild Søfteland
year: 2007
url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf
documentLanguageName: Norwegian Bokmål
documentLanguageId: nb
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: textGenre
genre: unstandardised
unstandardisedGenre: Exam papers written by students
The texts are available in three versions: a scanned original in PDF format and two transcribed versions in TXT format, namely an original transcription with errors and a version in which the errors are corrected.
All versions are linked and it is possible to search in both transcribed versions.
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: informal
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: recited
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: 2014