CMDI 1.1 Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2019-11-21
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘nordic-dialect-corpus-lp’]:
ResourceType [mimetype=‘’]: LandingPage
ResourceRef: http://www.tekstlab.uio.no/nota/scandiasyn/index.html
ResourceProxy [id=‘ndc-corpus’]:
ResourceType: Resource
ResourceRef: https://tekstlab.uio.no/glossa3/ndc2
ResourceProxy [id=‘ndc-transcriptions’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: http://www.tekstlab.uio.no/scandiasyn/download.html
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: transcriptions
Res1 [ref=‘ndc-corpus’]:
Res2 [ref=‘ndc-transcriptions’]:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType [ref=‘ndc-corpus’]: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:
resourceName [xml:lang=‘en’]: Nordic Dialect Corpus v. 4.0
resourceName [xml:lang=‘nb’]: Nordisk dialektkorpus v. 4.0
description [xml:lang=‘en’]: Nordic Dialect Corpus v. 4.0 is a corpus of Norwegian, Swedish, Danish, Faroese, Icelandic and Övdalian spoken language. It consists of spontaneous speech data from dialects of the North Germanic languages across all of the Nordic countries. The linguistic data in the corpus come from a variety of sources (see homepage - Data Collection) and were recorded between 1998 and 2015. The corpus contains more than 2.75 million words from conversations and interviews with dialect speakers. It is transcribed and linked to audio and video, has a map function, and can be searched in a large variety of ways. Although the corpus was built for Nordic syntax research, it is a general-purpose resource (in effect a Norwegian Dialect Corpus, a Swedish Dialect Corpus and so on) that can be used in a wide range of research areas, such as phonology, morphology and lexicography.

Note! v. 3.0 contains old recordings and transcriptions from Målførearkivet (Oslo Old Dialect Archive). The same transcriptions are now searchable in LIA Norwegian - Corpus of Old Dialect Recordings.
Use v. 4.0 to search the corpus without the old Målførearkiv recordings.
resourceShortName [xml:lang=‘en’]: NDC - Nordic Dialect Corpus v. 4.0
url: http://www.tekstlab.uio.no/nota/scandiasyn/
PID: http://hdl.handle.net/11538/0000-0005-E7C7-6
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’] [ref=‘ndc-corpus’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation: http://www.tekstlab.uio.no/nota/scandiasyn/
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditionsOfUse: *
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
conditionsOfUse: PRIV
nonStandardConditionsOfUse: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory.
The video and audio excerpts provided by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
Please note that every individual researcher is responsible for treating the participants in the corpus with respect and sincerity. Furthermore, the participants must be kept anonymous in every published paper or other output.
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
contact:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2015-02-03
metadataLastDateUpdated: 2021-04-16
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
sex: female
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’]:
validated: true
validationType: content
validationMode: manual
validationModeDetails: The transcriptions are proofread against the audio files. The national projects NorDiaSyn, DanDiaSyn and SweDiaSyn have proofread their own transcriptions, see homepage - Transcription
validationExtent: full
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:
role: documentation
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: other
title: Nordic Dialect Corpus and Syntax Database
author: The Text Laboratory
year: 2013
url: http://www.tekstlab.uio.no/nota/scandiasyn/
documentLanguageId: en
documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:
role: documentation
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title: The Nordic Dialect Corpus - Search Interface Documentation
author: Eirik Olsen
year: 2014
url: http://www.tekstlab.uio.no/nota/scandiasyn/help/
documentLanguageId: en
documentationStructured [ComponentId=‘clarin.eu:cr1:c_1361876010648’]:
role: documentation
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: book
title [xml:lang=‘nb’]: Om artiklene i denne boka og Nordisk dialektkorpus
editor: Janne Bondi Johannessen og Kristin Hagen
year: 2014
publisher: Novus forlag
bookTitle: Språk i Norge og nabolanda. Ny forskning om talespråk.
ISBN: 978-82-7099-795-4
documentLanguageName: Norwegian bokmål
documentLanguageId: nb
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2005-01-01
creationEndDate: 2019-09-31
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Scandinavian Dialect Syntax
projectShortName: ScanDiaSyn
url: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/index.html%3fcolapsemenu=colapsemenu
url: http://www.tekstlab.uio.no/nota/scandiasyn/index.html
fundingType: other
funder: http://websim.arkivert.uit.no/scandiasyn/scandiasyn/29
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘nb’]: NorDiaSyn - Norsk dialektsyntaks
projectName: Nordiasyn - Norwegian Dialect Syntax
projectShortName: Nordiasyn
url: http://www.tekstlab.uio.no/nota/NorDiaSyn/index.html
url: http://www.tekstlab.uio.no/nota/NorDiaSyn/english/index.html
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 2009-01-01
projectEndDate: 2013-12-31
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘nb’]: For the funding of the national projects in Norway, Sweden, Denmark, Iceland and the Faroe Islands, see under National Projects: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
url: http://www.tekstlab.uio.no/nota/scandiasyn/dialect_data_collection.html
fundingType: nationalFunds
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:
corpusType [ref=‘ndc-corpus’]: Multimodal Corpus
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’]:
mimeType: txt
sizePerTextFormat [ComponentId=‘clarin.eu:cr1:c_1447674760342’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 2 754 289
sizeUnit: tokens
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:
characterEncoding: utf-8
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: video
corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:
videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:
typeOfVideoContent: (Some recordings in the corpus are audio only; the video recordings are listed below)
Norway: informal conversations and semi-formal interviews: 438 informants from 111 places
Älvdalen, Sweden: interviews and conversations: 17 informants from 7 places
Denmark: interviews and conversations: 18 informants from 4 places
Faroe Islands: interviews and conversations: 20 informants from 5 places
Iceland: conversations: 6 informants from 2 places
textIncludedInVideo: none
dynamicElementInfo [ComponentId=‘clarin.eu:cr1:c_1360931019781’]:
bodyParts: face
bodyParts: arms
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
audience: few
interactivity: overlapping
interaction: Two scenarios in the corpus:
1) semiformal interview: research assistant/researcher and informant(s).
2) Free conversation between two informants. Research assistants were sometimes passively present in the room during the conversations to prevent conversations about sensitive matters
videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:
mimeType: videos in mpeg4 streaming format available through Glossa
frameRate: 25
resolutionInfo [ComponentId=‘clarin.eu:cr1:c_1360931019784’]:
sizeWidth: 400
sizeHeight: 300
resolutionStandard: HD.720
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: audio
corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:
audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: approx 27 GB
sizeUnit: gb
audioContentInfo [ComponentId=‘clarin.eu:cr1:c_1360230992161’]:
textualDescription: Norway:
1) Old audio recordings from Målførearkivet, University of Oslo. Interviews: 126 informants from 52 places.
(In v. 4.0 these recordings have been moved to LIA Norwegian - Corpus of Old Dialect Recordings. They are still searchable in NDC v. 3.0)

2) New recordings: informal conversations and semi-formal interviews. 438 informants from 111 places.

Sweden: interviews. 133 informants from 37 places.
+
Älvdalen, Sweden: interviews and conversations: 17 informants from 7 places

Denmark: interviews: 81 informants from 15 places

Iceland: interviews and conversations: 48 informants from 8 places

Faroe Islands: interviews and conversations: 20 informants from 8 places
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
audience: few
interactivity: overlapping
interaction: Two scenarios:
1) (semiformal) interview: research assistant or researcher and informant(s).
2) Free conversation between two informants. Research assistants were sometimes passively present in the room during the conversations to prevent conversations about sensitive matters
audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:
mimeType: wav and mp3
signalEncoding: linearPCM
samplingRate: 32
quantization: 64
numberOfTracks: 1
recordingQuality: medium
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mp3
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’]:
personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’]:
numberOfPersons: 737
ageOfPersons: teenager
ageOfPersons: adult
ageOfPersons: elderly
ageRangeStart: 11
ageRangeEnd: 94
sexOfPersons: mixed
originOfPersons: native
dialectAccentOfPersons: Dialects from Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen.
geographicDistributionOfPersons: Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: multilingual
multilingualityType: other
multilingualityTypeDetails: Interviews and conversations in five Scandinavian languages. Can be translated into English with Google Translate
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: nb
languageName: Norwegian Bokmål (the orthographic transcriptions)
sizePerLanguage [ComponentId=‘clarin.eu:cr1:c_1447674760349’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 1 997 920
sizeUnit: tokens
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: dialect
languageVarietyName: Dialects from 111 places in Norway, 438 informants
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: sv
languageName: Swedish (Övdalian included)
sizePerLanguage [ComponentId=‘clarin.eu:cr1:c_1447674760349’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 376 868, 14 798 of them are Övdalian
sizeUnit: tokens
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: dialect
languageVarietyName: Dialects from 44 places in Sweden, 150 informants.
17 of the informants, from 7 of the places, are Övdalian.
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: da
languageName: Danish
sizePerLanguage [ComponentId=‘clarin.eu:cr1:c_1447674760349’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 220 360
sizeUnit: tokens
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: dialect
languageVarietyName: Dialects from 15 places in Denmark, 81 informants
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: is
languageName: Icelandic
sizePerLanguage [ComponentId=‘clarin.eu:cr1:c_1447674760349’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 94 338
sizeUnit: tokens
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: dialect
languageVarietyName: Dialects from 8 places in Iceland, 48 informants
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: fo
languageName: Faroese
sizePerLanguage [ComponentId=‘clarin.eu:cr1:c_1447674760349’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 64 803
sizeUnit: tokens
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: dialect
languageVarietyName: Dialects from 5 places on the Faroe Islands, 20 informants
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 2 754 289
sizeUnit: tokens
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: morphosyntacticAnnotation-posTagging
annotatedElements: other
segmentationLevel: word
annotationFormat: See http://www.tekstlab.uio.no/nota/scandiasyn/tagging.html for tagging of the five languages
tagset: See http://www.tekstlab.uio.no/nota/scandiasyn/tagging.html for tagging of the five languages
annotationMode: automatic
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: speechAnnotation-phoneticTranscription
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: Norwegian and Övdalian have phonetic transcriptions, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: speechAnnotation-orthographicTranscription
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: All languages are orthographically transcribed, see http://www.tekstlab.uio.no/nota/scandiasyn/transcription.html
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: Transcriber (http://trans.sourceforge.net/en/presentation.php)
ELAN (https://tla.mpi.nl/tools/tla-tools/elan/)
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: For Norwegian and Övdalian: https://www.hf.uio.no/iln/english/about/organization/text-laboratory/services/oslo-transliterator/index.html
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: informal
unstandardisedGenre: conversations
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
genreInfo [ComponentId=‘clarin.eu:cr1:c_1407745711877’]:
genreType: speechGenre
genre: semi formal
unstandardisedGenre: interviews
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: 1998 - 2015
geographicCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760357’]:
geographicCoverage: 183 places in Norway, Sweden, Denmark, the Faroe Islands, Iceland and Älvdalen
recordingInfo [ComponentId=‘clarin.eu:cr1:c_1426673949970’]:
recordingDeviceType: tapeVHS
recordingDeviceType: other
recordingEnvironment: office
recordingEnvironment: closedPublicPlace
recordingEnvironment: conferenceRoom
recordingEnvironment: lectureRoom
recordingEnvironment: other
captureInfo [ComponentId=‘clarin.eu:cr1:c_1407745712025’]:
capturingDeviceType: closeTalkMicrophone
capturingDeviceType: camera