CMDI 1.1. Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2015-03-04
MdSelfLink:
MdProfile: clarin.eu:cr1:p_1407745711925
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
JournalFileProxyList:
ResourceRelationList:
IsPartOfList:
Components
corpusProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’]:
resourceType: corpus
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’]:
resourceName [xml:lang=‘nb’]: Amerika-norsk talespråkskorpus
resourceName [xml:lang=‘en’]: Corpus of American Norwegian Speech
description [xml:lang=‘en’]: Almost two hundred years ago, the first Norwegians emigrated to America to start a new life. Since then, nearly 900,000 Norwegians have followed. At present, most Norwegian Americans speak only American English, but there are still some who learned Norwegian at home and who continue to speak it as adults. These speakers are most often well into their eighties and nineties.

The Corpus of American Norwegian Speech is a small speech corpus with conversations and interviews from some of these Norwegian Americans. The transcriptions are both phonetic and orthographic and are linked to audio and video.

The Corpus of American Norwegian Speech will be extended with both new and older recordings and transcriptions.
resourceShortName: CANS
url: http://www.tekstlab.uio.no/norskiamerika/english/index.html
PID: http://hdl.handle.net/11538/0000-0005-E7C9-4
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’]:
userCategory: Academic
distributionAccessMedium: accessibleThroughInterface
executionLocation: http://www.tekstlab.uio.no/norskiamerika/english/index.html
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: CLARIN
licenceName: CLARIN_ACA-NC-LOC-PRIV-ND-*
licenceURL: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaAca?ID=1&AFFIL=EDU&BY=1&NC=1&LOC=1&PRIV=1&NORED=1&ND=1
conditionsOfUse: *
conditionsOfUse: BY
conditionsOfUse: ID
conditionsOfUse: LOC
conditionsOfUse: NC
conditionsOfUse: ND
conditionsOfUse: NORED
conditionsOfUse: PRIV
nonStandardConditionsOfUse: The corpus has audio and video recordings classified as personal data. In agreement with NSD, the Data Protection Official in Norway, the corpus is accessible only through Glossa, a search and post-processing tool developed by the Text Laboratory. The video and audio excerpts given by the search interface cannot be shown in public unless you have an agreement with the Text Laboratory.
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
contact:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’]:
metadataCreationDate: 2015-03-04
metadataLastDateUpdated: 2017-09-11
metadataCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’]:
version: version 1
updateFrequency: The corpus will be extended with both new and older recordings and transcriptions.
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’]:
validated: true
validationType: content
validationMode: manual
validationModeDetails: The transcriptions are proofread against the audio files.
validationExtent: full
validator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.tekstlab.uio.no/norskiamerika/english/index.html
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’]:
creationStartDate: 2010-01-01
resourceCreator:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Norwegian in America
projectShortName [xml:lang=‘en’]: NorAmDiaSyn
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Norwegian in America
projectShortName [xml:lang=‘en’]: NorAmDiaSyn
fundingType: other
funder: Department of Linguistics and Scandinavian Studies, University of Tromsø (through Merete Anderssen and Marit Westergaard)
fundingCountry: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Norwegian in America
projectShortName: NorAmDiaSyn
fundingType: ownFunds
funder: The Text Laboratory
fundingCountry: Norway
fundingProject:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Language Infrastructure made Accessible
projectShortName [xml:lang=‘en’]: LIA
url: http://www.hf.uio.no/iln/english/research/projects/language-infrastructure-made-accessible/index.html
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 2014-04-01
projectEndDate: 2019-04-01
corpusInfo [ComponentId=‘clarin.eu:cr1:c_1407745711878’]:
corpusType: Multilingual Corpus
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: text
corpusTextInfo [ComponentId=‘clarin.eu:cr1:c_1396012485188’]:
textFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477072’]:
mimeType: text/plain
sizePerTextFormat [ComponentId=‘clarin.eu:cr1:c_1447674760342’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 197 889
sizeUnit: words
characterEncodingInfo [ComponentId=‘clarin.eu:cr1:c_1447674760355’]:
characterEncoding: utf-8
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: video
corpusVideoInfo [ComponentId=‘clarin.eu:cr1:c_1407745711880’]:
videoContentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019779’]:
typeOfVideoContent: Interviews and conversations between American Norwegians
textIncludedInVideo: none
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: multilogue
audience: some
interactivity: overlapping
videoFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477073’]:
mimeType: video in streaming format mp4 available through Glossa
frameRate: 25
resolutionInfo [ComponentId=‘clarin.eu:cr1:c_1360931019784’]:
sizeWidth: 400
sizeHeight: 300
resolutionStandard: HD.720
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartInfo [ComponentId=‘clarin.eu:cr1:c_1407745711885’]:
mediaType: audio
corpusAudioInfo [ComponentId=‘clarin.eu:cr1:c_1404130561236’]:
audioSizeInfo [ComponentId=‘clarin.eu:cr1:c_1360230992160’]:
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: approx 10 GB
sizeUnit: gb
settingInfo [ComponentId=‘clarin.eu:cr1:c_1360230992162’]:
naturality: spontaneous
conversationalType: dialogue
audience: some
interactivity: overlapping
interaction: Two scenarios: a semiformal interview between a research assistant or researcher and an informant, and a free conversation between two informants.
audioFormatInfo [ComponentId=‘clarin.eu:cr1:c_1427452477070’]:
mimeType: wav and mp4
signalEncoding: linearPCM
samplingRate: 32
quantization: 64
numberOfTracks: 1
recordingQuality: medium
compressionInfo [ComponentId=‘clarin.eu:cr1:c_1360230992165’]:
compression: true
compressionName: mpg
corpusPartGeneralInfo [ComponentId=‘clarin.eu:cr1:c_1407745711882’]:
personSourceSetInfo [ComponentId=‘clarin.eu:cr1:c_1360931019775’]:
numberOfPersons: 50
ageOfPersons: elderly
ageRangeStart: 67
ageRangeEnd: 98
sexOfPersons: mixed
originOfPersons: native
dialectAccentOfPersons: American-Norwegian
geographicDistributionOfPersons: USA and Canada
lingualityInfo [ComponentId=‘clarin.eu:cr1:c_1355150532313’]:
lingualityType: monolingual
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: no
languageName: Norwegian
languageVarietyInfo [ComponentId=‘clarin.eu:cr1:c_1428388179422’]:
languageVarietyType: other
languageVarietyName: American Norwegian speech from 22 places in USA and Canada with approx
languageInfo [ComponentId=‘clarin.eu:cr1:c_1428388179423’]:
languageId: nb
languageName: Norwegian Bokmål
modalityInfo [ComponentId=‘clarin.eu:cr1:c_1447674760356’]:
modalityType: spokenLanguage
sizeInfo [ComponentId=‘clarin.eu:cr1:c_1353678848785’]:
size: 197 889
sizeUnit: words
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: morphosyntacticAnnotation-posTagging
annotatedElements: other
segmentationLevel: word
tagset: POS tagset created for the statistical NoTa-tagger, based on the tagset of the Oslo-Bergen Tagger.
tagsetLanguageId: nb
tagsetLanguageName: Norwegian Bokmål
theoreticModel: TreeTagger
annotationMode: automatic
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: manual
title [xml:lang=‘nb’]: NoTa-taggeren: TAGGEVEILEDNING
author: Åshild Søfteland
year: 2007
url: http://www.tekstlab.uio.no/nota/oslo/Taggeveiledning2.pdf
documentLanguageName: Norwegian Bokmål
documentLanguageId: nb
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘en’]: Tagging a Norwegian Speech Corpus
author: Anders Nøklestad and Åshild Søfteland
editor: Joakim Nivre, Heiki-Jaan Kaalep, Kadri Muischnek, Mare Koit
year: 2007
bookTitle: Proceedings of the 16th Nordic Conference of Computational Linguistics NODALIDA-2007
pages: 245–248
conference: Nodalida 2007
documentLanguageName: English
documentLanguageId: en
annotationManualStructured [ComponentId=‘clarin.eu:cr1:c_1361876010647’]:
role: annotationManual
documentInfo [ComponentId=‘clarin.eu:cr1:c_1353678848788’]:
documentType: article
title [xml:lang=‘nb’]: Manuell morfologisk tagging av NoTa-materialet med støtte fra en statistisk tagger.
author: Åshild Søfteland and Anders Nøklestad
editor: Janne Bondi Johannessen and Kristin Hagen
year: 2008
publisher: Novus forlag
bookTitle: Språk i Oslo. Ny forskning omkring talespråk
pages: 226–234
ISBN: 978-82-7099-471-7
documentLanguageName: Norwegian
documentLanguageId: nb
annotationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711924’]:
annotationType: speechAnnotation-phoneticTranscription
segmentationLevel: word
annotationMode: manual
annotationManualUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532325’]:
role: annotationManual
documentUnstructured: http://www.tekstlab.uio.no/norskiamerika/english/index.html
annotationTool [ComponentId=‘clarin.eu:cr1:c_1355150532326’]:
targetResourceNameURI: Transcriber (http://trans.sourceforge.net/en/presentation.php)
classificationInfo [ComponentId=‘clarin.eu:cr1:c_1403588862809’]:
timeCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760358’]:
timeCoverage: Interviews and conversations from 2010–2015
geographicCoverageInfo [ComponentId=‘clarin.eu:cr1:c_1447674760357’]:
geographicCoverage: Informants from 22 places in the USA and Canada.
recordingInfo [ComponentId=‘clarin.eu:cr1:c_1426673949970’]:
recordingDeviceType: hardDisk
recordingEnvironment: office
recordingEnvironment: closedPublicPlace
recordingEnvironment: conferenceRoom
recordingEnvironment: lectureRoom
recordingEnvironment: other
recorderActor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
captureInfo [ComponentId=‘clarin.eu:cr1:c_1407745712025’]:
capturingDeviceType: closeTalkMicrophone
capturingDeviceType: camera
creationInfo [ComponentId=‘clarin.eu:cr1:c_1360230992154’]:
creationMode: manual