CMDI 1.1 Metadata
Header
MdCreator: Kristin Hagen
MdCreationDate: 2024-01-04
MdProfile: clarin.eu:cr1:p_1422885449331
MdCollectionDisplayName: Clarino - Textlab
Resources
ResourceProxyList:
ResourceProxy [id=‘obt’]:
ResourceType [mimetype=‘’]: LandingPage
ResourceRef: http://www.tekstlab.uio.no/obt-ny/
ResourceProxy [id=‘cg’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://github.com/noklesta/The-Oslo-Bergen-Tagger/tree/master/cg
ResourceProxy [id=‘multi’]:
ResourceType [mimetype=‘’]: Resource
ResourceRef: https://github.com/noklesta/The-Oslo-Bergen-Tagger
JournalFileProxyList:
ResourceRelationList:
ResourceRelation:
RelationType: partOf
Res1 [ref=‘obt’]:
Res2 [ref=‘cg’]:
ResourceRelation:
RelationType: partOf
Res1 [ref=‘obt’]:
Res2 [ref=‘multi’]:
IsPartOfList:
IsPartOf:
Components
toolProfile:
resourceCommonInfo [ComponentId=‘clarin.eu:cr1:c_1396012485126’] [ref=‘obt’]:
resourceType [ref=‘obt’]: toolService
identificationInfo [ComponentId=‘clarin.eu:cr1:c_1396012485125’] [ref=‘obt’]:
resourceName [ref=‘obt’] [xml:lang=‘en’]: The Oslo-Bergen Tagger
resourceName [ref=‘obt’] [xml:lang=‘no’]: Oslo-Bergen-taggeren
description [ref=‘obt’] [xml:lang=‘en’]: The Oslo-Bergen tagger is a robust morphological tagger developed at the University of Oslo and at Uni Computing in Bergen over several years. The tagger consists of three main modules: a preprocessor with multitagger and compound analyser, a grammar module for morphological disambiguation (Constraint Grammar) and a statistical module that removes the last of the remaining morphological ambiguity (only for Bokmål). The Constraint Grammar module uses a compiler developed at the University of Southern Denmark in Odense. The multitagger uses the lexicon Norsk ordbank.
resourceShortName [ref=‘obt’]: obt
url [ref=‘obt’]: https://www.tekstlab.uio.no/obt-ny/english/index.html
PID [ref=‘obt’]: http://hdl.handle.net/11538/0000-0005-E7C6-7
distributionInfo [ComponentId=‘clarin.eu:cr1:c_1396012485124’] [ref=‘obt’]:
licenceInfo [ComponentId=‘clarin.eu:cr1:c_1396012485158’] [ref=‘obt’]:
userCategory: Public
distributionAccessMedium: downloadable
downloadLocation: https://github.com/noklesta/The-Oslo-Bergen-Tagger
licence [ComponentId=‘clarin.eu:cr1:c_1447674760330’]:
licenceFamily: GNU
licenceName: General Public License (GPL)
licenceURL: http://www.gnu.org/licenses/gpl.html
conditionsOfUse: BY
conditionsOfUse: SA
licensor:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
distributionRightsHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: University of Oslo
organizationName [xml:lang=‘no’]: Universitetet i Oslo
organizationShortName [xml:lang=‘no’]: UiO
organizationShortName [xml:lang=‘en’]: UoO
departmentName [xml:lang=‘en’]: Department of Linguistics and Scandinavian Studies
departmentName [xml:lang=‘no’]: Institutt for lingvistiske og nordiske studier (ILN)
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/english/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
iprHolder:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘cg’]:
actorType [ref=‘cg’]: organization
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘obt’]:
actorType [ref=‘cg’]: person
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Uni Research AS
departmentName [xml:lang=‘en’]: Uni Research Computing
contact [ref=‘obt’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘obt’]:
actorType: person
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’] [ref=‘obt’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname [xml:lang=‘en’]: Meurer
givenName [xml:lang=‘en’]: Paul
sex: male
position: Senior researcher
affiliation:
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName [xml:lang=‘en’]: Uni Research AS
departmentName [xml:lang=‘en’]: Uni Research Computing
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: paul.meurer@uni.no
metadataInfo [ComponentId=‘clarin.eu:cr1:c_1407745711922’] [ref=‘obt’]:
metadataCreationDate: 2015-03-16
metadataLastDateUpdated: 2024-01-05
revision: Updated Clarino+ version
metadataCreator [ref=‘obt’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: person
personInfo [ComponentId=‘clarin.eu:cr1:c_1396012485192’]:
surname: Hagen
givenName: Kristin
organizationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711883’]:
organizationName: The Text Laboratory
organizationShortName: Textlab
departmentName: Department of Linguistics and Scandinavian Studies, University of Oslo
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: kristin.hagen@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
versionInfo [ComponentId=‘clarin.eu:cr1:c_1430905751648’] [ref=‘obt’]:
version [ref=‘obt’]: Clarino + version
revision [ref=‘obt’]: A new version of the multitagger was written in Python (2018-2022). Among other changes, the multi-word expressions from the original multitagger have been removed, so that each word gets its own reading.
Both the lexicon and the CG rules have been revised and modernized.
lastDateUpdated: 2024-01-04
validationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711923’] [ref=‘cg’]:
validated: true
validationModeDetails [ref=‘cg’]: Bokmål: The evaluation of the morphological Constraint Grammar module showed a success rate (recall) of 99% and a precision of 96%. This gives an F-measure of 97.5% (if recall and precision are weighted equally).

The tagger was tested on an evaluation corpus of 30 000 words, with texts from newspapers, magazines, journals, government reports and novels.

Including the statistical module to perform complete disambiguation of the evaluation corpus yielded a tagger accuracy of 96.5%. This figure covers full disambiguation of both morphology and lemma.

Nynorsk: Evaluation was carried out only for the original CG1 module of the Oslo-Bergen tagger. This module had a success rate (recall) of 98.7% with 93.6% precision. This gives an F-measure of 96.2%.

The evaluation corpus for Nynorsk also had about 30 000 words taken from newspapers, magazines, journals, government reports and novels.
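
The F-measure figures above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard definition (the harmonic mean of precision and recall when both are weighted equally); the record does not state which formula was used, and the helper name `f_measure` is hypothetical. Under that assumption the Bokmål figures reproduce 97.5%, while the Nynorsk figures come out at roughly 96.1%, marginally below the reported 96.2%, which may reflect unrounded inputs or a different averaging.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta=1 weights precision and recall equally (assumed here).
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as quoted in the validation details.
bokmaal = f_measure(precision=0.96, recall=0.99)
nynorsk = f_measure(precision=0.936, recall=0.987)

print(f"Bokmål F-measure:  {bokmaal:.1%}")
print(f"Nynorsk F-measure: {nynorsk:.1%}")
```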
validationReportUnstructured [ComponentId=‘clarin.eu:cr1:c_1353678848789’]:
role [ref=‘obt’]: validationReport
documentUnstructured: See the publications page:
http://www.tekstlab.uio.no/obt-ny/english/publications.html
resourceDocumentationInfo [ComponentId=‘clarin.eu:cr1:c_1355150532301’] [ref=‘obt’]:
documentationUnstructured [ComponentId=‘clarin.eu:cr1:c_1355150532302’]:
role: documentation
documentUnstructured: http://www.tekstlab.uio.no/obt-ny/english/index.html
resourceCreationInfo [ComponentId=‘clarin.eu:cr1:c_1407745711921’] [ref=‘obt’]:
creationStartDate: 1996
creationEndDate: 2024
resourceCreator [ref=‘obt’]:
actorInfo [ComponentId=‘clarin.eu:cr1:c_1396012485194’]:
actorType: organization
communicationInfo [ComponentId=‘clarin.eu:cr1:c_1352813745460’]:
email: tekstlab-post@iln.uio.no
url: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/
address: Box 1102 Blindern
zipCode: 0317
city: OSLO
country: Norway
fundingProject [ref=‘obt’]:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: The Tagger Project (Taggerprosjektet, 1996-1998)
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 1996-01-01
projectEndDate: 1998-12-31
fundingProject [ref=‘obt’]:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: Norwegian Newspaper Corpus (2007-2009)
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 2007-01-01
projectEndDate: 2009-12-31
fundingProject [ref=‘multi’]:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName [xml:lang=‘en’]: New version of the multitagger
fundingType: other
funder: The Ministry of Foreign Affairs
fundingCountry: Norway
projectStartDate: 2018
fundingProject [ref=‘obt’]:
projectInfo [ComponentId=‘clarin.eu:cr1:c_1430905751647’]:
projectName: Common Language Resources and Technology Infrastructure Norway +
projectShortName: CLARINO +
projectID: 295700
url: http://clarin.b.uib.no/
fundingType: nationalFunds
funder: The Research Council of Norway
fundingCountry: Norway
projectStartDate: 2020-03-01
projectEndDate: 2023-12-31
toolInfo [ComponentId=‘clarin.eu:cr1:c_1422885449327’]:
description: The tagger consists of three parts:
1) A multitagger (tokenizer, morphological analyzer, and compound analyzer)
2) A Constraint Grammar (CG) tagger
a) VISL CG-3 compiler from University of Southern Denmark
b) Constraint grammar rules
3) OBT+stat - A statistical (HunPoS) tagger that removes ambiguity not resolved in the CG step (only for Bokmål)
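
The three-part design described above can be illustrated with a toy sketch in Python. It is not the tagger's actual code or interface: the mini-lexicon, the single constraint, the tag prior and all function names are hypothetical stand-ins for Norsk ordbank, the CG rule set and the HunPoS model respectively. It only shows how ambiguity is first introduced by the multitagger and then narrowed in two steps.

```python
from typing import Callable

# A reading is a (lemma, tag) pair; a cohort is a word form plus its readings.
Reading = tuple[str, str]
Cohort = tuple[str, list[Reading]]

# Hypothetical mini-lexicon (stand-in for Norsk ordbank); tags are illustrative.
LEXICON: dict[str, list[Reading]] = {
    "en": [("en", "det"), ("en", "pron")],
    "fisker": [("fisker", "subst"), ("fiske", "verb")],
    "sover": [("sove", "verb")],
}


def multitag(text: str) -> list[Cohort]:
    """Stage 1: tokenize and attach every reading the lexicon allows."""
    return [
        (word, list(LEXICON.get(word.lower(), [(word.lower(), "ukjent")])))
        for word in text.split()
    ]


def cg_disambiguate(cohorts: list[Cohort]) -> list[Cohort]:
    """Stage 2: apply one illustrative constraint.

    After a word that may be a determiner, discard verb readings,
    but never remove the last remaining reading of a word.
    """
    result: list[Cohort] = []
    previous_tags: set[str] = set()
    for word, readings in cohorts:
        if "det" in previous_tags:
            kept = [r for r in readings if r[1] != "verb"]
            readings = kept or readings
        result.append((word, readings))
        previous_tags = {tag for _, tag in readings}
    return result


def stat_disambiguate(
    cohorts: list[Cohort], score: Callable[[Reading], float]
) -> list[tuple[str, Reading]]:
    """Stage 3: keep the highest-scoring reading for each word."""
    return [(word, max(readings, key=score)) for word, readings in cohorts]


# Hypothetical tag prior standing in for a trained statistical model.
TAG_PRIOR = {"det": 0.8, "pron": 0.2}

ambiguous = multitag("en fisker sover")
constrained = cg_disambiguate(ambiguous)
tagged = stat_disambiguate(constrained, lambda r: TAG_PRIOR.get(r[1], 0.5))

for word, (lemma, tag) in tagged:
    print(word, lemma, tag)
```

In this sketch the constraint step resolves "fisker" to a noun but leaves "en" with two readings, and the scoring step then picks one of them, mirroring the division of labour between the CG module and the statistical module.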
inputInfo [ComponentId=‘clarin.eu:cr1:c_1360931019804’]:
mediaType: text
resourceType: corpus
modalityType: writtenLanguage
languageName: Norwegian
languageName: Norwegian Bokmål
languageName: Norwegian Nynorsk
languageId: No
languageId: Nb
languageId: Nn
mimeType: txt, xml
characterEncoding: latin1, utf-8
annotationType: lemmatization
annotationType: morphosyntacticAnnotation-posTagging
tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
segmentationLevel: word
segmentationLevel: clause
outputInfo [ComponentId=‘clarin.eu:cr1:c_1360931019824’]:
mediaType: text
resourceType: corpus
modalityType: writtenLanguage
languageName: Norwegian
languageName: Norwegian Bokmål
languageName: Norwegian Nynorsk
languageId: No
languageId: Nb
languageId: Nn
mimeType: txt, xml
characterEncoding: latin1, utf-8
tagset: http://www.tekstlab.uio.no/obt-ny/english/tagset.html
segmentationLevel: clause
segmentationLevel: word
toolServiceOperationInfo [ComponentId=‘clarin.eu:cr1:c_1360931019835’]:
operatingSystem: linux
operatingSystem: mac-OS
runningEnvironmentInfo [ComponentId=‘clarin.eu:cr1:c_1360931019826’]:
requiredSoftware [ComponentId=‘clarin.eu:cr1:c_1360931019827’]:
targetResourceNameURI: VISL CG3: http://beta.visl.sdu.dk/cg3/chunked/installation.html
requiredSoftware [ComponentId=‘clarin.eu:cr1:c_1360931019827’]:
targetResourceNameURI: HunPos: https://code.google.com/p/hunpos/