Natively Linked Data Authorities

Lappalainen M (2020). Natively Linked Data Authorities. Tietolinja, 2020(1). Permanent address: http://urn.fi/URN:NBN:fi-fe2020050324718

Article originally published in December 2019 in the IFLA Metadata Newsletter, volume 5, number 2.

In the past few years the National Library of Finland has been developing its subject authorities towards a more easily reusable linked data format. Old string-based monolingual thesauri have been transformed into multilingual, machine-readable indexing “ontologies”, and a new linked data vocabulary service called Finto has been set up to disseminate the new type of vocabularies. This development culminated in the summer of 2019, when the production of subject authorities was permanently moved away from the library databases into a completely linked data based environment. In the process, all old bibliographic records with subject indexing in the Finnish Libraries’ Union Catalog were converted to use the new form of concepts.

Over the decades, libraries have put a great deal of effort into creating thesauri and other controlled vocabularies to better organise vast amounts of data and information and make them accessible. The potential of these vocabularies for users outside the library sector has long been recognised, but the library-specific standards and formats used to produce them, such as MARC21, have often hindered their use elsewhere. The idea of developing library thesauri into more widely usable linked data ontologies gained momentum in Finland with the rise of the semantic web in the early 2000s. The FinnONTO research project (2003-2012) laid the foundation for the work that the Finto service has continued at the National Library since 2013. The main focus has been on using library thesauri from various domains as the basis for a large public sector “knowledge graph” that would help to connect data and public services in Finland. Semantic technologies have been seen as key to improving the findability and accessibility of digital public services, and libraries have played a key role in this development in Finland during the past few years.

For library metadata experts, the development has meant adopting linked data practices in day-to-day work. In the past, the most widely used library thesaurus, YSA, was maintained within a library cataloguing system as MARC21 authority records that were browsable through a separate interface. As the monolingual MARC-based YSA was transformed into the multilingual General Finnish Ontology YSO (http://finto.fi/yso/en/), the practice of maintaining a library vocabulary took on a whole new meaning. Because MARC21 does not really support the production of multilingual vocabularies, it was clear from the outset that no MARC-based system could serve as an editing platform for one. And because the new ontology was meant to be easy to use outside the library sector, a whole new linked data environment was set up. An open source vocabulary service, Finto.fi, was developed in-house for publishing the new ontologies, and APIs were added for easy integration of YSO and other new vocabularies into information systems. For editing, existing SKOS and OWL editors were taken into use, and new practices such as linking concepts to external international vocabularies were added to the workflow.

The use of Finto and its vocabularies spread fast to different kinds of organisations, from museums and archives to government agencies and media houses. Ironically, in the library sector the adoption of the new vocabularies was very slow. Once it was clear that multilingual linked vocabularies could bring major benefits to library subject indexing, the first task was to figure out the best way to use the vocabularies in existing MARC-based systems. The original idea was that Finto could be integrated into the systems through its REST API, but it was soon discovered that for some library systems in use in Finland this would be too difficult or impossible. This meant that a MARC21 authority representation of the YSO concepts was needed, one that could then be used in subject cataloguing “in the old way”. But, as mentioned, MARC doesn’t really support multilinguality in this sense, and in practice this meant, for example, that one trilingual YSO concept required three MARC21 authority records. Once the MARC representation of the concepts was settled, a conversion pipeline was built that would fetch the concepts from Finto, convert them to MARC, and then import them into the library cataloguing system.

A diagram showing an excerpt from an YSO authority record with arrows pointing to MARC 21 authority records in Finnish, English, and Swedish, as well as arrows linking the authority records together.

Picture 1. Original multilingual concept in Finto converted to MARC21 authority records.
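The one-to-three mapping described above can be sketched roughly as follows. This is a minimal illustration, not the National Library's actual converter: the `Concept` class, the function name, the placeholder URI, and the example labels are all hypothetical. The choice of MARC21 authority fields (024 for the URI, 150 for the heading, 750 for links to the other language versions) and the source codes `yso/swe` and `yso/eng`, formed by analogy with the `yso/fin` code mentioned in the article, are assumptions as well.

```python
from dataclasses import dataclass

# Assumed vocabulary source codes; only "yso/fin" is named in the article.
SOURCE_CODES = {"fi": "yso/fin", "sv": "yso/swe", "en": "yso/eng"}


@dataclass
class Concept:
    """Hypothetical stand-in for one multilingual concept fetched from Finto."""
    uri: str
    pref_labels: dict  # language tag -> preferred label


def concept_to_authority_records(concept):
    """Build one simplified authority record per language of the concept."""
    records = []
    for lang, label in concept.pref_labels.items():
        fields = [
            ("024", {"a": concept.uri, "2": "uri"}),  # permanent identifier
            ("150", {"a": label}),                    # heading in this language
        ]
        # Link this record to the other language versions of the same concept.
        for other_lang, other_label in concept.pref_labels.items():
            if other_lang != lang:
                fields.append(("750", {
                    "a": other_label,
                    "0": concept.uri,
                    "2": SOURCE_CODES[other_lang],
                }))
        records.append({"lang": lang, "fields": fields})
    return records


# Placeholder URI and illustrative labels, not real YSO data.
example = Concept(
    uri="http://example.org/yso/p0000",
    pref_labels={"fi": "kissat", "sv": "katter", "en": "cats"},
)
records = concept_to_authority_records(example)
```

Each of the three resulting records carries the same URI, which is what keeps the language versions tied to a single concept even though MARC21 forces them into separate records.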

Transforming the old thesauri into LOD ontologies was one thing; converting the existing bibliographic records in library databases to support the new type of concepts was another. This task presented several challenges, again mainly related to the MARC21 format. First, it was decided that since the authority records were “multilingual”, the bibliographic records should be as well. This meant duplicating the 65X fields for each of the supported languages. The permanent identifiers (URIs) were recorded in subfield 0 of each 65X field. Finally, the language of the term in a given 65X field was expressed in the vocabulary source code in subfield 2. For example, for a Finnish-language term the source code was yso/fin. The end result was not the prettiest MARC records you’ve ever seen, but it could be said that the main objectives of the conversion were reached. Still on the to-do list is an automatic enrichment script that would fill in the additional language versions of a chosen concept in a bibliographic record. This way a cataloguer could index using just one of the three supported YSO languages, and the rest could be added automatically to make the end result multilingual.

A diagram showing an excerpt from a bibliographic record with arrows linking the language pairs of the different subject headings together.

Picture 2. 650-fields in a bibliographic record after the conversion. For each concept, a separate field is needed for each supported language (just Finnish and Swedish in this case). Also notice the vocabulary source codes with language additions in subfield 2, and the URIs in subfield 0.
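The enrichment step that the article lists as future work could look something like the sketch below. It is only one possible approach under stated assumptions: fields are modelled as plain dictionaries rather than real MARC structures, the function name and the `labels_by_uri` lookup table are hypothetical, and the source codes other than `yso/fin` are assumed by analogy. The idea is to use the URI in subfield 0 as the key for finding the missing language versions.

```python
# Assumed vocabulary source codes; only "yso/fin" is named in the article.
SOURCE_CODES = {"fi": "yso/fin", "sv": "yso/swe", "en": "yso/eng"}


def enrich_subject_fields(fields, labels_by_uri, languages=("fi", "sv")):
    """Return a new field list with missing language versions added.

    fields        -- list of dicts with keys "tag", "a", "2" and "0"
    labels_by_uri -- hypothetical lookup: concept URI -> {language: label}
    languages     -- languages the record should end up covering
    """
    # Remember which (URI, source code) pairs the record already contains.
    present = {(f["0"], f["2"]) for f in fields}
    enriched = []
    for field in fields:
        enriched.append(field)
        labels = labels_by_uri.get(field["0"], {})
        for lang in languages:
            code = SOURCE_CODES[lang]
            if (field["0"], code) not in present and lang in labels:
                # Add the same concept in another language, right after the original.
                enriched.append({
                    "tag": field["tag"],
                    "a": labels[lang],
                    "2": code,
                    "0": field["0"],
                })
                present.add((field["0"], code))
    return enriched


# Placeholder URI and illustrative labels, not real YSO data.
uri = "http://example.org/yso/p0000"
record_fields = [{"tag": "650", "a": "kissat", "2": "yso/fin", "0": uri}]
lookup = {uri: {"fi": "kissat", "sv": "katter", "en": "cats"}}
enriched_fields = enrich_subject_fields(record_fields, lookup)
```

Because the function checks which language versions are already present, running it again on an enriched record adds nothing further, which matters if such a script is run repeatedly over the same database.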

Once the pipeline carrying the ontologies from Finto to the library systems was ready and functional, and the legacy data had been converted to match the new form of concepts, it was finally time to “pull the plug” on the old library-based thesauri and focus the vocabulary work on the development of linked data ontologies. The National Library’s network of partners has expanded through the wide usage of Finto’s ontologies, and libraries’ vocabulary work is now seen more as a part of advancing the semantic interoperability of the public sector at large.

Author’s contact information

Mikko Lappalainen, Development Manager
National Library of Finland, Library Network Services
P.O. Box 15 (Yliopistonkatu 1), 00014 University of Helsinki
mikko.lappalainen [at] helsinki.fi
