Supporting the Reconciliation Service API for SKOS vocabularies

January 22, 2024 | Steffen Rörtgen

Reconciliation is the process of integrating data from sources that do not share common unique identifiers by identifying records that refer to the same entities. This is mostly done by comparing the attributes of the entities. For instance, two entries in a catalogue of persons that share the same name, date of birth, place of birth and date of death are probably about the same person. Linking the two entries by adding the identifier from the other data source is the act of reconciliation. Once linked, you can extend your own data with information from the linked record.
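To make the idea concrete, here is a minimal sketch of attribute-based matching in Python. The record type, the field names and the `https://example.org/...` identifier are assumptions made up for illustration. Real reconciliation services usually score fuzzy matches, whereas this sketch only accepts exact matches on the normalized attributes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical record layout, used only for this illustration.
    name: str
    born: str
    birthplace: str
    died: str
    identifier: Optional[str] = None


def comparison_key(record):
    """Attributes that are compared, normalized for case."""
    return (record.name.casefold(), record.born,
            record.birthplace.casefold(), record.died)


def reconcile(local, authority):
    """Copy the identifier of the matching authority record, if there is one."""
    index = {comparison_key(r): r.identifier for r in authority}
    return [
        replace(r, identifier=index.get(comparison_key(r), r.identifier))
        for r in local
    ]


local_records = [
    PersonRecord("ada lovelace", "1815-12-10", "London", "1852-11-27"),
]
authority_records = [
    # The identifier is invented for the example.
    PersonRecord("Ada Lovelace", "1815-12-10", "London", "1852-11-27",
                 "https://example.org/person/1"),
]
reconciled = reconcile(local_records, authority_records)
```

After the call, the local record carries the identifier of the authority record, which is the link that later allows data to be taken over from the other source.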

Multiple tools exist to facilitate this process, with OpenRefine being the most prominent. To standardize how data is provided to these tools, the Entity Reconciliation Community Group within the World Wide Web Consortium (W3C) is drafting the Reconciliation Service API. The specification defines endpoints that data services can expose so that applications like OpenRefine can work with their data. A number of other services have already implemented the specification, such as TEI Publisher, Cocoda and the Alma Refine plugin for the commercial library management system Alma.

Reconciliation and SKOS

Simple Knowledge Organization System (SKOS) is an established standard for modeling controlled vocabularies as Linked Data. SKOS vocabularies are therefore frequent targets of reconciliation efforts, because you can improve your local data by enriching strings with identifiers from a controlled vocabulary. SKOS and the Reconciliation Service API thus often go hand in hand. Until now, however, there has been no easy way to set up a reconciliation endpoint for an existing SKOS vocabulary. We decided to change that by developing the new SkoHub component SkoHub Reconcile.

Andreas Wagner had already built a reconciliation prototype for SKOS vocabularies (see also our Workshop Blog Post). We picked this prototype up, refactored it and moved it into a container-based infrastructure. We also added support for v0.2 of the reconciliation spec.

SkoHub Reconcile Publish

To make it easy to upload vocabularies to the reconciliation service, we developed a front end, which you can try out at https://reconcile-publish.skohub.io/.

The reconcile-publish upload UI

Every vocabulary that passes the SkoHub SHACL Shape (see our blog post) should work for uploading to the reconcile service. The only additional requirement is to provide a vann:preferredNamespaceUri. As you can see in the screenshot, you also have to provide an account and a language. For the account you can currently choose whatever you want; just make sure it is unique enough that your dataset (i.e. your vocabulary) does not get overwritten by someone else. The lang parameter was only introduced in the current draft version of the reconciliation specification and is not yet implemented in SkoHub Reconcile, so the current version of the service requires you to specify the language you want to use for reconciliation at upload time. We will improve this in the future along with the development of the specification.

Example: Usage in OpenRefine

Let’s see how we can use the service with OpenRefine.

First, we upload the vocabulary. We will use a classification of subject groups.

The filled-in upload form with "test" as account name, "de" as language and "systematik.ttl" as file to be uploaded

After a successful upload of the Turtle file, we are presented with a URL that leads to the “Service Manifest” of our reconciliation service.

The URL of the Service Manifest being returned by the upload UI

If we follow the URL https://reconcile.skohub.io/reconcile?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme we see some data that services will use for reconciliation against our vocabulary:

{
    "versions": [
        "0.2",
        "0.3.0-alpha"
    ],
    "name": "SkoHub reconciliation service for account 'test', dataset 'https://w3id.org/kim/hochschulfaechersystematik/scheme'",
    "identifierSpace": "https://w3id.org/kim/hochschulfaechersystematik/",
    "schemaSpace": "http://www.w3.org/2004/02/skos/core#",
    "defaultTypes": [
        {
            "id": "ConceptScheme",
            "name": "ConceptScheme"
        },
        {
            "id": "Concept",
            "name": "Concept"
        }
    ],
    "view": {
        "url": "{{id}}"
    },
    "preview": {
        "url": "https://reconcile.skohub.io/preview?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id={{id}}",
        "width": 100,
        "height": 320
    },
    "suggest": {
        "entity": {
            "service_url": "https://reconcile.skohub.io",
            "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=entity",
            "flyout_service_path": "/suggest/flyout?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
        },
        "property": {
            "service_url": "https://reconcile.skohub.io",
            "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=property",
            "flyout_service_path": "/suggest/flyout?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
        },
        "type": {
            "service_url": "https://reconcile.skohub.io",
            "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=property",
            "flyout_service_path": "/suggest/flyout&language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
        }
    }
}
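The manifest URL and the templates inside the manifest follow a simple pattern, which the following Python sketch reproduces. It assumes that the URL is always the `/reconcile` endpoint plus the three parameters visible above, and that `{{id}}` is filled in by plain string replacement. The function names and the concept URI in the usage lines are illustrative and not part of SkoHub Reconcile.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SERVICE_BASE = "https://reconcile.skohub.io"


def manifest_url(account, language, dataset):
    """Build the Service Manifest URL from the three upload parameters."""
    # safe=":/" leaves the dataset URI unescaped, as in the URL shown by the upload UI.
    query = urlencode(
        {"language": language, "account": account, "dataset": dataset},
        safe=":/",
    )
    return f"{SERVICE_BASE}/reconcile?{query}"


def fetch_manifest(url):
    """Download and parse a Service Manifest (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def expand(template, entity_id):
    """Fill an entity identifier into a 'view' or 'preview' URL template."""
    return template.replace("{{id}}", entity_id)


url = manifest_url(
    account="test",
    language="de",
    dataset="https://w3id.org/kim/hochschulfaechersystematik/scheme",
)

# Illustrative concept URI inside the identifierSpace of the manifest.
concept = "https://w3id.org/kim/hochschulfaechersystematik/n1"
view_link = expand("{{id}}", concept)
```

Because the `view` template consists only of `{{id}}`, the link to a concept is simply its URI. Calling `fetch_manifest(url)` would return the JSON document shown above as a Python dictionary.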

Now that the reconciliation service is set up with our data, let’s see how we can use it in OpenRefine.

For demo purposes we use a small vocabulary of a few discipline names:

An OpenRefine table with the values "mathe", "bibliothek", "forst"

By clicking on the dropdown button of the column we want to reconcile, we choose “Reconcile” -> “Start reconciling…”.

OpenRefine dropdown menu to start a reconciliation process

After clicking “Add standard service”, we can enter the URL provided by the upload service:

Adding the Service Manifest URL in OpenRefine

Then we just have to start the reconciliation by clicking “Start reconciling…” and our reconciliation service will be queried with the terms in our OpenRefine project. We are then presented with the results:

OpenRefine UI with lists of reconciliation candidates
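Behind the “Start reconciling…” click, the client sends the column values to the service as a query batch. The Python sketch below shows roughly what such a request looks like under v0.2 of the specification, where the batch is a JSON object posted in a form field named `queries`. The helper names and the `limit` of 5 are assumptions for illustration, and OpenRefine's actual requests may carry additional fields.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_query_batch(terms, limit=5):
    """Map each term to a query object keyed q0, q1, ... as in the spec examples."""
    return {
        f"q{index}": {"query": term, "limit": limit}
        for index, term in enumerate(terms)
    }


def encode_form_body(batch):
    """Serialize the batch as the form-encoded 'queries' field."""
    return urlencode({"queries": json.dumps(batch)}).encode("ascii")


def send_batch(service_url, batch):
    """POST the batch to a reconciliation endpoint (needs network access)."""
    request = Request(service_url, data=encode_form_body(batch), method="POST")
    with urlopen(request) as response:
        return json.load(response)


batch = build_query_batch(["mathe", "bibliothek", "forst"])
body = encode_form_body(batch)
```

Passing the Service Manifest URL as `service_url` to `send_batch` should return one result list per query key, provided the service accepts v0.2 batches at that address.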

This already looks good! Now we can choose matches by clicking the checkmark or get additional information by hovering over the proposed entry from the reconcile service.

Interactive OpenRefine pop-up to define a match
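The same choice can be made programmatically on a reconciliation response. The following Python sketch assumes the v0.2 response layout, in which every candidate has an `id`, a `name`, a `score` and a boolean `match` flag. The sample response, including its identifiers and scores, is invented for illustration and was not returned by the service.

```python
def pick_matches(terms, response):
    """Accept a candidate only if exactly one is flagged as a match."""
    chosen = {}
    for index, term in enumerate(terms):
        candidates = response.get(f"q{index}", {}).get("result", [])
        flagged = [c for c in candidates if c.get("match") is True]
        chosen[term] = flagged[0]["id"] if len(flagged) == 1 else None
    return chosen


# Hypothetical response; ids, names and scores are made up.
sample_response = {
    "q0": {"result": [
        {"id": "https://example.org/vocab/maths", "name": "Mathematik",
         "score": 95.0, "match": True},
        {"id": "https://example.org/vocab/stats", "name": "Statistik",
         "score": 40.0, "match": False},
    ]},
    "q1": {"result": [
        {"id": "https://example.org/vocab/library", "name": "Bibliothekswissenschaft",
         "score": 60.0, "match": False},
    ]},
}

matches = pick_matches(["mathe", "bibliothek"], sample_response)
```

Terms without a single flagged candidate stay unmatched (`None`) and are left for manual review, which corresponds to choosing a candidate by hand in the OpenRefine interface.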

If we want, we can also search through our vocabulary by clicking “Search for match”:

Searching for a vocabulary term in OpenRefine

After selecting the appropriate matches, we have successfully reconciled our data:

Matched values in OpenRefine column

Further reads

Christel Annemieke Romein, Andreas Wagner and Joris J. van Zundert published a tutorial about building and deploying a classification schema using open standards and technology. In this tutorial they make use of SKOS, SkoHub Vocabs and SkoHub Reconcile. We recommend having a look to see the use of SkoHub services in action.

Next steps

The services are currently in an alpha phase and ready for testing. You can try them out at https://reconcile-publish.skohub.io/.

Feedback is very much appreciated: via email (skohub@hbz-nrw.de), as an issue or, primarily for German-speaking users, in the newly set up Discourse forum metadaten.community.

Our next step will be to integrate the above-mentioned lang parameter so that the service can serve all languages of a vocabulary without the need to specify one beforehand.

A blog for SkoHub. This blog is maintained by the SkoHub Community.