Reconciliation is the process of integrating data from sources that do not share common unique identifiers by identifying records that refer to the same entities. This is mostly done by comparing the attributes of the entities. For instance, two entries in a catalogue of persons that share the same name, date and place of birth, and date of death will probably refer to the same person. Linking these two entries by adding the identifier from one data source to the entry in the other is what we call reconciliation. This allows you to extend your data by taking over information from the linked record.
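To make the idea of comparing attributes concrete, here is a deliberately naive Python sketch. It is not how SkoHub Reconcile works (the service matches terms against vocabulary labels). The PersonEntry type, the field names, the sample values and the exact-match rule are all hypothetical and serve only as an illustration. Real reconciliation services return scored candidates rather than a strict yes or no.

```python
from dataclasses import dataclass

# Hypothetical record layout holding only the attributes named in the text.
@dataclass(frozen=True)
class PersonEntry:
    name: str
    birth_date: str
    birth_place: str
    death_date: str

COMPARED = ("name", "birth_date", "birth_place", "death_date")

def normalize(value: str) -> str:
    # Ignore surrounding whitespace and letter case when comparing.
    return value.strip().casefold()

def agreement(a: PersonEntry, b: PersonEntry) -> float:
    """Fraction of compared attributes on which both entries agree (0.0 to 1.0)."""
    same = sum(normalize(getattr(a, f)) == normalize(getattr(b, f)) for f in COMPARED)
    return same / len(COMPARED)

def probably_same_person(a: PersonEntry, b: PersonEntry, threshold: float = 1.0) -> bool:
    # With the default threshold, every compared attribute must agree.
    return agreement(a, b) >= threshold

if __name__ == "__main__":
    local = PersonEntry("Jane Doe", "1850-03-01", "Cologne", "1920-07-15")
    remote = PersonEntry("jane doe ", "1850-03-01", "Cologne", "1920-07-15")
    print(agreement(local, remote))  # 1.0
```

Lowering the threshold (for example to 0.75) tolerates one differing attribute, which shows why practical tools present candidates for human review instead of linking automatically.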
Multiple tools exist to facilitate this process, with OpenRefine being the most prominent one. To align and standardize the way data is provided to these tools, the Entity Reconciliation Community Group within the World Wide Web Consortium (W3C) is drafting the Reconciliation Service API. The specification defines endpoints that data services can expose so that applications like OpenRefine can work with their data. A number of other tools have already implemented the specification, such as TEI Publisher, Cocoda, and the Alma Refine plugin for the commercial library management system Alma.
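For orientation, the following Python sketch shows the shape of a reconciliation query batch as we understand version 0.2 of the specification. The batch is a JSON object whose keys (here q0, q1, ...) map to individual queries, and it is sent to the service endpoint in a form parameter named queries. The helper names, the example terms and the choice of the Concept type are our own assumptions for illustration. The sketch only builds the request body and sends nothing.

```python
import json
import urllib.parse

def build_query_batch(terms, type_id=None, limit=3):
    """Turn a list of strings into a v0.2-style query batch keyed q0, q1, ..."""
    batch = {}
    for i, term in enumerate(terms):
        query = {"query": term, "limit": limit}
        if type_id is not None:
            # Restrict candidates to one type, e.g. "Concept".
            query["type"] = type_id
        batch[f"q{i}"] = query
    return batch

def encode_form_body(batch):
    """Serialize the batch into an application/x-www-form-urlencoded body."""
    return urllib.parse.urlencode({"queries": json.dumps(batch)})

if __name__ == "__main__":
    batch = build_query_batch(["Informatik", "Physik"], type_id="Concept")
    print(json.dumps(batch, indent=2))
    print(encode_form_body(batch))
```

Keying queries by id lets a client send many terms in one request and map each result list back to its original cell.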
Reconciliation and SKOS
Simple Knowledge Organization System (SKOS) is an established standard for modeling controlled vocabularies as Linked Data. SKOS vocabularies are therefore frequent targets of reconciliation, because you can improve your local data by enriching plain strings with identifiers from a controlled vocabulary. So SKOS and the Reconciliation Service API often go hand in hand. However, until now there has been no easy way to set up a reconciliation endpoint for an existing SKOS vocabulary. We decided to change that by developing a new SkoHub component: SkoHub Reconcile.
Andreas Wagner had already built a reconciliation prototype for SKOS vocabularies (see also our workshop blog post). We picked up this prototype, refactored it and moved it into a container-based infrastructure. We also added support for v0.2 of the reconciliation specification.
SkoHub Reconcile Publish
To make uploading vocabularies to the reconciliation service easy, we developed a front end, which you can try out at https://reconcile-publish.skohub.io/.
Every vocabulary that passes the SkoHub SHACL shape (see our blog post) should work for uploading to the reconciliation service.
The only additional requirement is to provide a vann:preferredNamespaceUri.
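Before uploading, it can help to check that the Turtle file actually declares this namespace. The Python sketch below is a rough pre-flight check and not part of SkoHub. The example vocabulary uses placeholder example.org URIs, and the regular expression assumes that the value is written as a string literal using the vann: prefix, so other valid Turtle spellings would be missed. For anything beyond a quick check, parsing the file with a proper RDF library such as rdflib would be more robust.

```python
import re

# Hypothetical minimal concept scheme; all example.org URIs are placeholders.
EXAMPLE_TTL = """
@prefix dct: <http://purl.org/dc/terms/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix vann: <http://purl.org/vocab/vann/> .

<https://example.org/vocab/scheme> a skos:ConceptScheme ;
    dct:title "Example vocabulary"@en ;
    vann:preferredNamespaceUri "https://example.org/vocab/" .
"""

# Naive check: only finds the prefixed property followed by a plain string literal.
NAMESPACE_PATTERN = re.compile(r'vann:preferredNamespaceUri\s+"([^"]+)"')

def preferred_namespace(turtle: str):
    """Return the declared namespace URI, or None if the pattern is not found."""
    match = NAMESPACE_PATTERN.search(turtle)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(preferred_namespace(EXAMPLE_TTL))  # https://example.org/vocab/
```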
As you can see in the screenshot, you also have to provide an account and a language.
For the account, you can currently choose any name you like. Just make sure it is unique enough that your dataset (i.e. your vocabulary) does not get overwritten by someone else.
A lang parameter has only been introduced in the current draft of the reconciliation specification and is not yet implemented in SkoHub Reconcile. Therefore, the current version of SkoHub Reconcile requires you to specify the language you want to use for reconciliation. We will improve this in the future along with the development of the specification.
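Language, account and dataset all end up as query parameters of the service URL (compare the example URL in the OpenRefine walkthrough of this post). The Python sketch below builds such a URL. It assumes the endpoint https://reconcile.skohub.io/reconcile and the parameter names language, account and dataset, all taken from that example URL. The helper name is hypothetical. Note that urlencode percent-encodes the dataset URI, whereas the example URL shows it unencoded. A standard query-string parser decodes both forms to the same value.

```python
import urllib.parse

# Endpoint taken from the example URL in this post; treat it as an assumption.
RECONCILE_ENDPOINT = "https://reconcile.skohub.io/reconcile"

def manifest_url(language: str, account: str, dataset: str) -> str:
    """Build the service manifest URL from language, account and dataset."""
    params = {"language": language, "account": account, "dataset": dataset}
    return RECONCILE_ENDPOINT + "?" + urllib.parse.urlencode(params)

if __name__ == "__main__":
    print(manifest_url("de", "test", "https://w3id.org/kim/hochschulfaechersystematik/scheme"))
```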
Example: Usage in OpenRefine
Let’s see how we can use the service with OpenRefine.
First, we upload the vocabulary. We will use a classification of subject groups.
After a successful upload of the Turtle file, we are presented with a URL that leads to the “Service Manifest” of our reconciliation service.
If we follow the URL https://reconcile.skohub.io/reconcile?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme, we see the data that clients will use to reconcile against our vocabulary:
```json
{
  "versions": [
    "0.2",
    "0.3.0-alpha"
  ],
  "name": "SkoHub reconciliation service for account 'test', dataset 'https://w3id.org/kim/hochschulfaechersystematik/scheme'",
  "identifierSpace": "https://w3id.org/kim/hochschulfaechersystematik/",
  "schemaSpace": "http://www.w3.org/2004/02/skos/core#",
  "defaultTypes": [
    {
      "id": "ConceptScheme",
      "name": "ConceptScheme"
    },
    {
      "id": "Concept",
      "name": "Concept"
    }
  ],
  "view": {
    "url": "{{id}}"
  },
  "preview": {
    "url": "https://reconcile.skohub.io/preview?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id={{id}}",
    "width": 100,
    "height": 320
  },
  "suggest": {
    "entity": {
      "service_url": "https://reconcile.skohub.io",
      "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=entity",
      "flyout_service_path": "/suggest/flyout?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
    },
    "property": {
      "service_url": "https://reconcile.skohub.io",
      "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=property",
      "flyout_service_path": "/suggest/flyout?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
    },
    "type": {
      "service_url": "https://reconcile.skohub.io",
      "service_path": "/suggest?language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=property",
      "flyout_service_path": "/suggest/flyout&language=de&account=test&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id=${id}"
    }
  }
}
```
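A client reads this manifest to learn which specification versions the service supports and how to build links. The view.url and preview.url values are templates in which {{id}} stands for the identifier of a matched concept. The Python sketch below shows that substitution on an excerpt of the manifest. The helper names and the concept id are hypothetical. The sketch inserts the id verbatim and does not assume whether a particular client percent-encodes it first.

```python
# Excerpt of the manifest above, reduced to the fields this sketch uses.
MANIFEST = {
    "versions": ["0.2", "0.3.0-alpha"],
    "view": {"url": "{{id}}"},
    "preview": {
        "url": "https://reconcile.skohub.io/preview?language=de&account=test"
               "&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&id={{id}}",
        "width": 100,
        "height": 320,
    },
}

def supports_version(manifest: dict, version: str) -> bool:
    """True if the service lists the given spec version in its manifest."""
    return version in manifest.get("versions", [])

def fill_id(template: str, entity_id: str) -> str:
    """Substitute an entity id for the {{id}} placeholder (no escaping applied)."""
    return template.replace("{{id}}", entity_id)

if __name__ == "__main__":
    concept = "https://w3id.org/kim/hochschulfaechersystematik/n1"  # hypothetical id
    print(supports_version(MANIFEST, "0.2"))
    print(fill_id(MANIFEST["view"]["url"], concept))
    print(fill_id(MANIFEST["preview"]["url"], concept))
```

Because view.url is just {{id}}, the link for a match is the concept URI itself, so users land directly on the published vocabulary.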
Now that the reconciliation service is set up with our data, let’s see how we can use it in OpenRefine.
For demo purposes, we use a small dataset containing a few discipline names:
In the dropdown menu of the column we want to reconcile, we choose “Reconcile” -> “Start reconciling…”.
After clicking “Add standard service”, we can enter the URL provided by the upload service:
Then we just have to start the reconciliation by clicking “Start reconciling…” and our reconciliation service will be queried with the terms in our OpenRefine project. We are then presented with the results:
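Behind the scenes, each batch of terms comes back as a JSON object with a list of scored candidates per query. As an illustration of how a client could use such a response, the Python sketch below keeps, for each query, the highest-scoring candidate that the service marked with "match": true, and leaves everything else for manual review. The response is a made-up example in the v0.2 shape, not actual output of SkoHub Reconcile, and the selection rule is our own simplification rather than OpenRefine's implementation.

```python
# Hypothetical response in the v0.2 shape; ids and scores are made up.
EXAMPLE_RESPONSE = {
    "q0": {"result": [
        {"id": "https://example.org/c/1", "name": "Informatik", "score": 100, "match": True},
        {"id": "https://example.org/c/2", "name": "Wirtschaftsinformatik", "score": 40, "match": False},
    ]},
    "q1": {"result": [
        {"id": "https://example.org/c/3", "name": "Physik", "score": 60, "match": False},
    ]},
}

def pick_matches(response):
    """Per query id, return the id of the best candidate flagged as a match, else None."""
    picked = {}
    for query_id, payload in response.items():
        flagged = [c for c in payload.get("result", []) if c.get("match")]
        if flagged:
            best = max(flagged, key=lambda c: c.get("score", 0))
            picked[query_id] = best["id"]
        else:
            # No confident match: leave the decision to a human.
            picked[query_id] = None
    return picked

if __name__ == "__main__":
    print(pick_matches(EXAMPLE_RESPONSE))
```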
This already looks good! Now we can choose matches by clicking the checkmark, or get additional information by hovering over an entry proposed by the reconciliation service.
If we want, we can also search through our vocabulary by clicking “Search for match”:
After selecting the appropriate matches, we have successfully reconciled our data:
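Interactive search of this kind is presumably served by the suggest endpoints listed in the manifest. The Python sketch below builds an entity suggest URL from the manifest's service_url and service_path by appending a prefix parameter with the typed text. The prefix parameter follows the Suggest API of v0.2 of the specification as we understand it. We have not verified that OpenRefine's search dialog issues exactly this request, and the helper name is hypothetical.

```python
import urllib.parse

# The entity suggest entry from the manifest shown earlier in this post.
SUGGEST_ENTITY = {
    "service_url": "https://reconcile.skohub.io",
    "service_path": "/suggest?language=de&account=test"
                    "&dataset=https://w3id.org/kim/hochschulfaechersystematik/scheme&service=entity",
}

def suggest_url(service: dict, prefix: str) -> str:
    """Append the typed prefix to the suggest URL, reusing its existing query string."""
    base = service["service_url"] + service["service_path"]
    separator = "&" if "?" in base else "?"
    return base + separator + urllib.parse.urlencode({"prefix": prefix})

if __name__ == "__main__":
    print(suggest_url(SUGGEST_ENTITY, "Informatik"))
```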
Further reading
Christel Annemieke Romein, Andreas Wagner and Joris J. van Zundert have published a tutorial on building and deploying a classification schema using open standards and technology. In it, they use SKOS, SkoHub Vocabs and SkoHub Reconcile. We recommend having a look to see SkoHub services in action.
Next steps
The services are currently in an alpha phase and ready for testing.
You can test the service under https://reconcile-publish.skohub.io/.
Feedback is very much appreciated: via email (skohub@hbz-nrw.de), as an issue, or (primarily for German-speaking users) in the newly set up Discourse forum metadaten.community.
Our next step will be integrating the above-mentioned lang parameter, so that all languages of a vocabulary can be served without having to specify one beforehand.