




Wikibase

CODECS is exploring the use of a Wikibase Cloud instance as a supplemental Linked Open Data service, to be queried both by CODECS itself and by third parties in Celtic studies who might want to contribute to it or benefit from having access. It is also exploring the instance's suitability as a means of taking older databases and datasets further into the 21st century and opening them up to the wider world. The URL is https://codecs.wikibase.cloud.

What are Wikibase, Wikidata and Wikibase Cloud?

Wikibase (not to be confused with the company Wikibase Solutions) refers to a family of MediaWiki extensions that can be installed to turn a wiki into a repository of structured data. The best known Wikibase instance is Wikidata, which is and will remain the focal point of development for the software. That said, the software is open-source and free to use, and adoption by third parties, including cultural heritage institutions, has been growing steadily. Interesting examples include Europeana’s prize-winning EAGLE (https://wiki.eagle-network.eu), which covers inscriptions from the Greek and Roman world, and the National Library of Wales' Semantic Name Authority Repository Cymru (https://snarc-llgc.wikibase.cloud), which, as the name suggests, deals with name authority records relating to Wales (video).

Instead of hosting an instance of Wikibase yourself, you can also spare yourself some trouble and set up an instance on Wikibase Cloud. This is a cloud service which Wikimedia Deutschland has been kind enough to make available free of charge. While I did set up a basic wiki running Wikibase at first, I eventually signed up for a Cloud instance to get the most out of it.

Why use Wikibase when you already have Semantic MediaWiki? Why Wikibase Cloud?

(1) OpenRefine

The option of hooking up OpenRefine to a Wikibase instance allows you to import, work with, transform and clean up data in bulk. OpenRefine is not an extension to MediaWiki, or anything that lives on the same server, but a Java-based program that users can install on their local machines and run in the browser. It is, first and foremost, a spreadsheet-based data editor that can import from and export to different file formats, but it can also be set up to talk to APIs and attempt to match up your data to external resources, a process called reconciliation. To dampen expectations somewhat, the latest OpenRefine-Wikibase reconciliation service does appear to suffer from a few bugs, which limits its usability for now.
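To give a rough idea of what reconciliation means in practice, here is a toy sketch in Python. It is not how OpenRefine or its Wikibase reconciliation service works internally; it merely matches a name from your own data against a small list of candidate labels by string similarity. The item IDs, the labels and the `reconcile` helper are all made up for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Strip diacritics, ignore case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def reconcile(name, candidates, threshold=0.8):
    """Return (item_id, label, score) for the best candidate, or None.

    `candidates` maps hypothetical item IDs to labels. A match is only
    accepted if its similarity score reaches `threshold`.
    """
    best = None
    for item_id, label in candidates.items():
        score = SequenceMatcher(None, normalise(name), normalise(label)).ratio()
        if best is None or score > best[2]:
            best = (item_id, label, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


# Hypothetical candidate items, standing in for an external resource.
candidates = {
    "Q1": "Táin Bó Cúailnge",
    "Q2": "Táin Bó Fraích",
    "Q3": "Lebor na hUidre",
}

match = reconcile("tain bo cuailnge", candidates)
no_match = reconcile("Annals of Ulster", candidates)
```

A real reconciliation service also weighs types and additional properties, and leaves uncertain matches for a human to judge, which is exactly the part that benefits from an interface like OpenRefine's.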

By comparison, on a regular wiki that uses templates to hold structured data, there are ways of editing multiple wiki pages using an extension like FlexForm (and, to a limited extent, Page Forms), and JSON is beginning to gain ground as an alternative data format in the wiki. However, some of the useful features offered by OpenRefine, such as its ability to work with many records at once, create facets and apply reconciliation, are not currently available there.

It must be said we are not comparing like to like. OpenRefine can do things precisely because it uses your computer's resources, away from the wiki, rather than make direct changes on a remote server. It would be possible to prepare data in OpenRefine and convert them to a template-based data structure in the wiki with something like the Data Transfer extension, more or less as you would with QuickStatements. But to take that route would not be as smooth, to put it mildly.
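As a rough illustration of what such a conversion step involves, the sketch below turns rows of a CSV export into template calls in wikitext. It does not use the Data Transfer extension, which expects its own import format; it only shows the general shape of the output. The template name `Text` and its parameters are hypothetical.

```python
import csv
import io


def row_to_template(template, row):
    """Render one data row as a wikitext template call, skipping empty values."""
    lines = ["{{" + template]
    for key, value in row.items():
        value = (value or "").strip()
        if value:
            # A literal pipe would be read as a parameter separator.
            lines.append("|" + key + "=" + value.replace("|", "{{!}}"))
    lines.append("}}")
    return "\n".join(lines)


# Sample data standing in for a CSV file exported from OpenRefine.
SAMPLE_CSV = """title,language,date
Táin Bó Cúailnge,Old Irish,
Lebor na hUidre,Middle Irish,c. 1100
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
pages = {row["title"]: row_to_template("Text", row) for row in rows}
```

Each value in `pages` would still have to be written to the wiki page of that name by some import mechanism, which is where the extra friction comes in.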

There are many introductions to OpenRefine out there, but for a recent, use-case-driven one with illustrations, this blog post may be an instructive read.

(2) Stimulate engagement

While CODECS is already open to input from other researchers, a Wikibase instance can open up new ways of collaborating, whether directly, by inviting others to share and work on datasets, or indirectly, by making use of external resources.

(3) Complementary roles

A Wikibase instance can be used to relieve and complement the main wiki:

  • as an additional repository, ready to be queried by the main wiki if necessary.
  • as a repository to be queried by third parties.
  • as a means of prototyping data structures without having to 'pollute' the main wiki.

(4) Specific advantages of the Wikibase Cloud service

Initially, I began hosting a Wikibase installation, but then turned my attention to the benefits of the cloud service:

  • There is no need to maintain the installation yourself.
  • Although both Wikibase and SMW support SPARQL, Wikibase Cloud offers it out of the box.
  • Although both Wikibase and SMW support Elasticsearch, Wikibase Cloud offers it out of the box.
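To illustrate what having SPARQL out of the box makes possible, here is a minimal sketch in Python that prepares a query for the instance and flattens the standard SPARQL JSON results. The endpoint path `/query/sparql` is an assumption based on the usual Wikibase Cloud layout, and the query, the `User-Agent` string and the helper names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint path; check the instance's query service page to confirm.
ENDPOINT = "https://codecs.wikibase.cloud/query/sparql"

# Illustrative query: a handful of items with an English label.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?itemLabel WHERE {
  ?item rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
}
LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request that asks for results in SPARQL JSON format."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": "codecs-sparql-example/0.1",  # hypothetical client name
    }
    return Request(url, headers=headers)


def parse_bindings(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(query), timeout=30) as response:
        return parse_bindings(json.load(response))


# rows = run_query(QUERY)  # uncomment to run; requires network access
```

The same request could come from the main wiki, from a third-party site or from a notebook, which is what makes the repository usable beyond its own wiki pages.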


What Wikibase (Cloud) is not

  • Wikibase is primarily a data repository. While access to data is provided in a number of ways, through its wiki structure, a SPARQL endpoint and an API, none of these features are intended to provide an interface that is geared towards the non-technical end user. They are intended to facilitate the creation of such interfaces.
  • It does not offer native support for text formatting in string-type data, whether in HTML or wikitext. That is to say, markup such as HTML tags for italics is not prohibited, but there is no native handler for rendering it. A third party running an external query on the repository can nevertheless choose to render such tags by decoding HTML entities locally.
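The last point can be sketched as follows. This is a minimal example of how a consuming application might decode entity-encoded formatting on its own side, assuming the repository stores strings such as `&lt;i&gt;...&lt;/i&gt;`. The list of allowed tags and the `render_value` helper are assumptions for illustration, not part of Wikibase.

```python
import html
import re

# Only these simple formatting tags are let through; anything else stays escaped.
ALLOWED_TAGS = ("i", "em", "b", "sup", "sub")
TAG_PATTERN = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS))


def render_value(value):
    """Turn a stored string into HTML that keeps only whitelisted tags."""
    # Decode whatever entities are present, then escape everything again so
    # that the text is in one known, safe form.
    escaped = html.escape(html.unescape(value), quote=False)
    # Re-enable the whitelisted tags only.
    return TAG_PATTERN.sub(r"<\1\2>", escaped)


stored = "&lt;i&gt;Táin Bó Cúailnge&lt;/i&gt; &amp; &lt;script&gt;"
rendered = render_value(stored)
```

Restricting the output to a short whitelist is a deliberate choice here: decoding everything blindly would also revive any unwanted markup that happened to be stored in the repository.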

Aims in the short term

Forthcoming

See also