Implementation notes: MW and types
In the specifications of the Reconciliation API, a type is generically described as something that “represents a category of entities”. While services are not required to work with types, types can be used effectively to improve the discoverability of your data. What, then, would be the equivalent of a type in the MediaWiki context, and how should it be implemented?
Requirements and options
The definition is generic enough to leave room for interpretation and for implementation choices on the part of the providing service. It does not dictate any specific ontological model, although it is evident that some data structures are better suited to linked open data (LOD) than others.
That said, reading through the specifications, it also becomes clear that a type may, should or must meet a number of functional requirements.
- 1a. If provided, a type must be identifiable with the following fields:
  - id (required): a unique identifier
  - name (required): a human-readable name
- 1b. A type may be identifiable with the following field:
  - broader (optional): an array of types that represent (single-level) 'broader' categories of entities. Represented in a hierarchy tree, those types would be immediate parents of the type at hand. The documentation does not insist on any specific interpretation of broad vs narrow. If skos:Concept was intended as a hint, the meaning remains flexible.
- 2. Ideally then, types allow for the scope of a query to be broadened.
- 3. An entity may be identifiable with one or multiple appropriate types. A record for an entity must come with a type field, but it is okay for the array it holds to be empty.
- 4. Ideally, it should be possible for the suggest service to suggest types.
- 5. Through defaultTypes, it should be possible for the service manifest to suggest a default array of types to start out with. If types are not used at all, a generic name may be used to indicate that “all entities in the database are instances of this type”.
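To make the requirements above more concrete, here is a minimal sketch of what the corresponding records could look like, written as plain Python dictionaries that serialise to JSON. The helper name `make_type`, the identifiers and the example entities are assumptions made for illustration only; the field names (id, name, broader, type, defaultTypes) are the ones discussed above.

```python
import json


def make_type(type_id, name, broader=None):
    """Build a type record: id and name are required, broader is optional."""
    record = {"id": type_id, "name": name}
    if broader:
        # 'broader' holds the immediate parents only (single level).
        record["broader"] = broader
    return record


# Hypothetical types, here using Category-style identifiers.
person = make_type("Category:Person", "Person")
author = make_type("Category:Author", "Author", broader=[person])

# An entity record always carries a 'type' array, even when it is empty.
typed_entity = {"id": "Jane_Doe", "name": "Jane Doe", "type": [author]}
untyped_entity = {"id": "Some_page", "name": "Some page", "type": []}

# Fragment of a service manifest suggesting a default array of types.
manifest_fragment = {"defaultTypes": [person]}

print(json.dumps(manifest_fragment, indent=2))
```

Whether the id should carry a namespace prefix such as Category: is a design choice for the service and not something the API prescribes.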
MediaWiki equivalents
What about MediaWiki? In the ecosystem of MediaWiki and its extensions, there is more than one candidate to serve as an equivalent of type:
- it can be a MediaWiki Category (here capitalised to distinguish it from the everyday term)
- in Semantic MediaWiki, it can be
  - a Concept (again capitalised, for the same reason)
  - a value from a property of type Page or Text (or Monolingual Text), provided the property to be used was announced to the system. The property is user-defined and there is no generally accepted name, though many have settled on "Class", "Has class" or similar variants.
- in Wikibase/Wikidata, it can point to a value of 'instance of' (Property:P31) or perhaps 'subclass of'.
(1) MediaWiki Categories
Categories have been a recognisable and flexible part of MediaWiki’s system of organising wiki pages since very early days.
Wikipedia’s own use of Categories may be described as a mix of different approaches to classification. Sometimes a Category gathers pages about subjects of the same type (e.g. a person, author, etc.); sometimes it simply lumps together pages, e.g. people, events and works of interest, based on their common relationship to the same general subject area, e.g. "Science"; often it does both. None of this diminishes the fact that a Category can also be used to gather pages that represent the same type, e.g. a person, an object, etc.
- Pros
- It is a tried and tested system of MediaWiki core, which does not require any extensions to be installed.
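- [?] What potentially makes a Category well suited is that it supports hierarchies: a Category can be made a subcategory of another, so that pages in a subcategory can be treated as belonging to its parent category too (don't get confused by the image of parents inheriting from their children!), and so on up the tree. Note that MediaWiki core itself lists only direct members on a category page; whether pages in subcategories are included depends on how the Category is queried.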
- Cons
There are some potential downsides:
- What they lack is the granularity that comes with semantic data management.
- Categories have been known to update slowly in the database, at least in the past.
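As an illustration of how a Category's parents could feed the 'broader' field, the sketch below builds a MediaWiki Action API request (action=query with prop=categories) and converts a parsed JSON response into type records. The endpoint URL, the function names and the sample response are hypothetical; only the request parameters and the general response layout come from the Action API. No network request is made here.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://wiki.example.org/w/api.php"  # hypothetical wiki


def parent_categories_url(category_title, endpoint=API_ENDPOINT):
    """Build the URL that asks which Categories a Category page belongs to."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": category_title,
        "cllimit": "max",
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def broader_from_response(response):
    """Turn the parent Categories in a parsed response into type records."""
    broader = []
    for page in response.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            title = cat["title"]
            # Drop the namespace prefix for the human-readable name.
            name = title.partition(":")[2] or title
            broader.append({"id": title, "name": name})
    return broader


# Hand-written sample in the shape of a format=json response (illustrative).
sample_response = {
    "query": {
        "pages": {
            "42": {
                "pageid": 42,
                "ns": 14,
                "title": "Category:Author",
                "categories": [{"ns": 14, "title": "Category:Person"}],
            }
        }
    }
}

url = parent_categories_url("Category:Author")
broader = broader_from_response(sample_response)
```

This only covers immediate parents, which matches the single-level reading of 'broader' given earlier.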
(2) Semantic MediaWiki: 'classes' with a dedicated property
@todo
- Pros
- Cons
(3) Semantic MediaWiki: Concepts
A Concept is a collection of pages that is the result of a user-defined semantic query.
- Pros
Because a Concept is defined and stored on a wiki page, it could represent another natural answer to type. It allows for query-based granularity, if that’s what’s needed, and supports caching, which should improve the response time of a query.
- Cons
Unfortunately, what makes the deployment of Concepts as types challenging is that they are more of an afterthought than an intrinsic part of the data structure. For the service to suggest relevant Concepts, those Concepts would need to become part of the network of semantic relationships:
- Concepts, unlike Categories, are not usually linked to broader Concepts. Concepts could be harnessed in this way, but it requires a bit of effort and know-how to set things up judiciously. In order to support 'broader', a Concept page needs to be annotated with an additional property, e.g. "Has broader concept". The name of that property should be added to the configuration.
- Concepts are not usually linked to the entities they cover (the results of the query). For an entity to be identifiable through a type field (see above), a Concept must be linked somehow. Again
- ? @todo: set up setting variable
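The two lookups described above (which pages a Concept covers, and which broader Concepts it points to) could be sketched as Semantic MediaWiki ask queries. The property name "Has broader concept" is the example used above and would come from configuration; the endpoint and function names are hypothetical. The leading colon used to select the Concept page itself, rather than its members, is an assumption about SMW query syntax that should be checked against the wiki in question.

```python
from urllib.parse import urlencode

# Example property name from the notes above; meant to be configurable.
BROADER_PROPERTY = "Has broader concept"


def members_query(concept_name, limit=50):
    """Ask query for the pages that a Concept covers."""
    return f"[[Concept:{concept_name}]]|limit={limit}"


def broader_query(concept_name, broader_property=BROADER_PROPERTY):
    """Ask query for the broader Concepts annotated on the Concept page.

    Assumption: a leading colon selects the Concept page itself
    instead of the pages that belong to the Concept.
    """
    return f"[[:Concept:{concept_name}]]|?{broader_property}"


def ask_url(query, endpoint="https://wiki.example.org/w/api.php"):
    """Wrap an ask query in a URL for the SMW 'ask' API module."""
    params = {"action": "ask", "query": query, "format": "json"}
    return endpoint + "?" + urlencode(params)


example = ask_url(broader_query("Medieval authors"))
```

Linking an entity back to the Concepts that cover it is not addressed by this sketch; that would still need the additional rules mentioned above.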
Table
| | Category | Concept | Class page |
|---|---|---|---|
| associate entity with type | works | only with additional effort and rules | works |
A neutral 'type'?
defaultTypes
The service manifest
hierarchies
'types' are potentially organised in a hierarchy.
- Supported by MediaWiki categories
To consider
Understandably, the API does not distinguish between the various mechanisms that a particular service may choose to use when working with types. But the service needs to be able to recognise which mechanism is used in a given context.
- Naming:
- If the id starts with Category:... (canonical name), we can be sure it's a Category.
- If the id starts with Concept:... (canonical), it's a Concept.
- Profile
- In most other cases, the profile itself should be our go-to.
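The naming rule and the profile fallback could be sketched as follows. The function name, the returned labels and the `profile_default` parameter are hypothetical; the sketch also assumes canonical (non-localised) namespace prefixes, as in the list above.

```python
def detect_type_mechanism(type_id, profile_default="class"):
    """Guess which mechanism a type id refers to.

    Ids with a canonical namespace prefix are recognised directly;
    anything else falls back to what the profile says.
    """
    if type_id.startswith("Category:"):
        return "category"
    if type_id.startswith("Concept:"):
        return "concept"
    return profile_default


mechanisms = [
    detect_type_mechanism("Category:Person"),
    detect_type_mechanism("Concept:Medieval authors"),
    detect_type_mechanism("Person"),
]
```

On a wiki with localised or aliased namespace names, the prefix check would need to consult the wiki's namespace configuration instead of fixed strings.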
See also
Notes on '7.2 Data Extension Property Proposals (optional)'