SMW tests on wildcard searches
Preliminary
- Property:Has chain title is a property of type Text, so at least the first 40 characters should be searchable
- Special:Version
- Link to GitHub report: https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/5480
- Working assumptions:
- This site has enabled full-text search which means that the appropriate indexing, notably tokenisation, has been done and strings, or substrings, of 3+ characters are evaluated in a MATCH ... AGAINST condition, not LIKE (the short summary).
- If a LIKE condition is wanted instead, there is the alternative like:/not like: that we can use, as demonstrated in the right columns below.
- In a LIKE condition, SMW uses standard wildcards: asterisks represent 1 or multiple characters, question marks any single character. There is no special wildcard for 0, 1 or multiple characters, like the percentage symbol. This makes substring matching potentially more difficult.
- In a MATCH condition, the wildcard should be appended. Below we will explore some examples where the wildcard is prefixed regardless and check its effect on the outcome of the query. We are not trying out the combination of tilde + boolean operator + wildcard (e.g. ~-*foo) because that will result in a fatal error.
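The LIKE-side wildcards can be emulated outside the wiki to predict which values a like: pattern should select. The sketch below is only an illustration and not SMW's implementation. It assumes that the pattern must cover the whole value and that the comparison is case-sensitive. The helper name `like_match` and the flag `star_needs_char` are made up for this example; the flag switches between the working assumption above (an asterisk needs at least one character) and the alternative reading in which an asterisk may also match nothing.

```python
import re

def like_match(pattern: str, value: str, star_needs_char: bool = False) -> bool:
    """Return True if value matches an SMW-style wildcard pattern.

    Hypothetical helper: '*' stands for a run of characters, '?' for exactly
    one character; the comparison is case-sensitive and spans the whole value.
    """
    # Assumption: '*' is either "one or more" or "zero or more" characters.
    star = ".+" if star_needs_char else ".*"
    regex = "".join(
        star if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

values = [
    "Misc. / canon law and penitentials / canon law",
    "Misc. / manuscript miscellanies",
    "medieval Welsh law",
]
print([v for v in values if like_match("*aw", v)])
print([v for v in values if like_match("*misc*", v)])
print([v for v in values if like_match("*Misc*", v, star_needs_char=True)])
```

With `star_needs_char=True`, the pattern `*Misc*` selects none of the sample values, because nothing precedes "Misc" in them. Comparing that prediction with the actual like:*Misc* results is one way to tell the two readings apart.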
Asterisk wildcards on either side
General note: the use of an asterisk to prefix a string may not be supported by MATCH / AGAINST. It is typically used at the end of a token.
Same using like
[[Has chain title::~*la*]]
Works with the two-character string la. Because the term is shorter than the minimum token size (we are using the default of 3 characters), the query defaults to LIKE, not MATCH:
Has chain title | |
---|---|
bérla na filed | bérla na filed |
Breton lais | Breton lais |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
cartularies | cartularies |
FURTHER RESULTS… |
[[Has chain title::like:*la*]]
Works:
Has chain title | |
---|---|
bérla na filed | bérla na filed |
Breton lais | Breton lais |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
cartularies | cartularies |
FURTHER RESULTS… |
[[Has chain title::~*law*]]
Does not work with law. [Update 2024: after migrating to a new server and updating SMW, this example was found to be working!]
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
penitentials | Misc. / canon law and penitentials / penitentials |
[[Has chain title::like:*law*]]
Does work (as opposed to the version using tilde notation):
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
penitentials | Misc. / canon law and penitentials / penitentials |
[[Has chain title::~*Mi*]]
Now using the substring Mi. Works. Note: the match appears to be case-sensitive because it does not match 'Primitive Irish' or 'missals' (don't get confused by the highlighting, which is case-insensitive). Compare the following lowercase example:
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
[[Has chain title::like:*Mi*]]
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
[[Has chain title::~*mi*]]
Has chain title | |
---|---|
interpretationes nominum hebraicorum | Misc. / theology and exegesis / interpretationes nominum hebraicorum |
manuscript miscellanies | Misc. / manuscript miscellanies |
minor Irish prose tales (foscéla) | Irish narrative literature / minor tales (foscéla) |
miracula and mirabilia | Misc. / miracula and mirabilia |
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore |
FURTHER RESULTS… |
[[Has chain title::like:*mi*]]
Has chain title | |
---|---|
interpretationes nominum hebraicorum | Misc. / theology and exegesis / interpretationes nominum hebraicorum |
manuscript miscellanies | Misc. / manuscript miscellanies |
minor Irish prose tales (foscéla) | Irish narrative literature / minor tales (foscéla) |
miracula and mirabilia | Misc. / miracula and mirabilia |
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore |
FURTHER RESULTS… |
[[Has chain title::~*Misc*]]
Now using the substring Misc. Works even if there are zero characters in front of the string:
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
[[Has chain title::like:*Misc*]]
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
Now lowercase:
[[Has chain title::~*misc*]]
Now using the substring misc (lowercase).
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
[[Has chain title::like:*misc*]]
Has chain title | |
---|---|
manuscript miscellanies | Misc. / manuscript miscellanies |
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore |
Now without initial character. Does the wildcard actually work?
[[Has chain title::~*isc*]]
[[Has chain title::like:*isc*]]
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
Tentative conclusion
When the tilde notation is used, the use of the asterisk wildcard is non-functional. In other words, it is allowed but simply ignored.
Asterisk wildcard at the front only:
Same using like:
[[Has chain title::~*aw]]
The prior assumption that the asterisk can only be appended does not hold true:
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
[[Has chain title::like:*aw]]
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
[[Has chain title::~*la]]
Because the default setting of $smwgFulltextSearchMinTokenSize is used (3 characters), it is to be expected that this query does not produce results.
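The threshold logic assumed in these tests can be written down as a small decision function. This is only a sketch of the working assumption stated in the preliminaries (terms of 3+ characters go to MATCH, shorter ones fall back to LIKE), not code taken from SMW. The function name `expected_condition` and its rule of ignoring wildcard characters when counting are hypothetical, and `min_token_size` merely stands in for $smwgFulltextSearchMinTokenSize.

```python
def expected_condition(term: str, min_token_size: int = 3) -> str:
    """Guess which SQL condition a '~' search term would end up in."""
    # Assumption: wildcard characters do not count towards the term length.
    bare = term.replace("*", "").replace("?", "")
    return "MATCH" if len(bare) >= min_token_size else "LIKE"

for term in ["*la", "*la*", "*law*", "*Mi*", "med*"]:
    print(term, "->", expected_condition(term))
```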
[[Has chain title::like:*la]]
[[Has chain title::~*law]]
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
penitentials | Misc. / canon law and penitentials / penitentials |
[[Has chain title::like:*law]]
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
[[Has chain title::~law]]
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
penitentials | Misc. / canon law and penitentials / penitentials |
[[Has chain title::like:law]]
Asterisk wildcard in medial position
[[Has chain title::~l*w]]
[[Has chain title::like:l*w]]
[[Has chain title::~l?w]]
[[Has chain title::like:*l?w*]]
Has chain title | |
---|---|
canon law | |
canon law and penitentials | |
early Irish legal texts | |
histories | |
medieval Welsh law | |
FURTHER RESULTS… |
Result count: 7
Where has the printout gone?
Asterisk wildcard at the end
Same using like
[[Has chain title::~med*]]
Substring: med. It matches not just at the beginning of the full string, but also following a word boundary:
Has chain title | |
---|---|
Breton medicine and medical writing | Breton sciences / medicine |
early and medieval Welsh poetry | Welsh poetry / early and medieval |
early Welsh poetry | Welsh poetry / early and medieval / c.600–c.1099 |
Gogynfeirdd poetry | Welsh poetry / early and medieval / c.1100–c.1600 / Gogynfeirdd |
Irish medicine and medical writing | Irish sciences / medicine |
FURTHER RESULTS… |
[[Has chain title::like:med*]]
Has chain title | |
---|---|
medieval Welsh law | medieval Welsh law |
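The difference between these two result sets is consistent with a prefix match on individual tokens on the one hand and a pattern anchored to the whole value on the other. The sketch below illustrates that reading under stated assumptions: the tokeniser (split on non-word characters, lowercase) is a stand-in for whatever the full-text index actually does, and both function names are hypothetical.

```python
import re

def fulltext_prefix_match(prefix: str, value: str) -> bool:
    """True if any token of value starts with prefix (case-insensitive)."""
    tokens = re.findall(r"\w+", value.lower())
    return any(token.startswith(prefix.lower()) for token in tokens)

def like_prefix_match(prefix: str, value: str) -> bool:
    """True only if the whole value starts with prefix (case-sensitive)."""
    return value.startswith(prefix)

values = [
    "Breton sciences / medicine",
    "Welsh poetry / early and medieval",
    "medieval Welsh law",
]
for value in values:
    print(value, fulltext_prefix_match("med", value), like_prefix_match("med", value))
```

In this model all three sample values pass the token-prefix test, while only the value that begins with "med" passes the anchored test.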
[[Has chain title::~law*]]
Substring: law.
Has chain title | |
---|---|
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
early Irish legal texts | Irish tracts, institutions / early Irish law |
medieval Welsh law | medieval Welsh law |
penitentials | Misc. / canon law and penitentials / penitentials |
[[Has chain title::like:law*]]
[[Has chain title::~Misc*]]
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
[[Has chain title::like:Misc*]]
Has chain title | |
---|---|
acrostics and abecedarii | Misc. / acrostics and abecedarii |
antiphons | Misc. / liturgical and devotional / antiphons |
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature |
canon law | Misc. / canon law and penitentials / canon law |
canon law and penitentials | Misc. / canon law and penitentials |
FURTHER RESULTS… |
Case and accent folding
[[Concept:Narrative worlds]] [[Display title of::~*guaire*]]
Display title of | |
---|---|
Cycle of Gúaire Aidne mac Colmáin | Cycle of Gúaire Aidne mac Colmáin |
[[Concept:Narrative worlds]] [[Display title of::like:*guaire*]]
[[Concept:Narrative worlds]] [[Display title of::~guaire]]
Display title of | |
---|---|
Cycle of Gúaire Aidne mac Colmáin | Cycle of Gúaire Aidne mac Colmáin |
[[Concept:Narrative worlds]] [[Display title of::like:guaire]]
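The tilde queries in this section match guaire against Gúaire, which implies that both case and diacritics are folded somewhere along the way, presumably by the database collation. A rough equivalent in Python, offered only as an illustration of the idea and not as a description of SMW or MySQL internals, decomposes each character and drops the combining marks. The helper name `fold` is hypothetical.

```python
import unicodedata

def fold(text: str) -> str:
    """Strip diacritics and case so that 'Gúaire' and 'guaire' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

title = "Cycle of Gúaire Aidne mac Colmáin"
print("guaire" in fold(title))  # comparison after folding
print("guaire" in title)        # raw, unfolded comparison
```

The raw comparison fails on both the capital G and the accented ú, which mirrors the empty results of the like: queries here.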
Now write two words
Without a boolean operator, the space between these words is interpreted as OR.
[[Selection title::~*Guaire mac*]]
Selection title | |
---|---|
Ahlqvist, Anders, “A rhetorical poem in Longes mac nUislenn”, in Rhetoric and reality in medieval Celtic literature (2014) | Ahlqvist, A., (2014) A rhetorical poem in Longes mac nUislenn… |
Anderson, Peter John, “Ewen MacLachlan”, Aberdeen University Library Bulletin 18 (1918) | Anderson, P. J., (1918) Ewen MacLachlan: librarian to the University and K… |
Anscombe, A., “Dr. MacCarthy’s Lunar computations”, Zeitschrift für celtische Philologie 4 (1903) | Anscombe, A., (1903) Dr. MacCarthy’s Lunar computations… |
d'Arbois de Jubainville, H., “La mort violente de Fergus Mac Lete”, Zeitschrift für celtische Philologie 4 (1903) | d’Arbois de Jubainville, H., (1903) La mort violente de Fergus Mac Lete… |
Mac Carthy, Bartholomew, Annala Uladh, vol. 2 (1893) | Mac Carthy, B., (1893) Annala Uladh: Annals of Ulster, otherwise Annala S… |
FURTHER RESULTS… |
[[Selection title::like:*Guaire mac*]]
Selection title | |
---|---|
Picard, Jean-Michel, “The strange death of Guaire mac Áedáin”, in Sages, saints and storytellers (1989) | Picard, J., (1989) The strange death of Guaire mac Áedáin… |
Phrase matching (double quotes)
Full-text (FT) search only. Works fine, except for the highlighting, which fails to recognise Gúaire with the accented vowel.
[[Selection title::~"guaire mac"]]
Selection title | |
---|---|
Picard, Jean-Michel, “The strange death of Guaire mac Áedáin”, in Sages, saints and storytellers (1989) | Picard, J., (1989) The strange death of Guaire mac Áedáin… |
Sayers, William, “Teithi Hen, Gúaire mac Áedáin, Grettir Ásmundarson”, Studia Celtica 41 (2007) | Sayers, W., (2007) Teithi Hen, Gúaire mac Áedáin, Grettir Ásmundarson… |
Boolean operator
FT only:
[[Has chain title::~+med* +early]] [[Class::+]]
Has chain title | |
---|---|
early and medieval Welsh poetry | Welsh poetry / early and medieval |
early Welsh poetry | Welsh poetry / early and medieval / c.600–c.1099 |
Gogynfeirdd poetry | Welsh poetry / early and medieval / c.1100–c.1600 / Gogynfeirdd |
legendary poems from the Book of Taliesin | Welsh poetry / early and medieval / legendary poems from the Book of Taliesin |
medieval Welsh poetry, c.1100-c.1600 | Welsh poetry / early and medieval / c.1100-c.1600 |
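The behaviour seen in the two-word, phrase and boolean tests can be summarised in a toy evaluator. It is a simplified model of boolean-mode matching as these tests suggest it works, not the database's or SMW's actual logic: words without an operator are optional (any one suffices), a leading + makes a word required, a trailing * turns a word into a token prefix, and double quotes demand consecutive tokens. All function names are hypothetical, the tokeniser and folding are assumptions, and the sketch ignores the - operator, ranking, stopwords, minimum token size and leading asterisks.

```python
import re
import unicodedata

def fold(text: str) -> str:
    """Lowercase and strip diacritics (assumed stand-in for the collation)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

def tokenize(text: str) -> list:
    return re.findall(r"\w+", fold(text))

def term_matches(term: str, tokens: list) -> bool:
    """A trailing '*' turns the term into a token prefix."""
    if term.endswith("*"):
        return any(t.startswith(term[:-1]) for t in tokens)
    return term in tokens

def boolean_match(query: str, value: str) -> bool:
    tokens = tokenize(value)
    phrase = re.fullmatch(r'"(.+)"', query.strip())
    if phrase:
        # Phrase search: the words must occur as consecutive tokens.
        words = tokenize(phrase.group(1))
        n = len(words)
        return n > 0 and any(
            tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
        )
    terms = fold(query).split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    if required:
        # Required terms decide the match; optional ones would only affect rank.
        return all(term_matches(t, tokens) for t in required)
    return any(term_matches(t, tokens) for t in optional)

print(boolean_match("guaire mac*", "A rhetorical poem in Longes mac nUislenn"))
print(boolean_match('"guaire mac"', "Teithi Hen, Gúaire mac Áedáin, Grettir Ásmundarson"))
print(boolean_match("+med* +early", "Irish sciences / medicine"))
```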
Conclusions?
Improvements:
- I have somewhat expanded the documentation over on SMW, which is not to say it is perfect.
- The highlighting feature is not optimised for diacritics