SMW tests on wildcard searches

Preliminary

  • Property:Has chain title is a property of type Text, so at least the first 40 characters should be searchable
  • Special:Version
  • Link to GitHub report: https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/5480
  • Working assumptions:
    • This site has enabled full-text search, which means that the appropriate indexing, notably tokenisation, has been done. Strings or substrings of 3+ characters are evaluated in a MATCH ... AGAINST condition, not LIKE (the short summary).
    • If a LIKE condition is wanted instead, we can use the alternative like:/not like: notation, as demonstrated below, where each tilde query is followed by its like: counterpart.
    • In a LIKE condition, SMW uses standard wildcards: asterisks represent 1 or multiple characters, question marks any single character. There is no special wildcard for 0, 1 or multiple characters, like the percentage symbol. This makes substring matching potentially more difficult.
    • In a MATCH condition, the wildcard should be appended. Below we will explore some examples where the wildcard is prefixed regardless and check its effect on the outcome of the query. We are not trying out the combination of tilde + boolean operator + wildcard (e.g. ~-*foo) because that will result in a fatal error.
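
The working assumptions above can be summarised as a small decision rule. The following Python sketch is illustrative only: choose_condition and its parameter min_token_size are hypothetical names, not part of SMW, and measuring the search string after stripping wildcards is a simplifying assumption.

```python
def choose_condition(value: str, min_token_size: int = 3) -> str:
    """Guess which SQL condition a property value filter ends up in.

    Hypothetical helper mirroring the working assumptions above;
    it is not SMW's actual implementation.
    """
    # An explicit like:/not like: prefix always asks for a LIKE condition.
    if value.startswith(("like:", "not like:")):
        return "LIKE"
    # Tilde notation: full-text search applies only if the search string,
    # ignoring wildcards (a simplification), reaches the minimum token size.
    if value.startswith("~"):
        text = value[1:].replace("*", "").replace("?", "")
        return "MATCH" if len(text) >= min_token_size else "LIKE"
    # Anything else is treated as an exact match in this sketch.
    return "EQUALS"


for v in ("~*la*", "~*law*", "like:*law*"):
    print(v, "->", choose_condition(v))
```

Under these assumptions, the two-character string la falls back to LIKE even with the tilde, which is the behaviour explored in the first test below.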

Asterisk wildcards on either side

General note: the use of an asterisk to prefix a string may not be supported by MATCH / AGAINST. It is typically used at the end of a token.

Each query below is followed by its like: counterpart.


[[Has chain title::~*la*]]

Works with the two-character string la. [Two characters is below the minimum token size (we are using the default of 3 characters), so the query defaults to LIKE, not MATCH.]:

Page | Has chain title
bérla na filed | bérla na filed
Breton lais | Breton lais
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
cartularies | cartularies

FURTHER RESULTS…
Result count: 30
[[Has chain title::like:*la*]]

Works:

Page | Has chain title
bérla na filed | bérla na filed
Breton lais | Breton lais
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
cartularies | cartularies

FURTHER RESULTS…
Result count: 30


[[Has chain title::~*law*]]

Does not work with law. [Update 2024: after migrating to a new server and updating SMW, this example was found to be working!]

Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
penitentials | Misc. / canon law and penitentials / penitentials
Result count: 5
[[Has chain title::like:*law*]]

Does work (as opposed to the version using tilde notation):

Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
penitentials | Misc. / canon law and penitentials / penitentials
Result count: 5


[[Has chain title::~*Mi*]]

Now using the substring Mi. Works. Note: the match appears to be case-sensitive because it does not match 'Primitive Irish' or 'missals' (don't be misled by the highlighting, which is case-insensitive). Compare the following lowercase example:

Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 59
[[Has chain title::like:*Mi*]]
Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 59


[[Has chain title::~*mi*]]
Page | Has chain title
interpretationes nominum hebraicorum | Misc. / theology and exegesis / interpretationes nominum hebraicorum
manuscript miscellanies | Misc. / manuscript miscellanies
minor Irish prose tales (foscéla) | Irish narrative literature / minor tales (foscéla)
miracula and mirabilia | Misc. / miracula and mirabilia
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore

FURTHER RESULTS…
Result count: 12
[[Has chain title::like:*mi*]]
Page | Has chain title
interpretationes nominum hebraicorum | Misc. / theology and exegesis / interpretationes nominum hebraicorum
manuscript miscellanies | Misc. / manuscript miscellanies
minor Irish prose tales (foscéla) | Irish narrative literature / minor tales (foscéla)
miracula and mirabilia | Misc. / miracula and mirabilia
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore

FURTHER RESULTS…
Result count: 12


[[Has chain title::~*Misc*]]

Now using the substring Misc. Works even if there are zero characters in front of the string:

Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 52
[[Has chain title::like:*Misc*]]
Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 51


Now lowercase:

[[Has chain title::~*misc*]]

Now using the substring misc (lowercase).

Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 52
[[Has chain title::like:*misc*]]
Page | Has chain title
manuscript miscellanies | Misc. / manuscript miscellanies
miscellaneous Irish learning and lore | Irish literature and learning / misc. learning and lore
Result count: 2


Now without initial character. Does the wildcard actually work?

[[Has chain title::~*isc*]]
Query run but nothing found.
Result count: 0
[[Has chain title::like:*isc*]]
Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 52


Tentative conclusion

When the tilde notation is used and the query is evaluated as a MATCH condition, a prefixed asterisk appears to be non-functional. In other words, it is allowed but simply ignored.
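
One way to picture this conclusion is the following minimal Python sketch. It assumes that a full-text condition compares the search term against whole tokens, that a leading asterisk is dropped, and that a trailing asterisk turns the term into a token prefix. tokenise and fulltext_match are hypothetical helpers, and the case-insensitive comparison is likewise an assumption based on the results above.

```python
import re


def tokenise(text: str) -> list[str]:
    # Split on anything that is not a letter or digit; lowercase for
    # case-insensitive comparison (an assumption about the index).
    return [t for t in re.split(r"\W+", text.lower()) if t]


def fulltext_match(term: str, text: str) -> bool:
    term = term.lower().lstrip("*")   # assumption: leading asterisk is ignored
    prefix = term.endswith("*")       # trailing asterisk = token prefix search
    term = term.rstrip("*")
    if not term:
        return False
    tokens = tokenise(text)
    if prefix:
        return any(tok.startswith(term) for tok in tokens)
    return term in tokens


title = "Misc. / canon law and penitentials / canon law"
print(fulltext_match("*misc*", title))  # the token "misc" starts with "misc"
print(fulltext_match("*isc*", title))   # no token starts with "isc"
```

In this reading, *isc* finds nothing because no token begins with isc, whereas like:*isc* succeeds because LIKE looks inside the whole string.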


Asterisk wildcard at the front only

Each query below is followed by its like: counterpart.

[[Has chain title::~*aw]]

The prior assumption that the asterisk can only be appended does not hold true:

Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
Result count: 3
[[Has chain title::like:*aw]]
Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
Result count: 3


[[Has chain title::~*la]]

Because the default setting of $smwgFulltextSearchMinTokenSize is used (3 characters), it is to be expected that this query does not produce results.

Query run but nothing found.
Result count: 0
[[Has chain title::like:*la]]
Query run but nothing found.
Result count: 0


[[Has chain title::~*law]]
Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
penitentials | Misc. / canon law and penitentials / penitentials
Result count: 5
[[Has chain title::like:*law]]
Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
Result count: 3
cf.
[[Has chain title::~law]]
Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
penitentials | Misc. / canon law and penitentials / penitentials
This query did not produce results at first but does now (post-upgrade). Why not before? Working assumption at the time: law is only three characters long, and full-text search may not index words shorter than four characters, even though a partial search did produce a result above (much less likely: it was recognised as a stopword). In any case, it works now.
cf.
[[Has chain title::like:law]]
Query run but nothing found.
Result count: 0 (none expected)



Asterisk wildcard in medial position

[[Has chain title::~l*w]]
Query run but nothing found.
Result count: 0
[[Has chain title::like:l*w]]
Query run but nothing found.
Result count: 0 (none expected because not tokenised)
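
The like: results in this and the preceding sections are consistent with a pattern that must cover the whole property value, case-sensitively. A minimal Python sketch of that reading follows. It assumes the asterisk stands for any run of characters (including none, as the like:*Misc* result suggests) and the question mark for exactly one character; like_match is a hypothetical helper, not SMW code.

```python
import re


def like_match(pattern: str, value: str) -> bool:
    # Translate the wildcards to a regular expression and require the
    # pattern to cover the entire value (no implicit substring search).
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # assumption: zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.fullmatch("".join(parts), value, flags=re.DOTALL) is not None


chain_titles = [
    "Misc. / canon law and penitentials / canon law",
    "Misc. / canon law and penitentials",
    "Irish tracts, institutions / early Irish law",
    "medieval Welsh law",
]

for pattern in ("*law*", "*law", "law", "l*w"):
    hits = [t for t in chain_titles if like_match(pattern, t)]
    print(pattern, len(hits))
```

Because the pattern is anchored at both ends, law and l*w find nothing in this sample: no chain title consists of law alone or starts with l and ends in w.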


Cf. use of ?
[[Has chain title::~l?w]]
Query run but nothing found.
Result count: 242
Cf. (not an equivalent)
[[Has chain title::like:*l?w*]]
 Has chain title
canon law 
canon law and penitentials 
early Irish legal texts 
histories 
medieval Welsh law 

FURTHER RESULTS…

Result count: 7

Where has the printout gone?



Asterisk wildcard at the end

Each query below is followed by its like: counterpart.

[[Has chain title::~med*]]

Substring: med. It matches not just at the beginning of the full string, but also after a word boundary:

Page | Has chain title
Breton medicine and medical writing | Breton sciences / medicine
early and medieval Welsh poetry | Welsh poetry / early and medieval
early Welsh poetry | Welsh poetry / early and medieval / c.600–c.1099
Gogynfeirdd poetry | Welsh poetry / early and medieval / c.1100–c.1600 / Gogynfeirdd
Irish medicine and medical writing | Irish sciences / medicine

FURTHER RESULTS…
[[Has chain title::like:med*]]
Page | Has chain title
medieval Welsh law | medieval Welsh law


[[Has chain title::~law*]]

Substring: law.

Page | Has chain title
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials
early Irish legal texts | Irish tracts, institutions / early Irish law
medieval Welsh law | medieval Welsh law
penitentials | Misc. / canon law and penitentials / penitentials
[[Has chain title::like:law*]]
Query run but nothing found.
[[Has chain title::~Misc*]]
Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 52
[[Has chain title::like:Misc*]]
Page | Has chain title
acrostics and abecedarii | Misc. / acrostics and abecedarii
antiphons | Misc. / liturgical and devotional / antiphons
apocryphal and pseudepigraphical literature | Misc. / religious / apocryphal and pseudepigraphical literature
canon law | Misc. / canon law and penitentials / canon law
canon law and penitentials | Misc. / canon law and penitentials

FURTHER RESULTS…
Result count: 51



Case and accent folding

[[Concept:Narrative worlds]] [[Display title of::~*guaire*]]
Page | Display title of
Cycle of Gúaire Aidne mac Colmáin | Cycle of Gúaire Aidne mac Colmáin
[[Concept:Narrative worlds]] [[Display title of::like:*guaire*]]
Query run but nothing found.
[[Concept:Narrative worlds]] [[Display title of::~guaire]]
Page | Display title of
Cycle of Gúaire Aidne mac Colmáin | Cycle of Gúaire Aidne mac Colmáin
[[Concept:Narrative worlds]] [[Display title of::like:guaire]]
Query run but nothing found.
(none expected)
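
The difference between the tilde and like: results can be pictured as folding case and diacritics before comparison. A minimal Python sketch follows. It assumes the full-text index compares folded forms while like: compares the stored characters as they are; fold is a hypothetical helper and says nothing about the collation actually configured on this site.

```python
import unicodedata


def fold(text: str) -> str:
    # Decompose accented letters, drop the combining marks, then casefold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


title = "Cycle of Gúaire Aidne mac Colmáin"
print("guaire" in fold(title))  # folded comparison
print("guaire" in title)        # raw, case- and accent-sensitive comparison
```

The folded comparison finds guaire in the title, while the raw comparison does not, which mirrors the two outcomes above.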



Now write two words

Without a boolean operator, the space between the words is interpreted as OR.

[[Selection title::~*Guaire mac*]]
Page | Selection title
Ahlqvist, Anders, “A rhetorical poem in Longes mac nUislenn”, in Rhetoric and reality in medieval Celtic literature (2014) | Ahlqvist, A., (2014) A rhetorical poem in Longes mac nUislenn…
Anderson, Peter John, “Ewen MacLachlan”, Aberdeen University Library Bulletin 18 (1918) | Anderson, P. J., (1918) Ewen MacLachlan: librarian to the University and K…
Anscombe, A., “Dr. MacCarthy’s Lunar computations”, Zeitschrift für celtische Philologie 4 (1903) | Anscombe, A., (1903) Dr. MacCarthy’s Lunar computations…
d'Arbois de Jubainville, H., “La mort violente de Fergus Mac Lete”, Zeitschrift für celtische Philologie 4 (1903) | d’Arbois de Jubainville, H., (1903) La mort violente de Fergus Mac Lete…
Mac Carthy, Bartholomew, Annala Uladh, vol. 2 (1893) | Mac Carthy, B., (1893) Annala Uladh: Annals of Ulster, otherwise Annala S…

FURTHER RESULTS…
[[Selection title::like:*Guaire mac*]]
Page | Selection title
Picard, Jean-Michel, “The strange death of Guaire mac Áedáin”, in Sages, saints and storytellers (1989) | Picard, J., (1989) The strange death of Guaire mac Áedáin…

Phrase matching (double quotes)

Full-text (FT) search only. Works fine, except for the highlighting, which fails to recognise Gúaire with the accented vowel.

[[Selection title::~"guaire mac"]]
Page | Selection title
Picard, Jean-Michel, “The strange death of Guaire mac Áedáin”, in Sages, saints and storytellers (1989) | Picard, J., (1989) The strange death of Guaire mac Áedáin…
Sayers, William, “Teithi Hen, Gúaire mac Áedáin, Grettir Ásmundarson”, Studia Celtica 41 (2007) | Sayers, W., (2007) Teithi Hen, Gúaire mac Áedáin, Grettir Ásmundarson…

Boolean operator

FT only:

[[Has chain title::~+med* +early]] [[Class::+]]
Page | Has chain title
early and medieval Welsh poetry | Welsh poetry / early and medieval
early Welsh poetry | Welsh poetry / early and medieval / c.600–c.1099
Gogynfeirdd poetry | Welsh poetry / early and medieval / c.1100–c.1600 / Gogynfeirdd
legendary poems from the Book of Taliesin | Welsh poetry / early and medieval / legendary poems from the Book of Taliesin
medieval Welsh poetry, c.1100-c.1600 | Welsh poetry / early and medieval / c.1100-c.1600
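
The behaviour seen in the last three sections (space as OR, double quotes for a phrase, + for a required term) can be sketched as follows. This is a rough Python illustration under stated assumptions, not a reimplementation of the database's boolean mode: tokens, term_hits and boolean_match are hypothetical names, a leading asterisk is simply left out of the example queries, and diacritic folding is omitted for brevity.

```python
import re


def tokens(text: str) -> list[str]:
    # Lowercased word tokens; an assumption about how the index tokenises.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def term_hits(term: str, toks: list[str]) -> bool:
    # A trailing asterisk makes the term a token prefix.
    if term.endswith("*"):
        return any(t.startswith(term[:-1]) for t in toks)
    return term in toks


def boolean_match(query: str, text: str) -> bool:
    toks = tokens(text)
    query = query.lower()
    # A quoted phrase must occur as a run of consecutive tokens.
    phrase = re.fullmatch(r'"(.+)"', query)
    if phrase:
        words = tokens(phrase.group(1))
        n = len(words)
        return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))
    terms = query.split()
    required = [t[1:] for t in terms if t.startswith("+")]
    optional = [t for t in terms if not t.startswith("+")]
    if not all(term_hits(t, toks) for t in required):
        return False
    # Without required terms, any one optional term is enough (OR).
    return bool(required) or any(term_hits(t, toks) for t in optional)


print(boolean_match("+med* +early", "Welsh poetry / early and medieval"))
print(boolean_match("+med* +early", "medieval Welsh law"))
print(boolean_match("guaire mac*", "Ewen MacLachlan"))
print(boolean_match('"guaire mac"', "The strange death of Guaire mac Áedáin"))
```

Read this way, the many unrelated hits for the two-word query are explained by mac* matching any token that begins with mac, such as MacLachlan or MacCarthy.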

Conclusions?

Improvements:

  • I have somewhat expanded the documentation over on SMW, which is not to say it is perfect.
  • The highlighting feature is not optimised for diacritics.