Hi, I ran into an issue when filtering on a field that has approximate search enabled.
Given: Field with value: VSMA 24# 78280679
Input for filter: VSMA 24# 78280679 => returns nothing.
Input for filter: VSMA 24 78280679 => returns the correct result.
How can I fix this issue?
Hi @son-fast-shard,
It depends on the search engine you are using. If you are using Solr, please check your application configuration and make sure it contains this setting:
mgmtp.a12.dataservices.search.analysis.fullText.ngrams.enabled=true
These are the search settings we are using:
mgmtp.a12.dataservices.search.analysis.fullText.ngrams.enabled=true
mgmtp.a12.dataservices.search.index.initialization.mode=REGULAR
mgmtp.a12.dataservices.search.service=lucene
mgmtp.a12.dataservices.search.lucene.homeDir=./cosmo-claims/database/lucene
We haven’t upgraded to 2023.06 yet, so we are still using Lucene.
Can you try adding the following configs:
mgmtp.a12.dataservices.search.solr.urls=http://localhost:8983/solr
mgmtp.a12.dataservices.search.service=solrclient
mgmtp.a12.dataservices.search.index.initialization.mode=REBUILD_INDEX
mgmtp.a12.dataservices.search.analysis.fullText.ngrams.enabled=true
Hi @son-fast-shard ,
please check the schema.xml in your Solr instance. Which tokenizers are used at index time and at query time for approximate search? I suspect that one of them is the whitespace tokenizer and the other is a different tokenizer, which prevents searching with special characters when ngrams are used.
We currently use the default A12 configuration, so we don’t have a customized schema.xml.
Yes, then the definition is:
<fieldType name="fulltextNGram" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="40"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
With this definition, it’s not possible to use tokenizing characters in a search.
Is there any way to enable it? And how would that affect the current search?
Hi @son-fast-shard ,
You can control how approximate matching behaves by changing the type of the dynamic field *_APPROXIMATE_MATCH
<dynamicField name="*_APPROXIMATE_MATCH" type="fulltextNGram" indexed="true" stored="false" multiValued="true"/>
to any type you like. The default configuration is the one @petr-high-peak posted above.
Looking at your use case, I would suggest trying a different tokenizer for the index analyzer. I cannot recommend a specific one based on a single input string. The tokenizer of the index analyzer creates the searchable tokens from the field values. For example, the whitespace tokenizer splits the field value at whitespace (one token per whitespace-separated part), the standard tokenizer additionally splits at special characters, and the keyword tokenizer keeps the whole field value as a single token.
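As a rough illustration of what such a change could look like, here is a sketch of an alternative field type that uses the keyword tokenizer for the index analyzer, so that the ngrams are built over the whole field value, including spaces and characters such as #. The type name fulltextNGramKeyword is made up for this example and is not part of the default A12 schema. Whether it gives the results you need depends on your data and on how the query is built in your setup, so treat it as a starting point for testing, not as a recommended configuration.

```xml
<!-- Hypothetical field type: the name "fulltextNGramKeyword" is an assumption for illustration only -->
<fieldType name="fulltextNGramKeyword" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <!-- Keep the whole field value as one token, so spaces and "#" end up inside the ngrams -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="40"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- Query analyzer left as in the default definition -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the approximate-match dynamic field at the hypothetical type -->
<dynamicField name="*_APPROXIMATE_MATCH" type="fulltextNGramKeyword" indexed="true" stored="false" multiValued="true"/>
```

Building ngrams over whole values produces more tokens than building them per word, so the index will probably grow. With maxGramSize="40", a search input longer than 40 characters would also not match any indexed ngram.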
I would suggest trying out a few configurations in schema.xml against your data in a test environment until you get results that you find acceptable. Please keep in mind that changing schema.xml requires a restart of the DS server and re-indexing of the documents.
This answer is valid for DS versions 33.0 to 37.0.