Is it possible to remove stopwords from query terms before the actual search? #74
Comments
Hi! Can you run Sonic in "debug" mode, issue both queries separately, and report the debug logs here? As for removing stopwords in the libraries, that is possible, but we'd be better off keeping it in Sonic itself for performance and uniformity reasons (otherwise there's added maintenance overhead on the libraries).
Debug for "report weeks meeting":
|
Debug for "last week meetings":
|
I agree 100% with this <3
Thanks for the debug logs, I have all I need 👍 I'll handle this later today or next week. My guess is that the lexer treats "last" as a stopword when you ingest the text, so it never makes it into the index. If the search query is not long enough, the detected locale may not be correct, so the stopword "last" may not be elided from the query, and it is then looked up in the search index, where it is absent. What do you think of adding an optional language hint to the search QUERY command, which would force the lexer's detected language to the one passed? This would fix your issue.
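To make the guess above concrete, here is a toy sketch of the mechanism in TypeScript. It is not Sonic's lexer or index: the `tokenize` and `search` functions, the `Lang` type, and the tiny stopword list are all assumptions made for illustration, and treating "last" as an English stopword simply follows the guess in this comment.

```typescript
// Toy model only: this is NOT Sonic's lexer or index, just an illustration.
type Lang = "eng" | "none";

// Tiny hypothetical stopword list; "none" models a failed language detection.
const STOPWORDS: Record<Lang, Set<string>> = {
  eng: new Set(["this", "is", "the", "from", "our", "last"]),
  none: new Set<string>(),
};

// Lowercase, split on non-letters, then drop stopwords for the given language.
function tokenize(text: string, lang: Lang): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0 && !STOPWORDS[lang].has(word));
}

// Ingest: the text is assumed long enough for language detection to succeed,
// so "last" is dropped and never reaches the index.
const index = new Set(
  tokenize("this is the report from our last weeks meeting", "eng")
);

// Search: every remaining query term must be present in the index.
function search(query: string, lang: Lang): boolean {
  const terms = tokenize(query, lang);
  return terms.length > 0 && terms.every((term) => index.has(term));
}

// Short query, language not detected: "last" is kept and misses the index.
console.log(search("last weeks meeting", "none")); // false
// Same query with a forced language hint: "last" is dropped before lookup.
console.log(search("last weeks meeting", "eng")); // true
// A query without stopwords matches either way.
console.log(search("report weeks meeting", "none")); // true
```

In this model the only difference between the failing and the passing query is which stopword list is applied at search time, which is what a language hint on the query would control.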
@valeriansaliou I went ahead and logged the output for when I ingest the term:
|
That would be perfect!
Really love Sonic's concept and plan to start learning Rust just to help you guys.
Closing this, as this issue is now handled in #75
Hi @andersonsantos! Just a heads up to let you know this has been implemented in #75 and will be released today in |
Hi @valeriansaliou,
I made some tests in Portuguese (Brazil) and in English (US) and found that if you ingest a text like:
"this is the report from our last weeks meeting"
A query for "last weeks meeting" would return empty (because it includes a stopword) but a query for "report weeks meeting" would return the object.
If we remove the stop words before querying, "last weeks meeting" would also return the object.
What would be the performance implications for this change?
Another option would be to add a removeStopwords + lang function to the Sonic Channel NPM module and do the query cleanup there. Any thoughts about this?
Thanks!
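For reference, the client-side option could look roughly like the TypeScript sketch below. The `removeStopwords` helper and the `CLIENT_STOPWORDS` table are hypothetical: they are not part of the Sonic Channel module, and the word lists are small illustrative samples rather than the lists Sonic uses.

```typescript
// Hypothetical client-side helper; it is not part of the Sonic Channel API.
// The stopword lists are tiny illustrative samples, not Sonic's own lists.
const CLIENT_STOPWORDS: Record<string, ReadonlySet<string>> = {
  eng: new Set(["this", "is", "the", "from", "our", "last"]),
  por: new Set(["este", "é", "o", "do", "nosso", "de"]),
};

// Remove stopwords of `lang` from `query`, keeping the original word order.
// Unknown languages leave the query untouched (apart from trimming).
function removeStopwords(query: string, lang: string): string {
  const stopwords = CLIENT_STOPWORDS[lang];
  if (stopwords === undefined) {
    return query.trim();
  }
  return query
    .split(/\s+/)
    .filter((word) => word.length > 0 && !stopwords.has(word.toLowerCase()))
    .join(" ");
}

// Clean the query before sending it to the search backend.
console.log(removeStopwords("last weeks meeting", "eng")); // "weeks meeting"
```

A helper like this only works if the client's lists stay in sync with whatever the server strips at ingest time, so every client library would have to carry and update them.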