Search for URLs in messages doesn't return any results #3024
Comments
A plain SQL query returns the message without any trouble:
Just ran into this same issue. Tried searching a room for a specific link and it wasn't finding anything, but I found it when searching manually.
Managed to reproduce in Riot on my homeserver. Will try upgrading the server to see if the issue persists. (Though it shouldn't be Riot-specific, since the request itself seems to be failing.)
Also occurs on the latest develop version when running live.
This bit looks suspicious: https://github.com/matrix-org/synapse/blob/master/synapse/storage/search.py#L398. ts_search appears to be doing some kind of stopword removal and similar normalisation. I can imagine URIs might get mangled by that: https://www.postgresql.org/docs/9.5/textsearch-controls.html
And the tests were passing because they are run against SQLite, which uses a somewhat more straightforward approach.
SELECT vector FROM event_search WHERE vector @@ to_tsquery('english', 'www.youtube.com'); Curiously, this does seem to produce a couple of results, suggesting to_tsquery isn't the roadblock I thought it might be.
Yep: PostgreSQL is rather iffy when it comes to handling URIs. You end up with queries like: This is made worse by the fact that Synapse mangles a query string like "www.youtube.com" into something like:
How about we pre-process the search index like this?
This results in a breakdown like this:
Much nicer parsing result.
Removing special characters would happen around here then, probably: https://github.com/matrix-org/synapse/blob/master/synapse/storage/search.py#L326. Note that we're essentially already doing that here: https://github.com/matrix-org/synapse/blob/master/synapse/storage/search.py#L695
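For anyone following along, here is a rough sketch of what "removing special characters" could look like on the Python side. This is not the code from search.py or from the patch; the function names `normalise_for_search` and `build_tsquery` are made up for illustration, and the choice to replace every run of non-word characters with a space is an assumption about the approach, not a description of what Synapse does.

```python
import re

# Hypothetical helper, not Synapse's actual code: any run of characters that
# are not letters, digits, or underscores is treated as a word separator.
_NON_WORD = re.compile(r"\W+")


def normalise_for_search(text: str) -> str:
    """Break URI-like strings into plain lowercase words.

    Assumption: the same normalisation is applied both to the text that goes
    into the search index and to the user's search term, so the two agree.
    """
    return _NON_WORD.sub(" ", text).strip().lower()


def build_tsquery(search_term: str) -> str:
    """Join the normalised words with '&' so all of them must match."""
    words = normalise_for_search(search_term).split()
    return " & ".join(words)


if __name__ == "__main__":
    print(normalise_for_search("https://www.youtube.com/watch?v=abc"))
    print(build_tsquery("www.youtube.com"))
```

The resulting string would be passed to to_tsquery as a bound parameter rather than interpolated into the SQL. Because the index side and the query side go through the same function, the parser never sees a raw URI and has no chance to classify it as a single host or URL token.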
…3024. Signed-off-by: Werner Kroneman <werner@wernerkroneman.nl>
… DB, should improve search performance with regards to matrix-org#3024. Added regression test as well, which passes. Signed-off-by: Werner Kroneman <werner@wernerkroneman.nl>
How odd... When entering the URI in Riot search it doesn't always seem to work, yet it seems mostly reliable when running the tests. Yes, I'm running with
🤦‍♂️ I forgot how federation worked and was running the search against a non-upgraded server. On a more positive note, URI search seems to work very well!
Note that the improved search will only work on messages newly inserted into the search index. (Old messages will behave the same as before.) It will also only work on PostgreSQL. SQLite seemed to already be sort of working before, so I didn't touch it.
I've just had a quick look and it's still an issue. For what it's worth, from looking at the search vector and testing locally a bit, it looks like dropping the
Description
Message text search doesn't work with URLs. Even if you search for a URL exactly as it appeared in a message, the search does not return any results.
Steps to reproduce
The search results will not include the message with the link, which is unexpected.
Version information