Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Activate persistent_connections_limit generate query error #2352

Open
5 tasks done
daikoz opened this issue Jun 30, 2024 · 13 comments
Open
5 tasks done

Activate persistent_connections_limit generate query error #2352

daikoz opened this issue Jun 30, 2024 · 13 comments
Assignees
Labels
bug est::size_M rel::6.3.6 waiting Waiting for the original poster (in most cases) or something else

Comments

@daikoz
Copy link
Contributor

daikoz commented Jun 30, 2024

Bug Description:

I upload data and configuration on s3 (like https://manual.manticoresearch.com/Reporting_bugs#How-do-I-install-debug-symbols?)

In summary, I have 2 servers: SERVER1 SERVER2 with same manticore configuration.

I use a distributed index like:

index XXXX
{
     type             = distributed
     agent_persistent = 192.168.2.12:9312|192.168.2.13:9312:Advertisements
     agent_retry_count = 1
     ha_strategy      = nodeads
}

Scenario:
1/ Configure SERVER1 SERVER2 and modify the ip of agent_persistent
2/ Launch manticore on 2 servers:

searchd --console --config manticore.conf

manticore.conf is provided on s3

3/ on server1, Launch 2 times:

while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"

=> All is OK; No errors

4/ Now, uncomment persistent_connections_limit in manticore.conf, restart manticore on 2 servers and launch 2 times on server1:

while IFS= read -r ligne; do mariadb -h0 -P9306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"

=> ERRORS:

ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema

Manticore Search Version:

Manticore 6.3.2 c296dc7c8@24062606

Operating System Version:

Debian 12 bookworm uptodate

Have you tried the latest development version?

No

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
@sanikolaev
Copy link
Collaborator

Thanks @daikoz

We'll look into it.

@tomatolog
Copy link
Contributor

I see the master daemon crashed on merging result sets either with or without persistent_connections_limit option

@tomatolog
Copy link
Contributor

the issue was fixed at 950e16a you need to update package from the dev repository after CI finished and package will be published to get issue fixed

@sanikolaev sanikolaev added the rel::upcoming Upcoming release label Jul 12, 2024
@sanikolaev
Copy link
Collaborator

Reopening to complete the checklist items

@sanikolaev sanikolaev reopened this Jul 12, 2024
@tomatolog
Copy link
Contributor

we need to add to clt test the case for the query into 2 indexes with OPTION max_query_time, like

select id from idx_small, idx_large ... OPTION max_query_time=100

should crash daemon or produces internal error message

The indexes sizes should be:

  • the query into the only idx_small should finish in 10 - 50 ms
  • the query into the only idx_large should finish in 1 sec - 2 sec

The case is the result set from idx_small returns but searching idx_large exits by timeout without any result set that causes daemon to crash or produces internal error message

We need:

  • populate data similar to clt-tests\comparison-overhead-json-sql\test-comparison-overhead-json-sql.rec but post into idx_small 1000 documents and into idx_large all documents
  • flush posted data into disk chunks of the RT indexes
  • check the queries into the indexes alone
    -- select id from idx_small where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') finish in 10 - 50 ms
    -- select id from idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') finish in 1 - 2 seconds
  • check the query into the combined index cause daemon crash or internal error message select id from idx_small, idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=100
  • check that recent daemon version does not have that crash

@PavelShilin89
Copy link
Contributor

I wrote a Clt-test and tested different variants of OPTION max_query_time, however, even in the case of getting warning the daemon failure is not confirmed.

––– input –––
rm -f /var/log/manticore/searchd.log; searchd --stopwait > /dev/null; searchd; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
Manticore %{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-%{YEAR}, Manticore Software LTD (https://manticoresearch.com)
[#!/[0-9]{2}:[0-9]{2}.[0-9]{3}/!#] [#!/[0-9]+/!#] using config file '/etc/manticoresearch/manticore.conf' (%{NUMBER} chars)...
starting daemon version '%{SEMVER} %{COMMITDATE}#!/(\sdev)?\s/!#(columnar %{SEMVER} %{COMMITDATE}) (secondary %{SEMVER} %{COMMITDATE}) (knn %{SEMVER} %{COMMITDATE})' ...
listening on %{IPADDR}:9312 for sphinx and http(s)
listening on %{IPADDR}:9306 for mysql
listening on %{IPADDR}:9308 for sphinx and http(s)
Buddy started!
––– input –––
mysql -h0 -P9306 -e "SET GLOBAL qcache_max_bytes=0;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_small;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_small;"
––– output –––
––– input –––
php -d memory_limit=-1 ./test/clt-tests/comparison-overhead-json-sql/load_us_names.php 100 1 1000000 1 > /dev/null
––– output –––
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE name RENAME idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "flush ramchunk idx_large;"
––– output –––
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_small;"
––– output –––
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM idx_large;"
––– output –––
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+------+
| id   |
+------+
|  112 |
|  380 |
|  707 |
|  822 |
|  311 |
|  323 |
|  570 |
|  630 |
|  950 |
|   62 |
|  115 |
|  189 |
|  191 |
|  226 |
|  273 |
|  291 |
|  346 |
|  392 |
|  396 |
|  397 |
+------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=10; show warnings;"
––– output –––
+---------+------+------------------------------------+
| Level   | Code | Message                            |
+---------+------+------------------------------------+
| warning | 1000 | query time exceeded max_query_time |
+---------+------+------------------------------------+
––– input –––
mysql -h0 -P9306 -e "select id from idx_small, idx_large where match('a*|b*|c*|d*|e*|f*|g*|h*|i*|j*|k*|l*|m*|n*|o*|p*|q*|r*|s*|t*|u*|v*|w*|x*|y*|z*') OPTION max_query_time=5;"
––– output –––
+------+
| id   |
+------+
|  112 |
|  380 |
|  707 |
|  822 |
|  311 |
|  323 |
|  570 |
|  630 |
|  950 |
|   62 |
|  115 |
|  189 |
|  191 |
|  226 |
|  273 |
|  291 |
|  346 |
|  392 |
|  396 |
|  397 |
+------+

@daikoz
Copy link
Contributor Author

daikoz commented Jul 31, 2024

Hi,
6.3.4 is out today. I don't see this fix. Is this an oversight in the changelog ?
thx

@tomatolog
Copy link
Contributor

6.3.4 is service release and does not include all fixes from the master. I sure this fix along with all other fixes will be in the next major release.

You could use package from the dev repository to get this crash fixed.

sanikolaev pushed a commit that referenced this issue Aug 2, 2024
…ime when query multiple indexes; fixed sorter schema difference on early exit; fixed #2352
@sanikolaev sanikolaev added rel::6.3.6 and removed rel::upcoming Upcoming release labels Aug 2, 2024
@sanikolaev
Copy link
Collaborator

The fix has been included in 6.3.6.

@daikoz
Copy link
Contributor Author

daikoz commented Aug 20, 2024

After test with 6.3.6, I always the issue when I add:

persistent_connections_limit = 32

in searchd section

@sanikolaev sanikolaev reopened this Aug 21, 2024
@sanikolaev
Copy link
Collaborator

@daikoz I can't reproduce the issue in 6.3.7:

snikolaev@dev2:~/issue-2352$ while IFS= read -r ligne; do mysql -h0 -P29306 -s -N -e "$ligne" 2>&1 >/dev/null || echo $ligne; done < "query.log"
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.070) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
SELECT 1 2 manticore.conf query.log FROM AdvertisementsLBLB WHERE MATCH('maison') LIMIT 0,100; /bin /boot /dev /etc /home /installimage.conf /installimage.debug /lib /lib32 /lib64 /libx32 /lost+found /manticore /media /mnt /opt /proc /root /run /sbin /srv /sys /tmp /usr /var /work agents=(0.019) 1/ 2/
ERROR 1064 (42000) at line 1: unknown local table(s) 'AdvertisementsLBLB' in search request
...

and so on and no error like:

ERROR 1064 (42000) at line 1: internal error: column 'keywords/keywords' not found in result set schema

My config is:

mysql> desc AdvertisementsLB;
+--------------------------------+-------------------+
| Agent                          | Type              |
+--------------------------------+-------------------+
| localhost:29312:Advertisements | remote_1_mirror_1 |
| localhost:39312:Advertisements | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)

mysql> desc SearchKeywordsLB;
+--------------------------------+-------------------+
| Agent                          | Type              |
+--------------------------------+-------------------+
| localhost:29312:SearchKeywords | remote_1_mirror_1 |
| localhost:39312:SearchKeywords | remote_1_mirror_2 |
+--------------------------------+-------------------+
2 rows in set (0.00 sec)

mysql> show tables;
+---------------------+-------------+
| Index               | Type        |
+---------------------+-------------+
| Advertisements      | distributed |
| AdvertisementsDelta | local       |
| AdvertisementsLB    | distributed |
| AdvertisementsMain  | local       |
| SearchKeywords      | local       |
| SearchKeywordsLB    | distributed |
+---------------------+-------------+
6 rows in set (0.00 sec)

on each node.

The mentioned setting is uncommented on both nodes:

snikolaev@dev2:~/issue-2352$ grep pers 1/manticore.conf 2/manticore.conf
1/manticore.conf:    persistent_connections_limit=10
1/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:Advertisements
1/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:SearchKeywords
2/manticore.conf:    persistent_connections_limit=10
2/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:Advertisements
2/manticore.conf:     agent_persistent = localhost:29312|localhost:39312:SearchKeywords

The Manticore version is:

snikolaev@dev2:~/issue-2352$ searchd -v
Manticore 6.3.7 0ebcf8c82@24080614 dev (columnar 2.3.1 eb62283@24072718) (secondary 2.3.1 eb62283@24072718) (knn 2.3.1 eb62283@24072718)

Please check if the issue persists in this (or newer) version for you (https://mnt.cr/dev/nightly) and if it does - provide an instruction how to reproduce it.

@sanikolaev sanikolaev added the waiting Waiting for the original poster (in most cases) or something else label Aug 22, 2024
@daikoz
Copy link
Contributor Author

daikoz commented Aug 23, 2024

I tested on 6.3.7. I can reproduce this issue.

# searchd -v
Manticore 6.3.7 66626616f@24082118 dev
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2024, Manticore Software LTD (https://manticoresearch.com)

In the scenario, I launch the 2 command while on same server.

We can do a zoom, I will share my screen.

Same issue on production when I try to activate persistent_connections_limit.
Without persistent_connections_limit, no issue now.

@tomatolog
Copy link
Contributor

could you create docker-compose file with indexes and configs we could run locally and catch this issue?

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
bug est::size_M rel::6.3.6 waiting Waiting for the original poster (in most cases) or something else
Projects
None yet
Development

No branches or pull requests

4 participants