Skip to content

Wrong results returned from RUM index with order_by_attach=TRUE #79

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
michaelwu0505 opened this issue Feb 22, 2020 · 0 comments
Open

Comments

@michaelwu0505
Copy link

michaelwu0505 commented Feb 22, 2020

Below are two cases showing wrong results returned from RUM index with order_by_attach=TRUE. In both test cases, if order_by_attach=FALSE, correct results will be returned.

Tested with PostgreSQL 12.2 & newest commit e34375a.

CASE 1:

CREATE TABLE test (
	id bigint NOT NULL,
	folder bigint NOT NULL,
	time bigint NOT NULL,
	tsv tsvector NOT NULL
)
INSERT INTO test (id, folder, time, tsv) VALUES (1, 10, 100, to_tsvector('wordA'));
INSERT INTO test (id, folder, time, tsv) VALUES (2, 20, 200, to_tsvector('wordB'));
INSERT INTO test (id, folder, time, tsv) VALUES (3, 10, 300, to_tsvector('wordA'));
INSERT INTO test (id, folder, time, tsv) VALUES (4, 20, 400, to_tsvector('wordB'));

Below shows expected results when select without index:

SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)

Returns rows with id 1 and 3.

SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint

Returns rows with id 3 and 1.

After creating the following index, wrong results will be returned:

CREATE INDEX test_idx ON test USING rum(folder, tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint)

Wrong result: Returns only row with id 1. (Expects 1 and 3)

SET enable_seqscan = OFF;
SELECT * FROM test WHERE tsv @@ (to_tsquery('wordA')) AND (folder = 10::bigint) ORDER BY time <=| 500::bigint

Wrong result: Returns nothing. (Expects rows with id 3 and 1)

CASE 2:

CREATE TABLE test2 (
	id bigint NOT NULL,
	time bigint NOT NULL,
	tsv tsvector NOT NULL
)
CREATE OR REPLACE PROCEDURE test2_init()
AS $$
DECLARE
	counter INTEGER := 1; 
	str TEXT;
	time BIGINT;
BEGIN
	WHILE counter <= 1000 LOOP
	
		IF counter % 10 = 0 THEN
			str := 'wordA wordB';
		ELSEIF counter % 11 = 0 THEN
			str := 'wordA wordB wordC';
		ELSE
			str := 'wordA wordD';
		END IF;
		
		-- insert rows with alternating time
		IF counter % 2 = 0 THEN
			time := counter;
		ELSE
			time := -counter;
		END IF;
		
		
		INSERT INTO test2 (id, time, tsv) VALUES (counter, time, to_tsvector(str));
		
		counter := counter + 1;
	END LOOP;
END
$$
LANGUAGE plpgsql;

CALL test2_init();

Below shows expected results when select without index:

SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint 

Returned 181 rows.

After creating the following index, wrong results will be returned:

CREATE INDEX test2_idx ON test2 USING rum(tsv rum_tsvector_addon_ops, time) with (attach = 'time', to = 'tsv', order_by_attach=TRUE);
SET enable_seqscan = OFF;
SELECT * from test2 where tsv @@ (to_tsquery('wordA & wordB')) ORDER BY time <=| 1001::bigint 

Wrong result: Returns only 153 rows. (Expects 181 rows)

If the above query changes from ORDER BY time <=| 1001::bigint to ORDER BY <=> 1001::bigint, then correct number of rows will be returned.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant