Simplify retrieval of My Shots data, hopefully speeding things up #3374

jaredhirsch · 2017-08-18T00:15:01Z

Looking at the DB queries that generate the My Shots page, the query that gets all devices for the current user seems problematic, based on the EXPLAIN output (note the Seq Scan):

jhirsch=# EXPLAIN ANALYZE SELECT DISTINCT devices.id FROM devices, devices AS devices2 WHERE devices.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438' OR (devices.accountid = devices2.accountid AND devices2.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438');
 Unique  (cost=22395.73..22407.76 rows=2405 width=33) (actual time=40.324..40.768 rows=1 loops=1)
   ->  Sort  (cost=22395.73..22401.75 rows=2405 width=33) (actual time=40.324..40.511 rows=2420 loops=1)
         Sort Key: devices.id
         Sort Method: quicksort  Memory: 286kB
         ->  Nested Loop  (cost=5.19..22260.67 rows=2405 width=33) (actual time=0.039..39.835 rows=2420 loops=1)
               ->  Seq Scan on devices devices2  (cost=0.00..81.05 rows=2405 width=451) (actual time=0.006..1.167 rows=2420 loops=1)
               ->  Bitmap Heap Scan on devices  (cost=5.19..9.21 rows=1 width=451) (actual time=0.015..0.015 rows=1 loops=2420)
                     Recheck Cond: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR ((accountid)::text = (devices2.accountid)::text))
                     Filter: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR (((accountid)::text = (devices2.accountid)::text) AND ((devices2.id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)))
                     Heap Blocks: exact=2420
                     ->  BitmapOr  (cost=5.19..5.19 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=2420)
                           ->  Bitmap Index Scan on devices_pkey  (cost=0.00..4.29 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=2420)
                                 Index Cond: ((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)
                           ->  Bitmap Index Scan on devices_accountid_idx  (cost=0.00..0.31 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=2420)
                                 Index Cond: ((accountid)::text = (devices2.accountid)::text)
 Planning time: 0.162 ms
 Execution time: 40.799 ms

Ian pointed out that we already have the accountId, if it exists. Modifying the query to take advantage of accountId dramatically reduces the cost of the query from around 20000 to around 10:

jhirsch=# EXPLAIN ANALYZE SELECT DISTINCT devices.id FROM devices WHERE devices.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438' OR devices.accountid = 'd4f047ab839511e79498784f4376aa68';
 Unique  (cost=12.60..12.61 rows=1 width=33) (actual time=0.077..0.078 rows=1 loops=1)
   ->  Sort  (cost=12.60..12.61 rows=1 width=33) (actual time=0.077..0.078 rows=1 loops=1)
         Sort Key: id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on devices  (cost=8.58..12.59 rows=1 width=33) (actual time=0.069..0.070 rows=1 loops=1)
               Recheck Cond: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR ((accountid)::text = 'd4f047ab839511e79498784f4376aa68'::text))
               Heap Blocks: exact=1
               ->  BitmapOr  (cost=8.58..8.58 rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=1)
                     ->  Bitmap Index Scan on devices_pkey  (cost=0.00..4.29 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)
                           Index Cond: ((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)
                     ->  Bitmap Index Scan on devices_accountid_idx  (cost=0.00..4.29 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1)
                           Index Cond: ((accountid)::text = 'd4f047ab839511e79498784f4376aa68'::text)
 Planning time: 10.119 ms
 Execution time: 0.105 ms

I've been advocating taking a less piecemeal approach to optimization, but eliminating a full table scan seems like a clear win.

ianb

I'm okay just moving forward with this and seeing how it goes, but a couple comments...

ianb · 2017-08-18T16:06:21Z

server/src/servershot.js

    `,
-    [deviceId]
+    [deviceId, accountId]


I guess technically if accountId is null (as it is when you aren't logged in), then devices.accountid = NULL will always be false (I guess that's just SQL doing what it's supposed to do), but that always seemed like an unusual corner of SQL. At least worth a comment so a later reader who isn't aware of how NULL is handled will realize the code works as it should.

We could probably go further than this, and do a join later on and do the whole thing in one query... but that might actually reintroduce the performance issue.

Should we also add CREATE INDEX devices_accountid_idx ON devices (accountid) ? Though we could batch up a bunch of these indexes later tool.

I guess technically if accountId is null (as it is when you aren't logged in), then devices.accountid = NULL will always be false (I guess that's just SQL doing what it's supposed to do), but that always seemed like an unusual corner of SQL. At least worth a comment so a later reader who isn't aware of how NULL is handled will realize the code works as it should.

Sure, I'll add a note here.

We could probably go further than this, and do a join later on and do the whole thing in one query... but that might actually reintroduce the performance issue.

Yeah, good point. Maybe once we're sure there aren't other sequential scans lurking elsewhere, we can focus on reducing the number of round trips to the database.

Should we also add CREATE INDEX devices_accountid_idx ON devices (accountid) ? Though we could batch up a bunch of these indexes later tool.

The index already exists, but wasn't getting used. I was thinking of adding lookup tables (device to account, and account to device), to provide a hint to the query planner, but this SO answer suggests using compound indexes to do the same job (account + device and device + account indexes).

That same SO answer points out that varchar is not the most efficient data type for indexes or sorting, which is something else to maybe consider in the future.

ianb approved these changes Aug 18, 2017

View reviewed changes

Simplify retrieval of My Shots data, hopefully speeding things up

5f5f4d8

jaredhirsch force-pushed the my-shots-tweak branch from 872dd8d to 5f5f4d8 Compare August 18, 2017 16:29

jaredhirsch merged commit f11b478 into master Aug 18, 2017

jaredhirsch deleted the my-shots-tweak branch August 18, 2017 16:31

This was referenced Aug 23, 2017

DOMContentLoaded of 7+ seconds on My Shots #3281

Closed

Investigate slow page load times on production #3337

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simplify retrieval of My Shots data, hopefully speeding things up #3374

Simplify retrieval of My Shots data, hopefully speeding things up #3374

jaredhirsch commented Aug 18, 2017 •

edited

Loading

ianb left a comment

ianb Aug 18, 2017

jaredhirsch Aug 18, 2017

Simplify retrieval of My Shots data, hopefully speeding things up #3374

Simplify retrieval of My Shots data, hopefully speeding things up #3374

Conversation

jaredhirsch commented Aug 18, 2017 • edited Loading

ianb left a comment

Choose a reason for hiding this comment

ianb Aug 18, 2017

Choose a reason for hiding this comment

jaredhirsch Aug 18, 2017

Choose a reason for hiding this comment

jaredhirsch commented Aug 18, 2017 •

edited

Loading