Skip to content
This repository has been archived by the owner on Jan 17, 2023. It is now read-only.

Simplify retrieval of My Shots data, hopefully speeding things up #3374

Merged
merged 1 commit into from
Aug 18, 2017

Conversation

jaredhirsch
Copy link
Member

@jaredhirsch jaredhirsch commented Aug 18, 2017

Looking at the DB queries that generate the My Shots page, the query that gets all devices for the current user seems problematic, based on the EXPLAIN output (note the Seq Scan):

jhirsch=# EXPLAIN ANALYZE SELECT DISTINCT devices.id FROM devices, devices AS devices2 WHERE devices.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438' OR (devices.accountid = devices2.accountid AND devices2.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438');
 Unique  (cost=22395.73..22407.76 rows=2405 width=33) (actual time=40.324..40.768 rows=1 loops=1)
   ->  Sort  (cost=22395.73..22401.75 rows=2405 width=33) (actual time=40.324..40.511 rows=2420 loops=1)
         Sort Key: devices.id
         Sort Method: quicksort  Memory: 286kB
         ->  Nested Loop  (cost=5.19..22260.67 rows=2405 width=33) (actual time=0.039..39.835 rows=2420 loops=1)
               ->  Seq Scan on devices devices2  (cost=0.00..81.05 rows=2405 width=451) (actual time=0.006..1.167 rows=2420 loops=1)
               ->  Bitmap Heap Scan on devices  (cost=5.19..9.21 rows=1 width=451) (actual time=0.015..0.015 rows=1 loops=2420)
                     Recheck Cond: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR ((accountid)::text = (devices2.accountid)::text))
                     Filter: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR (((accountid)::text = (devices2.accountid)::text) AND ((devices2.id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)))
                     Heap Blocks: exact=2420
                     ->  BitmapOr  (cost=5.19..5.19 rows=1 width=0) (actual time=0.014..0.014 rows=0 loops=2420)
                           ->  Bitmap Index Scan on devices_pkey  (cost=0.00..4.29 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=2420)
                                 Index Cond: ((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)
                           ->  Bitmap Index Scan on devices_accountid_idx  (cost=0.00..0.31 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=2420)
                                 Index Cond: ((accountid)::text = (devices2.accountid)::text)
 Planning time: 0.162 ms
 Execution time: 40.799 ms

Ian pointed out that we already have the accountId, if it exists. Modifying the query to take advantage of accountId dramatically reduces the cost of the query from around 20000 to around 10:

jhirsch=# EXPLAIN ANALYZE SELECT DISTINCT devices.id FROM devices WHERE devices.id = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438' OR devices.accountid = 'd4f047ab839511e79498784f4376aa68';
 Unique  (cost=12.60..12.61 rows=1 width=33) (actual time=0.077..0.078 rows=1 loops=1)
   ->  Sort  (cost=12.60..12.61 rows=1 width=33) (actual time=0.077..0.078 rows=1 loops=1)
         Sort Key: id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on devices  (cost=8.58..12.59 rows=1 width=33) (actual time=0.069..0.070 rows=1 loops=1)
               Recheck Cond: (((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text) OR ((accountid)::text = 'd4f047ab839511e79498784f4376aa68'::text))
               Heap Blocks: exact=1
               ->  BitmapOr  (cost=8.58..8.58 rows=1 width=0) (actual time=0.028..0.028 rows=0 loops=1)
                     ->  Bitmap Index Scan on devices_pkey  (cost=0.00..4.29 rows=1 width=0) (actual time=0.023..0.023 rows=1 loops=1)
                           Index Cond: ((id)::text = 'anoneb12496a-e845-4f09-8fa6-0e877b7d4438'::text)
                     ->  Bitmap Index Scan on devices_accountid_idx  (cost=0.00..4.29 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1)
                           Index Cond: ((accountid)::text = 'd4f047ab839511e79498784f4376aa68'::text)
 Planning time: 10.119 ms
 Execution time: 0.105 ms

I've been advocating taking a less piecemeal approach to optimization, but eliminating a full table scan seems like a clear win.

Copy link
Contributor

@ianb ianb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay just moving forward with this and seeing how it goes, but a couple comments...

`,
[deviceId]
[deviceId, accountId]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically if accountId is null (as it is when you aren't logged in), then devices.accountid = NULL will always be false (I guess that's just SQL doing what it's supposed to do), but that always seemed like an unusual corner of SQL. At least worth a comment so a later reader who isn't aware of how NULL is handled will realize the code works as it should.

We could probably go further than this, and do a join later on and do the whole thing in one query... but that might actually reintroduce the performance issue.

Should we also add CREATE INDEX devices_accountid_idx ON devices (accountid) ? Though we could batch up a bunch of these indexes later tool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess technically if accountId is null (as it is when you aren't logged in), then devices.accountid = NULL will always be false (I guess that's just SQL doing what it's supposed to do), but that always seemed like an unusual corner of SQL. At least worth a comment so a later reader who isn't aware of how NULL is handled will realize the code works as it should.

Sure, I'll add a note here.

We could probably go further than this, and do a join later on and do the whole thing in one query... but that might actually reintroduce the performance issue.

Yeah, good point. Maybe once we're sure there aren't other sequential scans lurking elsewhere, we can focus on reducing the number of round trips to the database.

Should we also add CREATE INDEX devices_accountid_idx ON devices (accountid) ? Though we could batch up a bunch of these indexes later tool.

The index already exists, but wasn't getting used. I was thinking of adding lookup tables (device to account, and account to device), to provide a hint to the query planner, but this SO answer suggests using compound indexes to do the same job (account + device and device + account indexes).

That same SO answer points out that varchar is not the most efficient data type for indexes or sorting, which is something else to maybe consider in the future.

# for free to subscribe to this conversation on GitHub. Already have an account? #.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants