Speed up table counts when displaying index and database pages #2398

Closed
simonw opened this issue Aug 20, 2024 · 12 comments

simonw commented Aug 20, 2024

Original title: The query that counts all matching rows on table page should have a shorter time limit

I thought it did already, but apparently not.

simonw commented Aug 20, 2024

This code here:

# Otherwise run a select count(*) ...
if count_sql and count is None and not nocount:
    try:
        count_rows = list(await db.execute(count_sql, from_sql_params))
        count = count_rows[0][0]
    except QueryInterrupted:
        pass
return count

Currently it can take as long as the default SQL time limit, which means that if a table is huge you could wait a second or more for the table page to fail to count the rows.

simonw commented Aug 20, 2024

There are other things that could happen here, for example:

  1. If count times out, link to a SQL query page that runs the full count
  2. Counts occur via a fetch() call and show a loading indicator while they are running
  3. Caching! Counts could be cached for the given query filters (see the sketch after this list)
  4. Super-fancy caching: any time the table is modified (captured via triggers) the count cache is invalidated, using a mechanism like simonw/sqlite-chronicle

I probably won't implement any of the fancy options as part of this initial issue.
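
For illustration, option 3 could be as simple as an in-memory cache keyed on the count SQL and its parameters. This is only a sketch, not anything Datasette does today - cached_count() and the TTL are made up:

import time

# Hypothetical cache: (count_sql, params) -> (count, when_cached)
_count_cache = {}
COUNT_CACHE_TTL = 60  # seconds - an arbitrary choice for this sketch

async def cached_count(db, count_sql, params):
    key = (count_sql, tuple(sorted(params.items())))
    hit = _count_cache.get(key)
    if hit is not None and time.monotonic() - hit[1] < COUNT_CACHE_TTL:
        return hit[0]
    count = (await db.execute(count_sql, params)).rows[0][0]
    _count_cache[key] = (count, time.monotonic())
    return count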

simonw commented Aug 20, 2024

What should the count timeout be? I'm inclined to add a new setting for it, joining sql_time_limit_ms, facet_time_limit_ms and facet_suggest_time_limit_ms.

Could call it sql_count_time_limit_ms.
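
For illustration, that setting could be threaded through to the count query from the earlier comment via the existing custom_time_limit parameter of db.execute() - a sketch, since sql_count_time_limit_ms doesn't exist yet:

# Hypothetical: sql_count_time_limit_ms is the proposed new setting
count_time_limit = datasette.setting("sql_count_time_limit_ms")

if count_sql and count is None and not nocount:
    try:
        count_rows = list(
            await db.execute(
                count_sql,
                from_sql_params,
                custom_time_limit=count_time_limit,
            )
        )
        count = count_rows[0][0]
    except QueryInterrupted:
        pass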

simonw commented Aug 20, 2024

We do have a count time limit elsewhere, but that's for listing tables:

async def table_counts(self, limit=10):
    if not self.is_mutable and self.cached_table_counts is not None:
        return self.cached_table_counts
    # Try to get counts for each table, $limit timeout for each count
    counts = {}
    for table in await self.table_names():
        try:
            table_count = (
                await self.execute(
                    f"select count(*) from [{table}]",
                    custom_time_limit=limit,
                )
            ).rows[0][0]
            counts[table] = table_count
        # In some cases I saw "SQL Logic Error" here in addition to
        # QueryInterrupted - so we catch that too:
        except (QueryInterrupted, sqlite3.OperationalError, sqlite3.DatabaseError):
            counts[table] = None
    if not self.is_mutable:
        self._cached_table_counts = counts
    return counts
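
For context, limit here is passed through as custom_time_limit, which is measured in milliseconds - so a call like this gives each table a 10ms budget:

# Tables whose count can't finish in ~10ms come back as None
counts = await db.table_counts(limit=10)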

simonw commented Aug 20, 2024

Currently we have a ?_nocount=1 query string option which can be used to suppress that count query - e.g. compare:

https://datasette.io/content/assets?_sort=id&size__gt=100

[screenshot: the table page including the row count]

to https://datasette.io/content/assets?_sort=id&_nocount=1&size__gt=100

[screenshot: the same page with the count omitted]

simonw commented Aug 20, 2024

I think that second one should say "Rows where size > 100", not just "where size > 100".

simonw commented Aug 20, 2024

Maybe add a ?_count=1 option which forces the count to happen, which we can then link to for cases where the count timed out.
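
A sketch of how the two query string options might interact - ?_count=1 is only proposed here, so the handling below is an assumption:

# Proposed: ?_count=1 forces the count even when it would be skipped
force_count = request.args.get("_count") == "1"
nocount = request.args.get("_nocount") == "1" and not force_count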

simonw commented Aug 21, 2024

Another option here: I could cap the count at a maximum of 1,000 using this pattern:

with original as (select ...),
truncated as (select * from original limit 1001)
select count(*) from truncated;

Demo: https://datasette.io/content?sql=with+original+as+%28select+*+from+releases%29%2C%0D%0Atruncated+as+%28select+*+from+original+limit+101%29%0D%0Aselect+count%28*%29+from+truncated%3B

Then if that comes back as 1001 I can show >1,000. This should have a pretty tight limit on how much work the count operation ends up doing, even for brute force table scans as those would only have to consult 1,000 records max.
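
As a sketch, wrapping an arbitrary count query in that pattern might look like this - truncated_count() is a hypothetical helper, not existing code:

async def truncated_count(db, inner_sql, params, threshold=1000):
    # limit threshold + 1 distinguishes "exactly threshold rows"
    # from "more than threshold rows"
    sql = (
        f"with original as ({inner_sql}), "
        f"truncated as (select * from original limit {threshold + 1}) "
        "select count(*) from truncated"
    )
    count = (await db.execute(sql, params)).rows[0][0]
    # truncated=True means the UI should display ">1,000" style output
    return count, count > threshold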

Would need to change the JSON API to include a count_truncated field if you request ?_extra=count and it got truncated. Maybe have a ?_extra=full_count that ignores that logic?

simonw commented Aug 21, 2024

This tiny diff makes a HUGE difference all over the place:

diff --git a/datasette/database.py b/datasette/database.py
index c761dad7..428d8703 100644
--- a/datasette/database.py
+++ b/datasette/database.py
@@ -376,7 +376,7 @@ class Database:
             try:
                 table_count = (
                     await self.execute(
-                        f"select count(*) from [{table}]",
+                        f"select count(*) from (select * from [{table}] limit 10001)",
                         custom_time_limit=limit,
                     )
                 ).rows[0][0]
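
The reason this helps so much: the subquery means SQLite can stop scanning after 10,001 rows, so even a brute-force count against an enormous table now does a small, bounded amount of work instead of visiting every row.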

simonw changed the title from "The query that counts all matching rows on table page should have a shorter time limit" to "Speed up table counts when displaying index and database pages" on Aug 21, 2024

simonw commented Aug 21, 2024

I've decided to go with a limit of 10,000, which seems to make everything feel really snappy while still showing useful results for most databases.

I'm now showing "Many rows" if the time limit was hit, ">10,000 rows" if the row count came back as 10,001, and the actual number of rows otherwise.
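
That display logic could look something like this sketch (describe_count() is a hypothetical name):

def describe_count(count, timed_out, threshold=10_000):
    if timed_out:
        return "Many rows"
    if count > threshold:  # the subquery caps count at threshold + 1
        return f">{threshold:,} rows"
    return f"{count:,} rows"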

simonw commented Aug 21, 2024

I'm going to deploy this branch to Datasette Cloud and see how it feels.

simonw commented Aug 21, 2024

It feels really good. I'm happy with this change.
