Avoid streaming entire IDB tables #486

turbocrime · 2024-02-08T22:24:56Z

After #312 and #481 we are now iterating over streamed queries, instead of loading the entire query into memory

This is an improvement but could still get better. Inspecting and filtering items in the table should, as much as possible, be done by the database, not the caller. IDB can handle more specific queries, if we prepare index by an object's attribute.

For example, in transaction-info.ts,

web/packages/router/src/grpc/view-protocol-server/transaction-info.ts

Lines 8 to 14 in f866ef4

    
           for await (const txInfo of indexedDb.iterateTransactionInfo()) { 
        
             // filter transactions between startHeight and endHeight, inclusive 
        
             if (txInfo.height < req.startHeight || (req.endHeight && txInfo.height > req.endHeight)) 
        
               continue; 
        
             yield { txInfo }; 
        
           }

Which is backed by our idb query helper iterateTransactionInfo,

web/packages/storage/src/indexed-db/index.ts

Lines 187 to 198 in 8e13b82

    
           async *iterateTransactionInfo() { 
        
             for await (const { value } of this.db.transaction('TRANSACTION_INFO').store) { 
        
               yield TransactionInfo.fromJson(value); 
        
             } 
        
           } 
        
           async saveTransactionInfo(tx: TransactionInfo): Promise<void> { 
        
             await this.u.update({ 
        
               table: 'TRANSACTION_INFO', 
        
               value: tx.toJson() as Jsonified<TransactionInfo>, 
        
             }); 
        
           }

We could instead query a key range, if we prepared the schema to index transactions by height, the way we index swaps, notes, &c by index

web/packages/storage/src/indexed-db/index.ts

Line 74 in f866ef4

db.createObjectStore('TRANSACTION_INFO', { keyPath: 'id.inner' });

would become

        db.createObjectStore('TRANSACTION_INFO', { keyPath: 'id.inner' }).createIndex('height', 'height');

with schema

  TRANSACTION_INFO: {
    key: Jsonified<Required<TransactionInfo>['id']['inner']>; // base64
    value: Jsonified<TransactionInfo>;
    indexes: {
      height: Jsonified<Required<TransactionInfo>['height']>;
    };
  };

and expose the index appropriately through our queriers

  async *iterateTransactionInfo(lower: bigint, upper: bigint) {
    const dbTx = this.db.transaction('TRANSACTION_INFO');
    const dbIdx = dbTx.store.index('height');
    for await (const { value } of dbIdx.iterate(IDBKeyRange.bound(lower, upper)))
      yield TransactionInfo.fromJson(value);
  }

which would greatly simplify transaction-info.ts

import type { Impl } from '.';
import { servicesCtx } from '../../ctx';

export const transactionInfo: Impl['transactionInfo'] = async function* (req, ctx) {
  const services = ctx.values.get(servicesCtx);
  const { indexedDb } = await services.getWalletServices();

  yield* indexedDb.iterateTransactionInfo(req.startHeight, req.endHeight);
};

this technique could be applied generally through most of our full-table idb methods.

some situations involving compound indicies can be handled, but idb does not have good support for complex compound indicies. things like assets.ts with multiple fields

web/packages/router/src/grpc/view-protocol-server/assets.ts

Lines 19 to 52 in f866ef4

    
           const patterns: { 
        
             includeReq: boolean; 
        
             pattern: RegExp; 
        
           }[] = [ 
        
             { 
        
               includeReq: includeLpNfts, 
        
               pattern: assetPatterns.lpNftPattern, 
        
             }, 
        
             { 
        
               includeReq: includeDelegationTokens, 
        
               pattern: assetPatterns.delegationTokenPattern, 
        
             }, 
        
             { 
        
               includeReq: includeProposalNfts, 
        
               pattern: assetPatterns.proposalNftPattern, 
        
             }, 
        
             { 
        
               includeReq: includeUnbondingTokens, 
        
               pattern: assetPatterns.unbondingTokenPattern, 
        
             }, 
        
             { 
        
               includeReq: includeVotingReceiptTokens, 
        
               pattern: assetPatterns.votingReceiptPattern, 
        
             }, 
        
             ...includeSpecificDenominations.map(d => ({ 
        
               includeReq: true, 
        
               pattern: new RegExp(`^${d.denom}$`), 
        
             })), 
        
           ].filter(i => i.includeReq); 
        
           for await (const metadata of indexedDb.iterateAssetsMetadata()) { 
        
             if (filtered && !patterns.find(p => p.pattern.test(metadata.display))) continue; 
        
             yield { denomMetadata: metadata }; 
        
           }

would not be easy to provide indexes for. you can see more detail on this subject

https://yathit.github.io/ydn-db/doc/query/cmpidx.html which is documentation for another library, but it does provide insight into the limitations of idb compound queries
https://stackoverflow.com/questions/12084177/in-indexeddb-is-there-a-way-to-make-a-sorted-compound-query a stackoverflow answer that is thorough but more complex to understand

grod220 · 2024-02-09T11:24:54Z

Yep, this makes sense. @Valentine1898 perhaps you can pick this up when you're next available? Thinking you may be a good candidate for this as it may impact the testing tracking issue you are working on? #333

Valentine1898 · 2024-02-09T11:30:02Z

Yeah, sure, I'll start that today

Valentine1898 · 2024-02-12T11:47:57Z

We have only a few streaming queries at the moment

iterateSpendableNotes
Allows filtering by several fields: assetId, addressIndex, includeSpent, amountToSpend
It is impossible to add an index for amountToSpend, because it is analogous to the SQL function SUM BY.
And for assetId, addressIndex, includeSpent we need a compound index, which is poorly supported in indexed-db. Also, we should take into account that all filtering fields may not be defined, rpc specification allows it.
iterateTransactionInfo
In iterateTransactionInfo there are only 2 filtering parameters startHeight and endHeight, and both parameters are about one field in indexed-db. But, both parameters are optional
This means that we can have 4 cases

startHeight and endHeight are not defined and we should not iterate over the index.
Only startHeight is defined and we should only do IDBKeyRange.bound(lower)
only endHeight is defined and we should only do IDBKeyRange.bound(upper)
both defined and IDBKeyRange.bound(lower, upper)

iterateAssetsMetadata
Difficult filtering conditions with patern search, not sure if index implementation is possible
iterateSwaps
No filtering

Based on this, only iterateTransactionInfo is a potential candidate for adding an index, and using iteration on the index. But taking into account that filtering parameters are optional, it seems that we will get code complication and testing, instead of minor query optimization.
My suggestion is to leave everything as it is.

grod220 · 2024-02-29T16:38:24Z

@Valentine1898 think we can close this issue?

Valentine1898 · 2024-03-06T15:32:58Z

@Valentine1898 think we can close this issue?

Missed the comment, I think yes

github-project-automation bot added this to Penumbra web Feb 8, 2024

github-project-automation bot moved this to 🗄️ Backlog in Penumbra web Feb 8, 2024

turbocrime added the refactor Improving existing system with new design label Feb 8, 2024

Valentine1898 self-assigned this Feb 9, 2024

Valentine1898 moved this from 🗄️ Backlog to 📝 Todo in Penumbra web Feb 9, 2024

Valentine1898 moved this from 📝 Todo to 🏗 In progress in Penumbra web Feb 9, 2024

Valentine1898 moved this from 🏗 In progress to 🗄️ Backlog in Penumbra web Feb 15, 2024

grod220 closed this as completed Mar 6, 2024

github-project-automation bot moved this from 🗄️ Backlog to ✅ Done in Penumbra web Mar 6, 2024

turbocrime mentioned this issue Jan 31, 2025

Optimization: more idb queries could use indicies or key ranges #2009

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid streaming entire IDB tables #486

Avoid streaming entire IDB tables #486

turbocrime commented Feb 8, 2024 •

edited

Loading

grod220 commented Feb 9, 2024

Valentine1898 commented Feb 9, 2024

Valentine1898 commented Feb 12, 2024

grod220 commented Feb 29, 2024

Valentine1898 commented Mar 6, 2024

Avoid streaming entire IDB tables #486

Avoid streaming entire IDB tables #486

Comments

turbocrime commented Feb 8, 2024 • edited Loading

grod220 commented Feb 9, 2024

Valentine1898 commented Feb 9, 2024

Valentine1898 commented Feb 12, 2024

grod220 commented Feb 29, 2024

Valentine1898 commented Mar 6, 2024

turbocrime commented Feb 8, 2024 •

edited

Loading