Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Avoid streaming entire IDB tables #486

Closed
turbocrime opened this issue Feb 8, 2024 · 5 comments
Closed

Avoid streaming entire IDB tables #486

turbocrime opened this issue Feb 8, 2024 · 5 comments
Assignees
Labels
refactor Improving existing system with new design

Comments

@turbocrime
Copy link
Collaborator

turbocrime commented Feb 8, 2024

After #312 and #481 we are now iterating over streamed queries, instead of loading the entire query into memory

This is an improvement but could still get better. Inspecting and filtering items in the table should, as much as possible, be done by the database, not the caller. IDB can handle more specific queries, if we prepare index by an object's attribute.

For example, in transaction-info.ts,

for await (const txInfo of indexedDb.iterateTransactionInfo()) {
// filter transactions between startHeight and endHeight, inclusive
if (txInfo.height < req.startHeight || (req.endHeight && txInfo.height > req.endHeight))
continue;
yield { txInfo };
}

Which is backed by our idb query helper iterateTransactionInfo,

async *iterateTransactionInfo() {
for await (const { value } of this.db.transaction('TRANSACTION_INFO').store) {
yield TransactionInfo.fromJson(value);
}
}
async saveTransactionInfo(tx: TransactionInfo): Promise<void> {
await this.u.update({
table: 'TRANSACTION_INFO',
value: tx.toJson() as Jsonified<TransactionInfo>,
});
}

We could instead query a key range, if we prepared the schema to index transactions by height, the way we index swaps, notes, &c by index

db.createObjectStore('TRANSACTION_INFO', { keyPath: 'id.inner' });

would become

        db.createObjectStore('TRANSACTION_INFO', { keyPath: 'id.inner' }).createIndex('height', 'height');

with schema

  TRANSACTION_INFO: {
    key: Jsonified<Required<TransactionInfo>['id']['inner']>; // base64
    value: Jsonified<TransactionInfo>;
    indexes: {
      height: Jsonified<Required<TransactionInfo>['height']>;
    };
  };

and expose the index appropriately through our queriers

  async *iterateTransactionInfo(lower: bigint, upper: bigint) {
    const dbTx = this.db.transaction('TRANSACTION_INFO');
    const dbIdx = dbTx.store.index('height');
    for await (const { value } of dbIdx.iterate(IDBKeyRange.bound(lower, upper)))
      yield TransactionInfo.fromJson(value);
  }

which would greatly simplify transaction-info.ts

import type { Impl } from '.';
import { servicesCtx } from '../../ctx';

export const transactionInfo: Impl['transactionInfo'] = async function* (req, ctx) {
  const services = ctx.values.get(servicesCtx);
  const { indexedDb } = await services.getWalletServices();

  yield* indexedDb.iterateTransactionInfo(req.startHeight, req.endHeight);
};

this technique could be applied generally through most of our full-table idb methods.

some situations involving compound indicies can be handled, but idb does not have good support for complex compound indicies. things like assets.ts with multiple fields

const patterns: {
includeReq: boolean;
pattern: RegExp;
}[] = [
{
includeReq: includeLpNfts,
pattern: assetPatterns.lpNftPattern,
},
{
includeReq: includeDelegationTokens,
pattern: assetPatterns.delegationTokenPattern,
},
{
includeReq: includeProposalNfts,
pattern: assetPatterns.proposalNftPattern,
},
{
includeReq: includeUnbondingTokens,
pattern: assetPatterns.unbondingTokenPattern,
},
{
includeReq: includeVotingReceiptTokens,
pattern: assetPatterns.votingReceiptPattern,
},
...includeSpecificDenominations.map(d => ({
includeReq: true,
pattern: new RegExp(`^${d.denom}$`),
})),
].filter(i => i.includeReq);
for await (const metadata of indexedDb.iterateAssetsMetadata()) {
if (filtered && !patterns.find(p => p.pattern.test(metadata.display))) continue;
yield { denomMetadata: metadata };
}

would not be easy to provide indexes for. you can see more detail on this subject

@github-project-automation github-project-automation bot moved this to 🗄️ Backlog in Penumbra web Feb 8, 2024
@turbocrime turbocrime added the refactor Improving existing system with new design label Feb 8, 2024
@grod220
Copy link
Contributor

grod220 commented Feb 9, 2024

Yep, this makes sense. @Valentine1898 perhaps you can pick this up when you're next available? Thinking you may be a good candidate for this as it may impact the testing tracking issue you are working on? #333

@Valentine1898
Copy link
Contributor

Yeah, sure, I'll start that today

@Valentine1898 Valentine1898 self-assigned this Feb 9, 2024
@Valentine1898 Valentine1898 moved this from 🗄️ Backlog to 📝 Todo in Penumbra web Feb 9, 2024
@Valentine1898 Valentine1898 moved this from 📝 Todo to 🏗 In progress in Penumbra web Feb 9, 2024
@Valentine1898
Copy link
Contributor

We have only a few streaming queries at the moment

  1. iterateSpendableNotes
    Allows filtering by several fields: assetId, addressIndex, includeSpent, amountToSpend
    It is impossible to add an index for amountToSpend, because it is analogous to the SQL function SUM BY.
    And for assetId, addressIndex, includeSpent we need a compound index, which is poorly supported in indexed-db. Also, we should take into account that all filtering fields may not be defined, rpc specification allows it.

  2. iterateTransactionInfo
    In iterateTransactionInfo there are only 2 filtering parameters startHeight and endHeight, and both parameters are about one field in indexed-db. But, both parameters are optional
    This means that we can have 4 cases

  • startHeight and endHeight are not defined and we should not iterate over the index.
  • Only startHeight is defined and we should only do IDBKeyRange.bound(lower)
  • only endHeight is defined and we should only do IDBKeyRange.bound(upper)
  • both defined and IDBKeyRange.bound(lower, upper)
  1. iterateAssetsMetadata
    Difficult filtering conditions with patern search, not sure if index implementation is possible

  2. iterateSwaps
    No filtering

Based on this, only iterateTransactionInfo is a potential candidate for adding an index, and using iteration on the index. But taking into account that filtering parameters are optional, it seems that we will get code complication and testing, instead of minor query optimization.
My suggestion is to leave everything as it is.

@Valentine1898 Valentine1898 moved this from 🏗 In progress to 🗄️ Backlog in Penumbra web Feb 15, 2024
@grod220
Copy link
Contributor

grod220 commented Feb 29, 2024

@Valentine1898 think we can close this issue?

@grod220 grod220 closed this as completed Mar 6, 2024
@github-project-automation github-project-automation bot moved this from 🗄️ Backlog to ✅ Done in Penumbra web Mar 6, 2024
@Valentine1898
Copy link
Contributor

@Valentine1898 think we can close this issue?

Missed the comment, I think yes

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
refactor Improving existing system with new design
Projects
Archived in project
Development

When branches are created from issues, their pull requests are automatically linked.

3 participants