-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Decide on DB schema + implement #25
Comments
If we focus the schema change to just include the properties we need to implement the capabilities, and updating names to be consistent with our new terminology then we end up with:
|
issuer | resource | link | size | origin? | ucanCID | insertedAt |
---|---|---|---|---|---|---|
did:key:space |
did:key:space |
bagy...1 |
101 |
bagy...0 |
baf...ucan |
2022-11-23T... |
string | string | string | number | string | string | string (ISO 8601) |
agent DID | space DID | stored CAR CID | CAR size bytes | prev shard CAR CID | invocation CID | wen db record |
domain mapped flavour - name columns after how we expect them to be commonly used.
agentDID | spaceDID | carCID | size | origin? | ucanCID | insertedAt |
---|---|---|---|---|---|---|
did:key:space |
did:key:space |
bagy...1 |
101 |
bagy...0 |
baf...ucan |
2022-11-23T... |
string | string | string | number | string | string | string (ISO 8601) |
agent DID | space DID | stored CAR CID | CAR size bytes | prev shard CAR CID | invocation CID | wen db record |
notes
ucanCID
is the CID for the complete, validated UCAN invocation that was sent to us and caused this row to be inserted. The current proposal is to store the bytes of that UCAN in s3 keyed by it's (raw
) CID.
open questions
- which flavour do you like
- store size as number or string
- what other queries outside of those needed to implement the capabilities do we want to support... see @gobengo comment...
Did some looking around on how we might try to aggregate the entries of upload-api 'store' entries into a per-customer sum of total amount stored that we can use to calculate billing. Especially given that upload-api currently uses DynamoDB and that DynamoDB querying doesn't inherently support aggregation queries like sum , it's not as easy as if we used a different datastore.
– https://filecoinproject.slack.com/archives/C02BZPRS9HP/p1669074957770049
References
|
|
wrt the thing I mentioned about billing. From what I grok about the way the store protocol works, I think the aggregate we'll probably need is
FWIW here is the 'usage reports' api stripe has, which we'll want to to update if we use stripe as the usage-based billing provider. |
The proposed dynamodb schema: /** @type import('@serverless-stack/resources').TableProps */
export const storeTableProps = {
fields: {
space: 'string', // `did:key:space`
car: 'string', // `bagy...1`
size: 'number', // `101`
origin: 'string', // `bagy...0` (prev CAR CID. optional)
agent: 'string', // `did:key:agent` (issuer of ucan)
ucan: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + car must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'car' },
}
/** @type import('@serverless-stack/resources').TableProps */
export const uploadTableProps = {
fields: {
space: 'string', // `did:key:space`
sk: 'string', // `root#shard` + space must be unique for dynamo index constraint
root: 'string', // `baf...x`
shard: 'string', // `bagy...1
agent: 'string', // `did:key:agent` (issuer of ucan)
ucan: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + sk must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'sk' },
}
References
|
Few notes:
|
Yep! I'm gonna leave that out from this iteration, and add it in once we tackle
I agree. I'm happy to call it
I was assuming some need to show "which identity actually invoked the thing" but we could drop it for now.
I can deal with calling it
I think folks are struggling with However, I'm generally in favour of having the db labels match the labels in the invocation wherever sensible. I'll rename it
not currently but i think we should track that somewhere. Do you have a use-case in mind? My assumption we'd want this for at least tracking metrics, and debugging. Would an aggregated count here be sufficient or would we want dump them in a bucket, or track them in a full on db table?
Can do! Is the use-case here analytics and debugging? or something more?
ACK. Yes, tracking "when does the ObjectCreated event occur in s3" is on my radar. Tracking the expire time for the URL is a good idea! |
That gives us an updated dynamodb schema like so: /** @type TableProps */
export const storeTableProps = {
fields: {
space: 'string', // `did:key:space`
link: 'string', // `bagy...1`
size: 'number', // `101`
origin: 'string', // `bagy...0` (prev CAR CID. optional)
issuer: 'string', // `did:key:agent` (issuer of ucan)
invocation: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + car must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'link' },
}
/** @type TableProps */
export const uploadTableProps = {
fields: {
space: 'string', // `did:key:space`
sk: 'string', // `root#shard` + space must be unique for dynamo index constraint
root: 'string', // `baf...x`
shard: 'string', // `bagy...1
issuer: 'string', // `did:key:agent` (issuer of ucan)
invocation: 'string', // `baf...ucan` (CID of invcation UCAN)
insertedAt: 'string', // `2022-12-24T...`
},
// space + sk must be unique to satisfy index constraint
primaryIndex: { partitionKey: 'space', sortKey: 'sk' },
} |
I agree we also have an issue on file suggesting that small cars should be inlined instead, in which case
Sounds good! Just to be clear though, it will affect primary index & we bill who hired a provider.
I think alternatively it could be called In regards to other info I brought up. I don’t have specific use cases, it’s just all these invocations have certain state that we don’t seem to capture. It would be difficult to tell what the state of the individual invocation is, or how did invocation progressed through the system. we also wanted to include some info in the receipt so introspection seems important |
Update the dynamoDB table schema to the following: ```js /** @type TableProps */ export const storeTableProps = { fields: { space: 'string', // `did:key:space` link: 'string', // `bagy...1` size: 'number', // `101` origin: 'string', // `bagy...0` (prev CAR CID. optional) issuer: 'string', // `did:key:agent` (issuer of ucan) invocation: 'string', // `baf...ucan` (CID of invcation UCAN) insertedAt: 'string', // `2022-12-24T...` }, // space + link must be unique to satisfy index constraint primaryIndex: { partitionKey: 'space', sortKey: 'link' }, } /** @type TableProps */ export const uploadTableProps = { fields: { space: 'string', // `did:key:space` sk: 'string', // `root#shard` + space must be unique for dynamo index constraint root: 'string', // `baf...x` shard: 'string', // `bagy...1 issuer: 'string', // `did:key:agent` (issuer of ucan) invocation: 'string', // `baf...ucan` (CID of invcation UCAN) insertedAt: 'string', // `2022-12-24T...` }, // space + sk must be unique to satisfy index constraint primaryIndex: { partitionKey: 'space', sortKey: 'sk' }, } ``` - `uploaderDID` becomes `space` - `payloadCID` becomes `link` to match the capability property - `dataCID` becomes `root` to match the terminology in the upload-client. - `carCID` becomes `shard` to match the terminology in the upload-client. - adds `issuer` - the DID of the agent that issued the invocation - adds `invocation` - the CID for the invocation ucan - use `insertedAt` as all we are really capturing here is the time at which we inserted the doc into the db. - align property names with the `upload-client`, our capability properties and the `access-api` data model - ordered such the partitionKey and sortKey come first (not that it matters) and our accounting metadata last - Fixed up types 🎉 ...we can add more fields. Its a document db. The only fields we have to specify here are ones referenced in an index, but more fields are given here to give the flavour. ### References - https://www.dynamodbguide.com/key-concepts - w3 ucan caps https://github.com/web3-storage/w3protocol/blob/aef55e718eb8756ed95beb237780ef7125b3aace/packages/access/src/capabilities/upload.js#L37 - w3-access api db schema https://github.com/web3-storage/w3protocol/blob/cc0f4f44c06a5884c6af50cf739dad6164d8e468/packages/access-api/migrations/0000_create_spaces_table.sql - upload-client naming https://github.com/web3-storage/w3protocol/blob/aef55e718eb8756ed95beb237780ef7125b3aace/packages/upload-client/README.md#uploadadd - dynamodb data types https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.DataTypes - current w3up schema https://github.com/web3-storage/w3up/wiki/(current)-Ingestion-Database-Schema - previous proposed schema update https://github.com/web3-storage/w3up/wiki/Desired-Ingestion-Schema fixes #25 License: MIT Signed-off-by: Oli Evans <oli@protocol.ai>
Currently, uploaderDID tracks the resource that an upload is invoked
with
. When we are using spaces, thewith
is the space DID. so UploaderDID is the space DID which feels wrong.see: https://github.com/web3-storage/upload-api/blob/75f0b5327d3dc2cface553bf60f9e9ebc17598c9/api/test/service/store.test.js#L85-L86
The text was updated successfully, but these errors were encountered: