Defining an IPLD format for encrypted data #4

Open
hannahhoward opened this issue Feb 18, 2025 · 0 comments

What

In the latest code in #3, we are encoding a large encrypted file with two separate uploads to Storacha:

  1. A blob of bytes containing the original file encrypted with a cypher
  2. A second blob that contains JSON metadata, stringified into text. The metadata fields are as follows:
    1. encryptedDataCID: the root CID returned by the Storacha uploader when we upload the first blob. This CID is the root of the encrypted bytes encoded as UnixFS, an IPLD DAG format for storing files of arbitrary size and directory structures
    2. dataToEncryptHash: the hash, returned by Lit, of the cypher used to encrypt the main file
    3. cyphertext: the cypher used to encrypt the main file, re-encrypted by Lit Protocol
    4. accessControlConditions: a JSON object of access control conditions for decrypting the cypher with lit protocol

The second blob is also encoded in UnixFS and uploaded to Storacha, which is a pretty inefficient way to represent some JSON metadata. Moreover, we can't understand this JSON metadata semantically server side, because from Storacha's perspective it's simply text.
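
To make the current shape concrete, here is a minimal sketch of what the second blob amounts to today. Every value is a placeholder, and the shape of the access control condition is hypothetical; the real values come from the Storacha uploader and Lit.

```javascript
// Sketch of the current approach: metadata is plain JSON, stringified to text.
// All values below are placeholders, not real CIDs, hashes, or cyphertexts.
const metadata = {
  encryptedDataCID: '<root CID of the encrypted UnixFS upload>',
  dataToEncryptHash: '<hash returned by Lit>',
  cyphertext: '<re-encrypted cypher returned by Lit>',
  accessControlConditions: [
    // Hypothetical condition; the real structure is defined by Lit.
    { placeholderKey: 'placeholderValue' },
  ],
}

// This text is what gets wrapped in UnixFS and uploaded as the second blob.
// To the server it is an opaque string: the CID inside it is not a real link.
const metadataText = JSON.stringify(metadata)
const metadataBytes = new TextEncoder().encode(metadataText)
```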

Instead, let's define an IPLD format specifically for this second block. IPLD is a data model for representing JSON-like linked data structures and encoding them with arbitrary serialization formats, including JSON and CBOR (a more compact binary object representation). SPEC: https://ipld.io/docs/

IPLD also defines a system for representing schemas -- typed structs and other more complex kinds of structured data.

In the IPLD schema DSL, the second blob's data could be written as follows:

type EncryptedMetadata struct {
   encryptedDataCID Link
   dataToEncryptHash Bytes
   cypherText Bytes
   accessControlConditions [{String: Any}]
}

Fortunately, when working with IPLD in JS, you can just use native JSON objects with IPLD tooling (js-multiformats) for serialization. You can also use ucanto to define specific IPLD schemas, to enforce type checking when serializing and deserializing.
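
As a rough illustration of the kind of type checking meant here, the sketch below hand-writes a runtime check that mirrors the EncryptedMetadata struct. It is not ucanto's schema API: `checkEncryptedMetadata` and its `isLink` parameter are hypothetical names for this example. With js-multiformats, `isLink` could be something like `(v) => CID.asCID(v) !== null`.

```javascript
// Hypothetical helper, not part of ucanto or js-multiformats: a hand-written
// runtime check mirroring the EncryptedMetadata struct in the schema above.
// `isLink` is supplied by the caller so this sketch stays library-free.
function checkEncryptedMetadata(value, isLink) {
  const isPlainObject = (v) =>
    v !== null &&
    typeof v === 'object' &&
    !Array.isArray(v) &&
    !(v instanceof Uint8Array)

  if (!isPlainObject(value)) return ['expected a struct (plain object)']

  const errors = []
  if (!isLink(value.encryptedDataCID)) {
    errors.push('encryptedDataCID must be a Link')
  }
  if (!(value.dataToEncryptHash instanceof Uint8Array)) {
    errors.push('dataToEncryptHash must be Bytes')
  }
  if (!(value.cypherText instanceof Uint8Array)) {
    errors.push('cypherText must be Bytes')
  }
  const conditions = value.accessControlConditions
  if (!Array.isArray(conditions) || !conditions.every(isPlainObject)) {
    errors.push('accessControlConditions must be a list of maps')
  }
  // An empty array means the value matches the struct.
  return errors
}
```

Returning a list of errors rather than throwing makes it easy to report every mismatch at once when decoding a block fetched from the network.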

My recommendation is that we encode the metadata object in CBOR, hash it to make a block, put the block into a CAR file, and upload it to Storacha using uploadCAR. For a good example of doing all this (defining a schema, encoding/decoding to blocks, writing and reading as a CAR file), I recommend looking at the library we've developed for encoding and decoding sharded DAG indexes (https://github.com/storacha/upload-service/tree/main/packages/blob-index). Our metadata is a much simpler structure, but the library can be a useful reference.
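
To show what "hash to make a block, then put it into a CAR file" produces at the byte level, here is a dependency-free sketch for Node. It assumes the metadata has already been DAG-CBOR encoded into `blockBytes`, and it hand-rolls the CID and CARv1 framing that js-multiformats and a CAR library would normally handle. `dagCborCid` and `singleBlockCar` are hypothetical names for this illustration, not functions from any library.

```javascript
// Unsigned LEB128 varint, as used by multiformats and the CAR format.
function varint(n) {
  const out = []
  while (n >= 0x80) {
    out.push((n & 0x7f) | 0x80)
    n = Math.floor(n / 128)
  }
  out.push(n)
  return Uint8Array.from(out)
}

// Concatenate byte arrays into one Uint8Array.
function concat(parts) {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0))
  let offset = 0
  for (const p of parts) {
    out.set(p, offset)
    offset += p.length
  }
  return out
}

// CIDv1 = version (0x01) + codec (0x71 = dag-cbor) + multihash, where
// multihash = 0x12 (sha2-256) + 0x20 (32-byte digest length) + digest.
async function dagCborCid(blockBytes) {
  const { createHash } = await import('node:crypto')
  const digest = createHash('sha256').update(blockBytes).digest()
  return concat([Uint8Array.of(0x01, 0x71, 0x12, 0x20), digest])
}

// CARv1 with one root and one block:
//   varint(header length) | header | varint(cid + data length) | cid | data
async function singleBlockCar(blockBytes) {
  const cid = await dagCborCid(blockBytes)
  const text = new TextEncoder()
  // DAG-CBOR for { roots: [cid], version: 1 }, written out byte by byte.
  const header = concat([
    Uint8Array.of(0xa2), // map with 2 entries
    Uint8Array.of(0x65), text.encode('roots'), // 5-character text key
    Uint8Array.of(0x81), // array with 1 item
    Uint8Array.of(0xd8, 0x2a), // tag 42 marks an IPLD link
    Uint8Array.of(0x58, cid.length + 1, 0x00), // byte string: 0x00 prefix + CID
    cid,
    Uint8Array.of(0x67), text.encode('version'), // 7-character text key
    Uint8Array.of(0x01), // CAR version 1
  ])
  const car = concat([
    varint(header.length), header,
    varint(cid.length + blockBytes.length), cid, blockBytes,
  ])
  return { cid, car }
}
```

In real code the library calls are preferable to this hand-rolled framing; the sketch is only meant to show that a single-block CAR is a small header naming the root CID followed by one length-prefixed block.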

For this ticket, we can keep the two separate uploads, as long as the second one uses uploadCAR to upload a CAR file with just the single metadata block. A future endeavor can run UnixFS conversion on the first blob manually to encode the entire thing as a single upload. (The cool thing is that just by directly encoding the metadata block in proper IPLD, we can download from Freeway a CAR file containing both the metadata and the encrypted blob by fetching https://{metadataRootCID}.w3s.link/?format=car)
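
A small sketch of building that gateway URL, assuming the metadata root CID is available as a string. `carUrl` is a hypothetical helper name, and the CID in the usage comment is a placeholder rather than a real CID.

```javascript
// Builds the gateway URL described above for fetching the DAG as a CAR file.
function carUrl(metadataRootCID) {
  const url = new URL(`https://${metadataRootCID}.w3s.link/`)
  url.searchParams.set('format', 'car')
  return url.toString()
}

// Usage, with a placeholder CID string:
//   const response = await fetch(carUrl('<metadataRootCID>'))
//   const carBytes = new Uint8Array(await response.arrayBuffer())
```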
