Defining an IPLD format for encrypted data #4

hannahhoward · 2025-02-18T08:56:54Z

What

In the latest code in #3, we are encoding a large encrypted file with two separate uploads to Storacha:

A blob of bytes containing the original file encrypted with a cypher
A second blob that contains JSON metadata, stringified into text. The metadata fields are as follows:
1. encryptedDataCID: the root CID returned by the Storacha uploader when we upload the first blob. This is the root CID of the encrypted bytes encoded as UnixFS, an IPLD DAG format for storing files of arbitrary size and directory structures
2. dataToEncryptHash: the hash, returned by Lit, of the cypher used to encrypt the main file
3. cyphertext: the cypher used to encrypt the main file, re-encrypted with a cypher by Lit Protocol
4. accessControlConditions: a JSON object of access control conditions for decrypting the cypher with Lit Protocol

The second blob is also encoded in UnixFS and uploaded to Storacha, which is a pretty inefficient way to represent some JSON metadata. Moreover, we can't interpret this JSON metadata semantically server-side, because from Storacha's point of view it's simply text.

Instead, let's define an IPLD format specifically for this second block. IPLD is a data model for representing JSON-like linked data structures and encoding them with arbitrary serialization formats, including JSON and CBOR (a more compact binary object representation). Spec: https://ipld.io/docs/

IPLD also defines a system for representing schemas -- typed structs and other more complex kinds of structured data.

In the IPLD schema DSL, the second blob's data could be written as follows:

type EncryptedMetadata struct {
   encryptedDataCID Link
   dataToEncryptHash Bytes
   cypherText Bytes
   accessControlConditions [{String: Any}]
}
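As a rough illustration of what this schema means on the JS side, here is a minimal hand-rolled sketch of the same shape as a TypeScript type, plus a runtime check. The names `EncryptedMetadata`, `isPlainMap` and `isEncryptedMetadata` are hypothetical, and the link check is left to a caller-supplied predicate (for example one built on `CID.asCID` from js-multiformats) so the sketch stays dependency-free. In practice a ucanto schema would likely replace this.

```typescript
// Hypothetical TypeScript mirror of the IPLD schema above.
// `Link` is left generic; real code would use the CID class from multiformats.
interface EncryptedMetadata<Link> {
  encryptedDataCID: Link;
  dataToEncryptHash: Uint8Array;
  cypherText: Uint8Array;
  accessControlConditions: Array<Record<string, unknown>>;
}

// True for plain key/value objects (the {String: Any} maps in the schema).
function isPlainMap(v: unknown): v is Record<string, unknown> {
  return (
    typeof v === "object" &&
    v !== null &&
    !Array.isArray(v) &&
    !(v instanceof Uint8Array)
  );
}

// Checks that a decoded value has the shape the schema describes.
// `isLink` decides what counts as a link (an assumption of this sketch).
function isEncryptedMetadata<Link>(
  value: unknown,
  isLink: (v: unknown) => v is Link
): value is EncryptedMetadata<Link> {
  if (!isPlainMap(value)) return false;
  const conditions = value.accessControlConditions;
  return (
    isLink(value.encryptedDataCID) &&
    value.dataToEncryptHash instanceof Uint8Array &&
    value.cypherText instanceof Uint8Array &&
    Array.isArray(conditions) &&
    conditions.every((c) => isPlainMap(c))
  );
}
```

Running a check like this right after decoding a block gives an early failure if a field is missing or still a string instead of bytes.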

Fortunately, when working with IPLD in JS, you can just use plain JavaScript objects with IPLD tooling (js-multiformats) for serialization. You can also use ucanto to define specific IPLD schemas, to enforce type checking when serializing and deserializing.

My recommendation is that we encode the metadata object in CBOR, then hash it to make a block, then put it into a CAR file, and upload it to Storacha using uploadCAR. For a good example of doing all this (defining a schema, encoding/decoding to blocks, writing and reading as a CAR file) I recommend looking at the library we've developed for encoding and decoding sharded DAG indexes (https://github.com/storacha/upload-service/tree/main/packages/blob-index) -- the metadata here is a much simpler structure, but the library can be a useful reference.
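To make the "hash to make a block" step concrete, here is a minimal sketch of how a CIDv1 is derived from already-encoded DAG-CBOR bytes, using only Node's built-in crypto module. This is an illustration, not the recommended implementation: real code would more likely lean on js-multiformats and a DAG-CBOR codec, and the helper names `base32Lower` and `cidV1ForDagCbor` are hypothetical. It does not cover the CBOR encoding itself, writing the CAR file, or the upload.

```typescript
import { createHash } from "node:crypto";

const DAG_CBOR_CODE = 0x71; // multicodec code for dag-cbor
const SHA2_256_CODE = 0x12; // multihash code for sha2-256
const BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567";

// Lowercase RFC 4648 base32 without padding (multibase "b" encoding body).
function base32Lower(bytes: Uint8Array): string {
  let bits = 0;
  let value = 0;
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    value = ((value << 8) | bytes[i]) & 0xfff; // keep only the bits still needed
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET.charAt((value >>> (bits - 5)) & 31);
      bits -= 5;
    }
  }
  if (bits > 0) out += BASE32_ALPHABET.charAt((value << (5 - bits)) & 31);
  return out;
}

// CIDv1 = version (1) + codec (dag-cbor) + multihash (sha2-256, length, digest).
// All four prefix values are below 128, so each varint is a single byte.
function cidV1ForDagCbor(encoded: Uint8Array): string {
  const digest = createHash("sha256").update(encoded).digest();
  const cid = new Uint8Array(4 + digest.length);
  cid.set([0x01, DAG_CBOR_CODE, SHA2_256_CODE, digest.length], 0);
  cid.set(digest, 4);
  return "b" + base32Lower(cid);
}

// 0xa0 is the CBOR encoding of an empty map, used here as stand-in block bytes.
const exampleCid = cidV1ForDagCbor(new Uint8Array([0xa0]));
console.log(exampleCid); // a 59-character string starting with "bafyrei"
```

The resulting string is what would serve as the metadata root CID: the block is the pair of encoded bytes and this CID, and the CAR file lists the CID as its root.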

For this ticket, we can keep the two separate uploads, as long as the second one uses uploadCAR to upload a CAR file with just the single metadata block. A future endeavor can run UnixFS conversion on the first blob manually to encode the entire thing as a single upload. (The cool thing is that just by directly encoding the metadata block in proper IPLD, we can download from Freeway a CAR file containing both the metadata and the encrypted blob by fetching https://{metadataRootCID}.w3s.link/?format=car)
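The gateway URL pattern above can be wrapped in a small helper. This is a sketch only: the function name `metadataCarUrl` and the `gatewayHost` parameter are hypothetical, and it assumes the CID is already a lowercase base32 CIDv1 string, since it is used as a subdomain.

```typescript
// Builds the gateway URL that returns a CAR file for the given root CID.
// `gatewayHost` defaults to the w3s.link host named in this issue.
function metadataCarUrl(
  metadataRootCID: string,
  gatewayHost: string = "w3s.link"
): string {
  const url = new URL(`https://${metadataRootCID}.${gatewayHost}/`);
  url.searchParams.set("format", "car");
  return url.toString();
}

// Example with a placeholder CID string (not a real CID):
console.log(metadataCarUrl("bafyexample"));
// prints "https://bafyexample.w3s.link/?format=car"
```

The response body can then be read as a CAR file and walked from the metadata root to the encrypted data it links to.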


hannahhoward added this to Storacha Project Planning Feb 18, 2025

hannahhoward moved this to Sprint Backlog in Storacha Project Planning Feb 18, 2025

hannahhoward assigned BravoNatalie Feb 18, 2025

BravoNatalie mentioned this issue Mar 2, 2025

feat/define IPLD format for encrypted data #5

Open

BravoNatalie moved this from Sprint Backlog to In Progress in Storacha Project Planning Mar 4, 2025
