
add blobaccess for Spanner/GCS #194

Open
wants to merge 6 commits into base: master

Conversation

@srago commented Mar 10, 2024

A blobaccess implementation for Spanner/GCS. Spanner holds metadata and inlined small objects for blobs. Larger blobs are written to GCS instead of being inlined.

@EdSchouten (Member) left a comment:

It looks like you're using Spanner as a flat key-value data store, where the CAS and AC are represented as two completely disjoint tables. As far as I know, Spanner is supposed to be a relational database. Why aren't we using any of its relational capabilities?

@@ -194,6 +194,9 @@ message BlobAccessConfiguration {

// Refer to a BlobAccess object declared through 'with_labels'.
string label = 27;

// Actually a hybrid of spanner and GCS.
@EdSchouten (Member):

If we start off with naming something A, while needing to put a label above it that says: "Actually a B", then we should have simply named it B. Please call this backend SpannerGCSBlobAccess.

@srago (Author):

Will do.

}
}

message SpannerBlobAccessConfiguration {
@EdSchouten (Member):

Even though we currently don't have a lot of meaningful client options for GCP, could you please add pkg/proto/configuration/cloud/gcp/gcp.proto's ClientOptionsConfiguration here as well? Just grep for gcp.NewClientOptionsFromConfiguration for an example on how to use it.
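
As an illustration, a minimal sketch of how those options might be threaded through to the GCS client. Hedged: the ClientOptions field name and the exact signature of gcp.NewClientOptionsFromConfiguration (assumed here to return ([]option.ClientOption, error)) are assumptions, not confirmed by this PR:

// Hypothetical wiring; pb is the configuration proto package for this backend.
func newGCSClient(ctx context.Context, configuration *pb.SpannerBlobAccessConfiguration) (*storage.Client, error) {
	// Assumed signature: returns ([]option.ClientOption, error).
	clientOptions, err := gcp.NewClientOptionsFromConfiguration(configuration.ClientOptions)
	if err != nil {
		return nil, err
	}
	// Forward the options when constructing the GCS client.
	return storage.NewClient(ctx, clientOptions...)
}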

Comment on lines 318 to 329
message StorageType {
enum Value {
// Invalid value.
UNKNOWN = 0;

// Action Cache.
ACTION_CACHE = 1;

// Content-Addressable Store.
CASTORE = 2;
}
}
@EdSchouten (Member):

Having such an enumeration should not be necessary. The storage type is already implied. Please take a look at BlobAccessCreator.

@srago (Author):

I'll look into that interface. It didn't exist in the older code base we started our project with.

StorageType.Value storage_type = 3;

// Expiration time in days. The default is 7 days.
uint64 expiration_days = 4;
@EdSchouten (Member):

Please use google.protobuf.Duration.

@srago (Author):

It's a little more general than we need, but I'll look into it.

@EdSchouten (Member):

Sure. I can imagine that nobody is interested in setting expirations in nanoseconds. But the advantage of using types like these is that they are well typed. Furthermore, it becomes self-documenting, meaning we don't need to use suffixes like _days.

@srago (Author):

You do realize that the GCS and Spanner expiration times are measured in days, right?

@EdSchouten (Member):

Yes. And the last time I checked, days are a duration.

@srago (Author):

Of course. But specifying the number of days as a Duration means that users have to convert the days into seconds to configure the expiration time. Then the driver needs to convert them back into days and have some policy for handling invalid values. If the user specifies 3 hours, does the driver fail that or does it round up to 1 day? I've already started working on these changes, but it feels like the wrong approach to me.

@EdSchouten (Member) commented Mar 18, 2024:

The fact that our existing config file parser/protojson only accepts the '1s' syntax for seconds is a limitation of those systems specifically. It should not be a reason against using google.protobuf.Duration. For all we know, the user wants to generate a config file using Python, and there they'd use datetime.timedelta(days=14) in combination with google.protobuf.duration_pb2.Duration.FromTimedelta().

The Pkl configuration language actually has a native data type for durations. There you can just write 14.d to get a literal that corresponds to 14 days. Converting that to "1209600s" to appease the protojson parser would happen automatically as part of rendering.

Rounding it up to full days seems like the most sensible choice to me.
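
A minimal sketch of that conversion, assuming the field becomes a google.protobuf.Duration (the helper name is hypothetical):

import (
	"math"
	"time"

	"google.golang.org/protobuf/types/known/durationpb"
)

// expirationDays converts the configured duration into whole days, rounding
// partial days up, so that e.g. "3h" becomes 1 day rather than an error.
func expirationDays(expiration *durationpb.Duration) int64 {
	d := expiration.AsDuration()
	return int64(math.Ceil(float64(d) / float64(24*time.Hour)))
}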

Comment on lines 13 to 16
// a time incurs too much overhead in spanner, so we perform bulk operations. The downside of
// this approach is that we can lose reference updates if the servers reboot while updates are
// still queued. Worst case, these objects will be evicted and need to be rebuilt the next time
// they are needed.
@EdSchouten (Member):

I don't think this tradeoff is acceptable.

@srago (Author):

Performance is unacceptable without this (databases make poor file systems). Potentially rebuilding objects that get evicted earlier than they otherwise would is an acceptable tradeoff, because it still results in correct operation. However, I'm open to ideas on how we can mitigate this.
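
As an illustration of the batching in question, a minimal sketch (all names hypothetical; the in-memory queue and its locking are elided):

import (
	"context"

	"cloud.google.com/go/spanner"
)

// flushPendingUpdates applies all queued reference updates in one round trip,
// instead of one Apply() per update. Updates still queued when the server dies
// are lost; affected objects expire early and get rebuilt, which is the
// tradeoff discussed above.
func flushPendingUpdates(ctx context.Context, client *spanner.Client, pending []*spanner.Mutation) error {
	if len(pending) == 0 {
		return nil
	}
	_, err := client.Apply(ctx, pending)
	return err
}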

Comment on lines 378 to 385
func (ba *spannerBlobAccess) keyToBlobSize(key string) (int64, error) {
s := strings.Split(key, "-")
sz, err := strconv.ParseInt(s[1], 10, 64)
if err != nil {
return -1, err
}
return sz, nil
}
@EdSchouten (Member):

How does this hold for the Action Cache? In the case of the Action Cache, the action digest pertains to the size of the Action message, not of the ActionResult being stored.

@srago (Author):

Oh, nice catch. It's basically dumb luck that this doesn't cause problems, because action cache entries and action messages are small, so we end up using Spanner. The only place I validate the buffer size is in the CAS.

@srago (Author):

I rewrote the code to use the knowledge of where the blobs live. On the write side, we use the size to determine where the blob should be stored. On the read side, we use the existence of InlineData in the Spanner record to determine where the blob lives. I have an MR out to (optionally) write small blobs to GCS also so that they can be accessed from GCS-FUSE. As soon as I get verification that it works, I can send a PR upstream.
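
A minimal sketch of that read path (table, column, and struct field names are hypothetical; only InlineData is taken from the discussion above):

import (
	"context"
	"io"

	"cloud.google.com/go/spanner"
)

// readBlob serves small blobs straight from Spanner and falls back to GCS
// whenever the row carries no inline data.
func (ba *spannerBlobAccess) readBlob(ctx context.Context, key string) ([]byte, error) {
	row, err := ba.spannerClient.Single().ReadRow(ctx, "Blobs", spanner.Key{key}, []string{"InlineData"})
	if err != nil {
		return nil, err
	}
	var inlineData []byte // remains nil when the column is NULL
	if err := row.Column(0, &inlineData); err != nil {
		return nil, err
	}
	if inlineData != nil {
		return inlineData, nil
	}
	// Large blob: fetch from GCS under the same key.
	r, err := ba.storageClient.Bucket(ba.bucketName).Object(key).NewReader(ctx)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}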

storageClient, err := storage.NewClient(ctx)
if err != nil {
spannerClient.Close()
log.Printf("Can't create GCS client: %v", err)
@EdSchouten (Member):

Either log errors or propagate them, but not both.

@srago (Author):

That only works if the caller receiving the error logs/prints it. In this case, it's enabling us to debug start-up problems by simply looking at the k8s logs.

@EdSchouten (Member):

You can also install things like middleware for gRPC to do the logging. That way we don't need to plaster our codebase with logging statements.
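
For example, a minimal unary server interceptor (a sketch, not code from this PR) that logs failed RPCs in one place:

import (
	"context"
	"log"

	"google.golang.org/grpc"
)

// loggingUnaryInterceptor logs every RPC that returns an error, so individual
// call sites don't need their own log statements.
func loggingUnaryInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	resp, err := handler(ctx, req)
	if err != nil {
		log.Printf("%s failed: %v", info.FullMethod, err)
	}
	return resp, err
}

// Installed once when the server is constructed:
// server := grpc.NewServer(grpc.UnaryInterceptor(loggingUnaryInterceptor))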

Comment on lines 531 to 552
func (ba *spannerBlobAccess) GetCapabilities(ctx context.Context, instanceName digest.InstanceName) (*remoteexecution.ServerCapabilities, error) {
if ba.storageType == pb.StorageType_ACTION_CACHE {
return &remoteexecution.ServerCapabilities{
CacheCapabilities: &remoteexecution.CacheCapabilities{
ActionCacheUpdateCapabilities: &remoteexecution.ActionCacheUpdateCapabilities{
UpdateEnabled: true,
},
SymlinkAbsolutePathStrategy: remoteexecution.SymlinkAbsolutePathStrategy_ALLOWED,
},
}, nil
} else if ba.storageType == pb.StorageType_CASTORE {
return &remoteexecution.ServerCapabilities{
CacheCapabilities: &remoteexecution.CacheCapabilities{
DigestFunctions: digest.SupportedDigestFunctions,
},
}, nil
} else {
// Code in pkg/blobaccess/configuration/new_blob_access.go should prevent this, but
// since that lives in a separate place, let's be sure.
panic("Invalid Spanner storage Type configured")
}
}
@EdSchouten (Member):

Any reason this can't leverage capabilities.Provider?

@srago (Author) commented Mar 10, 2024:

I was getting weird results in the capabilities result message, so I rolled my own. I'll try to get you an example.

@srago (Author):

The anomaly occurs regardless of whether I use capabilities.Provider or roll my own code:

Capabilities for gs.com received:
cache_capabilities:{digest_functions:MD5 digest_functions:SHA1 digest_functions:SHA256 digest_functions:8 digest_functions:SHA384 digest_functions:SHA512 action_cache_update_capabilities:{} symlink_absolute_path_strategy:ALLOWED} deprecated_api_version:{major:2} low_api_version:{major:2} high_api_version:{major:2 minor:3}

Capabilities for gs.com/unreviewed received:
cache_capabilities:{digest_functions:MD5 digest_functions:SHA1 digest_functions:SHA256 digest_functions:8 digest_functions:SHA384 digest_functions:SHA512 action_cache_update_capabilities:{update_enabled:true} symlink_absolute_path_strategy:ALLOWED} execution_capabilities:{digest_function:SHA256 exec_enabled:true execution_priority_capabilities:{priorities:{min_priority:-2147483648 max_priority:2147483647}} 5:"\x03\x02\x01\x08\x05\x06"} deprecated_api_version:{major:2} low_api_version:{major:2} high_api_version:{major:2 minor:3}

Notice the 5:"\x03\x02\x01\x08\x05\x06" that occurs in the execution_capabilities in the second case.

@EdSchouten (Member):

Whatever tool you used to dump these messages, make sure it uses an up-to-date version of remote_execution.proto. Once you do, you'll notice that field 5 in ExecutionCapabilities is actually digest_functions:

bazelbuild/remote-apis@64cc5e9

And what Buildbarn does here is expected. Namely, it reports the list of digest functions that are supported by the scheduler.

@srago (Author):

Yep, rebuilding the tool prints the hash functions. I hadn't realized the remote APIs had changed that much.

Comment on lines 698 to 709
statedSize, err := ba.keyToBlobSize(key)
if err != nil {
spannerMalformedKeyCount.Inc()
log.Printf("Put Blob: invalid digest size, key %s: %v", key, err)
b.Discard()
return err
}
if statedSize != size {
log.Printf("Put Blob: MISMATCH between buffer size (%d) and digest size (%d)", size, statedSize)
b.Discard()
return fmt.Errorf("Invalid contents digest %s size %d", digest, size)
}
@EdSchouten (Member):

Do we really need code like this? pkg/blobstore/buffer already has logic for validating data integrity.

@srago (Author):

Removed.

}

start := time.Now()
_, err = ba.spannerClient.Apply(context.Background(), []*spanner.Mutation{insertMut})
@EdSchouten (Member):

Why use context.Background() here?

@srago (Author):

This one should be ctx. I use context.Background when I want an action to proceed unaffected even if the incoming context gets cancelled.
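
Where detaching from cancellation is genuinely intended, a sketch using Go 1.21's context.WithoutCancel keeps request-scoped values while dropping cancellation, which is usually preferable to a bare context.Background():

// The derived context is never cancelled, but still carries ctx's values.
_, err = ba.spannerClient.Apply(context.WithoutCancel(ctx), []*spanner.Mutation{insertMut})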

@srago (Author) commented Mar 10, 2024:

> It looks like you're using Spanner as a flat key-value data store, where the CAS and AC are represented as two completely disjoint tables. As far as I know, Spanner is supposed to be a relational database. Why aren't we using any of its relational capabilities?

We have one table that is shared between the AC and CAS. The driver will behave differently when accessing the table for AC-based operations versus CAS-based operations.

@EdSchouten (Member) commented Mar 10, 2024:

> We have one table that is shared between the AC and CAS. The driver will behave differently when accessing the table for AC-based operations versus CAS-based operations.

Sure, I get that. But what I'm saying is that the references between ActionResult messages and objects stored in the CAS could also be expressed as a relation (foreign key) between two tables in a database. If we did that, then we could most likely eliminate CompletenessCheckingBlobAccess for these kinds of setups. AC entries could be dropped automatically when associated CAS objects expire.

@srago (Author) commented Mar 17, 2024:

> > We have one table that is shared between the AC and CAS. The driver will behave differently when accessing the table for AC-based operations versus CAS-based operations.

> Sure, I get that. But what I'm saying is that the references between ActionResult messages and objects stored in the CAS could also be expressed as a relation (foreign key) between two tables in a database. If we did that, then we could most likely eliminate CompletenessCheckingBlobAccess for these kinds of setups. AC entries could be dropped automatically when associated CAS objects expire.

That isn't immediately obvious to me. We should discuss...

@srago requested a review from EdSchouten on March 20, 2024.