-
Notifications
You must be signed in to change notification settings - Fork 3.7k
GH-31603: [C++][Python] Wrap encryption keys in secure string #46017
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
base: main
Are you sure you want to change the base?
Conversation
|
|
|
@EnricoMi Do you want a preliminary review on this? |
For the record, OpenSSL allows allocating memory from a secure heap, but that requires calling process-wide initialization functions. I've asked about this. |
Yes, please! This is in a complete PoC state (Linux build and test is green). Is this the right direction before I roll this out to Python? |
I agree, that |
Yes!
I would rather deprecate the existing methods.
You probably want to add a
Well, currently the OpenSSL function is very crude :)
What I'm worried about is different libraries in the same process all trying to use the |
{ | ||
std::string tiny("abc"); | ||
SecureString::secure_clear(tiny); | ||
assert_securely_cleared(tiny); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more stringent testing, I think we also want to examine the string area before it is cleared (or moved, etc.).
Something like (untested):
std::string_view StringArea(const std::string& string) {
return std::string_view(string.data(), string.capacity());
}
void AssertSecurelyCleared(std::string_view area) {
// the entire area is filled with zeros
std::string zeros(area.size(), '\0');
ASSERT_EQ(area, std::string_view(zeros));
}
void AssertSecurelyCleared(const std::string& string) {
AssertSecurelyCleared(StringArea(string));
}
TEST(TestSecureString, SecureClearCheck) {
// short string
{
std::string tiny("abc");
auto old_area = StringArea(tiny);
SecureString::SecureClear(tiny);
AssertSecurelyCleared(tiny);
AssertSecurelyCleared(old_area);
}
// etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But tiny.data()
and old_area.data()
point to the same address, what is the point testing old_area
? Just in case pointers change? If they would change, then old_area
would point to a freed memory area, right?
ASSERT_FALSE(secret_from_secret.empty()); | ||
ASSERT_EQ(secret_from_secret, secret_from_move_secret); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably also want a test that a SecureString was cleared on destruction ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I do that, the memory where SecureString.as_span
points to will be freed on deconstruction. How can I access that memory securely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standardese "true" way would be to template SecureString on an allocator so that you could intercept destruction of the bytes it stores. That's entirely too much here; we could just add a debug assertion inside ~SecureString (maybe only enabled by an env var or #if defined(ARROW_ASSERT_SECURE_STRING_ZEROED)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, alternatively: since every STL I'm aware of implements std::string with an inline short string case, we could use:
// Most std::string impls store their contents inline, so we can check
// the object representation for a crib to ensure that destruction does
// wipe all sensitive bytes.
constexpr auto kCrib = "01CRIB01";
alignas(SecureString) char object_repr[sizeof(SecureString)];
std::string_view object_repr_view{object_repr, sizeof(object_repr)};
SecureString* cribbed = new (&object_repr) SecureString(std::string(kCrib));
bool crib_found_before = object_repr_view.find(kCrib) != object_repr_view.npos;
cribbed->~SecureString(); // NOTE: do not allow this call to be skipped!
bool crib_found_after = object_repr_view.find(kCrib) != object_repr_view.npos;
if (crib_found_before) {
ASSERT_FALSE(crib_found_after);
}
I think that's well-defined: https://godbolt.org/z/31cY59EM3
std::string aad_prefix_; | ||
std::shared_ptr<AADPrefixVerifier> aad_prefix_verifier_; | ||
|
||
const std::string empty_string_ = ""; | ||
const encryption::SecureString empty_string_ = encryption::SecureString(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is empty_string_
useful at all? It looks rather dubious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, method FileDecryptionProperties::column_key
returns an empty string as the encryption key when the key id is not found. It should rather throw a meaningful key not found
exception. I may fix that while I am touching this code anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a holdover from the first draft of encryption; IIRC column_key() returned references to strings at that time (so return "";
would fail to compile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still needed, improved name.
9848539
to
b303afa
Compare
std::string aad_prefix_; | ||
std::shared_ptr<AADPrefixVerifier> aad_prefix_verifier_; | ||
|
||
const std::string empty_string_ = ""; | ||
const encryption::SecureString empty_string_ = encryption::SecureString(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a holdover from the first draft of encryption; IIRC column_key() returned references to strings at that time (so return "";
would fail to compile)
@@ -46,7 +46,7 @@ int32_t Decryptor::CiphertextLength(int32_t plaintext_len) const { | |||
|
|||
int32_t Decryptor::Decrypt(::arrow::util::span<const uint8_t> ciphertext, | |||
::arrow::util::span<uint8_t> plaintext) { | |||
return aes_decryptor_->Decrypt(ciphertext, str2span(key_), str2span(aad_), plaintext); | |||
return aes_decryptor_->Decrypt(ciphertext, key_.as_span(), str2span(aad_), plaintext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since str2span
is no longer used, we could delete it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is still being used to convert aad
and encrypted_key
strings to spans.
std::string kFooterEncryptionKey_ = std::string(kFooterEncryptionKey); | ||
std::string kColumnEncryptionKey1_ = std::string(kColumnEncryptionKey1); | ||
std::string kColumnEncryptionKey2_ = std::string(kColumnEncryptionKey2); | ||
SecureString kFooterEncryptionKey_ = kFooterEncryptionKey; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
seems warranted here
SecureString kFooterEncryptionKey_ = kFooterEncryptionKey; | |
const SecureString kFooterEncryptionKey_ = kFooterEncryptionKey; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that hold for all k...
members in this class (TestDecryptionConfiguration
) and TestEncryptionConfiguration
as well? Is it worth it given this are tests?
// by the compiler. | ||
static const volatile auto memset_v = &memset; | ||
memset_v(data, 0, size); | ||
__asm__ __volatile__("" ::"r"(data) : "memory"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems questionable to me that the ultimate fallback should not be a simple loop
__asm__ __volatile__("" ::"r"(data) : "memory"); | |
for (size_t i = 0; i < size; ++i) { | |
data[i] = 0; | |
} |
I don't know how cross platform this __asm__
block is.
Also, since this code is borrowed from another project it needs to be mentioned in LICENSE.txt, for example
Lines 1541 to 1543 in 7df396e
A function gettimeofday in utilities.cc is based on | |
http://www.google.com/codesearch/p?hl=en#dR3YEbitojA/COPYING&q=GetSystemTimeAsFileTime%20license:bsd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
volatile char *p = data;
while (size--) *p++ = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given https://github.com/BLAKE2/libb2/blob/master/src/blake2-impl.h is CC0 Public Domain Dedication, and our code deviates a lot, you could say it is as much inspired by that implementation than it is by all those variations available on stackoverflow.
I don't think this qualifies as a copy or derived work.
SecureString() noexcept = default; | ||
SecureString(SecureString&&) noexcept; | ||
SecureString(const SecureString&) noexcept = default; | ||
explicit SecureString(std::string&&) noexcept; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit SecureString(std::string&&) noexcept; | |
explicit SecureString(std::string&&) noexcept; | |
explicit SecureString(size_t size) : SecureString{std::string(size, '\0')} {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added SecureString(size_t, char) to mimic basic_string<char>
constructors.
ASSERT_FALSE(secret_from_secret.empty()); | ||
ASSERT_EQ(secret_from_secret, secret_from_move_secret); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standardese "true" way would be to template SecureString on an allocator so that you could intercept destruction of the bytes it stores. That's entirely too much here; we could just add a debug assertion inside ~SecureString (maybe only enabled by an env var or #if defined(ARROW_ASSERT_SECURE_STRING_ZEROED)
)
#include "arrow/util/span.h" | ||
#include "parquet/platform.h" | ||
|
||
namespace parquet::encryption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you please move this to arrow/util/
? I think it will be useful outside parquet
if (key.empty()) return this; | ||
|
||
DCHECK(!key.empty()); | ||
key_ = key; | ||
DCHECK(key_.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looked suspicious, all other setter methods check if the member is unset.
Here, the check is always true as it checks the input.
This is a breaking change if user code calls this setter twice with non-empty keys.
std::string column_path, | ||
encryption::SecureString key, | ||
std::string key_metadata) | ||
: column_path_(std::move(column_path)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This moves member initialization from the body, which is the more standard way. However, the body could accidentally check input arguments, which after std::move
are "empty".
While this is correct:
encrypted_with_footer_key_(encrypted && key.empty()),
key_(std::move(key)),
if members are shuffled around, then this becomes incorrect:
key_(std::move(key)),
encrypted_with_footer_key_(encrypted && key.empty()),
as encrypted_with_footer_key_
is initialized on the "empty" key
.
Do you consider this pattern too risky for future bugs? Happy to revert this initialization.
if (column_decryption_properties_.find(column_path) != | ||
column_decryption_properties_.end()) { | ||
auto column_prop = column_decryption_properties_.at(column_path); | ||
if (column_prop != nullptr) { | ||
return column_prop->key(); | ||
} | ||
} | ||
return empty_string_; | ||
return no_key_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are still returning a reference to a SecureString
. I tried to return some std::optional
, to replace empty SecureString
notation for "no key given" with std::nullopt
, but this opened a can of worms. Not going down this rabbit whole in this PR.
At least the naming is improved.
|
2 similar comments
|
|
Rationale for this change
Arrow deals with secrets like encryption / decryption keys which must be kept private. One way of leaking such secrets is through memory allocation where another process allocates memory that previously hold the secret, because that memory was not cleared before being freed.
What changes are included in this PR?
Uses various implementations of securely clearing memory, notably
SecureZeroMemory
(Windows)memset_s
(STDC)OPENSSL_cleanse
(OpenSSL >= 3)explicit_bzero
(glibc 2.25+)memset
(fallback).Are these changes tested?
Unit tests.
Are there any user-facing changes?
APIs that hand over secrets to Arrow require the secret to be encapsulated in a
SecureString
.This PR includes breaking changes to public APIs.
TODO:
SecureString
available to Python APIarrow/util
Supersedes EXP: SecureZero helper to securely clear memory #12890.