feat(encryption) [5/N] Support encryption: Encryption Manager#2383
feat(encryption) [5/N] Support encryption: Encryption Manager#2383xanderbailey wants to merge 10 commits intoapache:mainfrom
Conversation
b7fe819 to
74f9e2f
Compare
| default = moka::future::Cache::builder().time_to_live(DEFAULT_CACHE_TTL).build(), | ||
| setter(skip) | ||
| )] | ||
| kek_cache: moka::future::Cache<String, SensitiveBytes>, |
There was a problem hiding this comment.
Put a cache here for the same reason Java does, KMS is not aware of the key_id and since we populate the cache when we create_kek it would be hard to do this purely within a CachingKmsService
|
Thanks @xanderbailey! Here's my summary of reviewing with Claude:
|
|
We don't have |
Do we think we actually need a comment here? The code seems self documenting here? |
I think UUID is fine here maybe? both are 16 bytes and random |
Agreed. |
Agreed. |
Yeah, seems fine. |
blackmwk
left a comment
There was a problem hiding this comment.
Thanks @xanderbailey for this pr!
| default = moka::future::Cache::builder().time_to_live(DEFAULT_CACHE_TTL).build(), | ||
| setter(skip) | ||
| )] | ||
| kek_cache: moka::future::Cache<String, SensitiveBytes>, |
There was a problem hiding this comment.
Please don't use fully qualified name here, use them in imports as much as possible.
There was a problem hiding this comment.
For simplicity should we just rename this module to io?
There was a problem hiding this comment.
Should we just rename this file to manager.rs?
| table_key_id: String, | ||
| /// All encryption keys from table metadata (KEKs and wrapped key metadata entries). | ||
| #[builder(default)] | ||
| encryption_keys: HashMap<String, EncryptedKey>, |
There was a problem hiding this comment.
nit: For collections like this, we could use mutations in TypedBuilder so that user facing api is like add_encryption_key(EncryptedKey) .
| /// Transparently decrypts on read. | ||
| pub struct EncryptedInputFile { | ||
| inner: InputFile, | ||
| decryptor: Arc<AesGcmFileDecryptor>, |
There was a problem hiding this comment.
I think we should just store StandardKeyMetadata here?
There was a problem hiding this comment.
And just use that to construct the the FileRead when we need it?
There was a problem hiding this comment.
The AesGcmFileDecryptor is the thing we actually need at read time so is it a good idea to just construct it once?
| } | ||
|
|
||
| /// Returns the key metadata bytes (for storage in manifest/data files). | ||
| pub fn key_metadata(&self) -> &[u8] { |
There was a problem hiding this comment.
We should return StandardKeyMetadata
| /// Returns a [`StandardKeyMetadata`] containing a fresh DEK and AAD prefix. | ||
| /// The caller should pass this to the Parquet writer to configure | ||
| /// `FileEncryptionProperties`, and serialize it for storage in the manifest. | ||
| pub fn generate_native_key_metadata(&self) -> Result<StandardKeyMetadata> { |
There was a problem hiding this comment.
Let's make it mod private first/
|
|
||
| /// Decrypt an encrypted input file, returning an [`EncryptedInputFile`] | ||
| /// that transparently decrypts on read. | ||
| pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> { |
There was a problem hiding this comment.
| pub fn decrypt(&self, input: InputFile, key_metadata: &[u8]) -> Result<EncryptedInputFile> { | |
| pub fn decrypt(&self, input: EncryptedInputeFile) -> Result<EncryptedInputFile> { |
There was a problem hiding this comment.
This seems cyclical to me no?
| /// write) or the existing KEK expired (rotation). When `Some`, the | ||
| /// caller must also persist this KEK in table metadata so that future | ||
| /// `unwrap_key_metadata` calls can find it. | ||
| pub async fn wrap_key_metadata( |
There was a problem hiding this comment.
It's deprecated in java, remove it.
There was a problem hiding this comment.
I think wrapKey and unwrapKey are deprecated. wrap_key_metadata maps to EncryptionUtil.encryptManifestListKeyMetadata() and unwrap_key_metadata maps to EncryptionUtil.decryptManifestListKeyMetadata() but I've tried to avoid the EncryptionUtil grab-bag in my implementation.
| /// Given an `EncryptedKey` entry (from a manifest list or snapshot) and | ||
| /// the full map of encryption keys from `TableMetadata`, returns the | ||
| /// unwrapped key metadata bytes (e.g. serialized `StandardKeyMetadata`). | ||
| pub async fn unwrap_key_metadata( |
5311801 to
32fdd3a
Compare
Which issue does this PR close?
Part of #2034
What changes are included in this PR?
Stacked on #2340. Adds
EncryptionManager— handles two-layer envelope encryption (master key → KEK → DEK) for Iceberg tables.New files
encryption_manager.rs—EncryptionManagerwithtyped_builderconstruction:encrypt()/decrypt()— AGS1 stream file encryptiongenerate_native_key_metadata()— generatesStandardKeyMetadata(DEK + AAD prefix) for Parquet Modular Encryption (PME). Callers pass this directly to the Parquet writer'sFileEncryptionPropertiesand serialize it for manifest storage.wrap_key_metadata()/unwrap_key_metadata()— KEK envelope wrapping for manifest list key metadataencrypted_io.rs—EncryptedInputFile,EncryptedOutputFilewrappers for transparent AGS1 stream encrypt/decrypt on read/writeDesign decisions
NativeEncryptedInputFile/NativeEncryptedOutputFilewrappers — For PME,StandardKeyMetadataalready carries the DEK and AAD prefix. Wrapper types that just bundle anInputFile/OutputFilewith the same data are unnecessary indirection. Readers deserializeStandardKeyMetadatadirectly from key_metadata bytes; writers receive(OutputFile, &StandardKeyMetadata)separately.generate_native_key_metadata()is sync — unlike AGS1 encryption, PME key generation doesn't touch KMS. It just generates random bytes locally.How this differs from Java's
StandardEncryptionManageraddManifestListKeyMetadata()mutates an internal map and callers need to downcast toStandardEncryptionManagerto access the keys. Ourwrap_key_metadata()returns(wrapped_key, Option<new_kek>)directly. no hidden mutation, no downcasting. JavaEncryptionUtilgrab-bag: Java needs a static utility class (decryptManifestListKeyMetadata,encryptManifestListKeyMetadata, etc.) because the interface is too narrow. Here those are just methods onEncryptionManager.table_key_idenforced at compile time: Required viatyped_builder, can't forget it. Unencrypted tables useOption<EncryptionManager>instead of Java'sPlaintextEncryptionManager.Notes on usage for how this is used:
load_manifest_listdoes:load_manifestdoes:Are these changes tested?
Yes