Skip to content

feat(iceberg): Enable delete files processing in snapshot producer#2367

Open
CTTY wants to merge 4 commits intoapache:mainfrom
CTTY:ctty/rewrite-process-delete
Open

feat(iceberg): Enable delete files processing in snapshot producer#2367
CTTY wants to merge 4 commits intoapache:mainfrom
CTTY:ctty/rewrite-process-delete

Conversation

@CTTY
Copy link
Copy Markdown
Collaborator

@CTTY CTTY commented Apr 24, 2026

Which issue does this PR close?

What changes are included in this PR?

  • Support writing delete manifest in SnapshotProducer

Are these changes tested?

Added uts in snapshot.rs, we should add end-to-end tests when actual rewrite/delete actions are supported

Comment thread crates/iceberg/src/transaction/snapshot.rs
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
@CTTY CTTY marked this pull request as ready for review April 24, 2026 22:24
Copy link
Copy Markdown
Contributor

@xanderbailey xanderbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR! Took a pass at reviewing, let me know if it makes sense!

Comment on lines -87 to 94
let snapshot_producer = SnapshotProducer::new(
table,
self.commit_uuid.unwrap_or_else(Uuid::now_v7),
self.key_metadata.clone(),
self.snapshot_properties.clone(),
self.added_data_files.clone(),
);
let snapshot_producer = SnapshotProducer::builder()
.with_table(table)
.with_commit_uuid(self.commit_uuid.unwrap_or_else(Uuid::now_v7))
.with_key_metadata(self.key_metadata.clone())
.with_snapshot_properties(self.snapshot_properties.clone())
.with_added_data_files(self.added_data_files.clone())
.build();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice change

.default_partition_spec()
.as_ref()
.partition_spec_by_id(spec_id)
.ok_or(Error::new(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok_or will allocate the error on the success path, ok_or_else might be better?

/// Reference to [`Snapshot`].
pub type SnapshotRef = Arc<Snapshot>;
#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq, Clone)]
#[derive(Debug, Default, Serialize, Deserialize, PartialEq, Eq, Clone, Hash)]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be Hash now?


summary_collector.set_partition_summary_limit(partition_summary_limit);

for data_file in &self.added_data_files {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include these changes in the summary here?

if added_files.is_empty() {
return Err(Error::new(
ErrorKind::PreconditionFailed,
"No added data files found when write an added manifest file",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be updated to added data / delete files?

async fn existing_manifest(
&self,
snapshot_produce: &SnapshotProducer<'_>,
snapshot_produce: &mut SnapshotProducer<'_>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be mut?

Comment on lines +342 to +344
if deleted_entries.is_empty() {
Ok(Vec::new())
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we early return here to avoid the nesting?

Suggested change
if deleted_entries.is_empty() {
Ok(Vec::new())
} else {
if deleted_entries.is_empty() {
return Ok(Vec::new())
}

}
}

pub(crate) fn validate_added_data_files(&self) -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Process delete files when writing snapshots

2 participants