Core: Support Hadoop bulk delete API. by steveloughran · Pull Request #15436 · apache/iceberg

steveloughran · 2026-02-24T19:06:03Z

Reflection-based use of the Hadoop 3.4.1+ BulkDelete API so that S3 object deletions can be done in pages of objects, rather than one at a time.
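To make "pages of objects" concrete, here is a minimal sketch of the paging idea only. It is not code from this PR: `PagedDeleteSketch`, `pages` and `deleteInPages` are hypothetical names, paths are plain strings, and the page size is passed in by the caller rather than read from a filesystem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of the pull request.
final class PagedDeleteSketch {
  private PagedDeleteSketch() {}

  /** Splits the paths into consecutive pages of at most pageSize entries. */
  static List<List<String>> pages(List<String> paths, int pageSize) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be at least 1: " + pageSize);
    }
    List<List<String>> result = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      result.add(new ArrayList<>(paths.subList(start, end)));
    }
    return result;
  }

  /** Issues one delete call per page and returns the number of calls made. */
  static int deleteInPages(
      List<String> paths, int pageSize, Consumer<List<String>> deletePage) {
    int calls = 0;
    for (List<String> page : pages(paths, pageSize)) {
      // In a real integration this would be a single bulk request to the store.
      deletePage.accept(page);
      calls++;
    }
    return calls;
  }
}
```

With a page size of 1 this degenerates to one call per object, which is the behaviour the bulk path is meant to improve on.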

Configuration option "iceberg.hadoop.bulk.delete.enabled" to switch to bulk deletes

This switch is on by default to help test across the Spark versions and verify the fallback.

In production it might be best if it were not only off by default, but the code were changed so that if bulk delete isn't available there is no fallback, just an error: "bulk delete requested but not available due to hadoop library too old".

Avoids any ambiguity about why it doesn't work.
Only of relevance for cloud connectors with the feature (currently: s3a)
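A rough sketch of that fail-fast variant, under stated assumptions: the configuration is modelled as a plain `Map`, the class name to probe for is supplied by the caller (for Hadoop this would presumably be the bulk delete interface, which I believe is `org.apache.hadoop.fs.BulkDelete`), and `BulkDeleteSwitchSketch` with its methods is a hypothetical name, not the code in this PR.

```java
import java.util.Map;

// Hypothetical illustration of "no fallback, just an error"; not the PR's code.
final class BulkDeleteSwitchSketch {
  static final String BULK_DELETE_ENABLED = "iceberg.hadoop.bulk.delete.enabled";

  private BulkDeleteSwitchSketch() {}

  /** Returns true if the named class can be located without initializing it. */
  static boolean classPresent(String className) {
    try {
      Class.forName(className, false, BulkDeleteSwitchSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Decides whether to use bulk delete. If the option is off, the caller uses
   * single-object deletes. If it is on and the API class is missing, this fails
   * with an explicit error instead of silently falling back.
   */
  static boolean useBulkDelete(
      Map<String, String> conf, boolean defaultEnabled, String apiClassName) {
    boolean requested =
        Boolean.parseBoolean(
            conf.getOrDefault(BULK_DELETE_ENABLED, String.valueOf(defaultEnabled)));
    if (!requested) {
      return false;
    }
    if (!classPresent(apiClassName)) {
      throw new UnsupportedOperationException(
          "bulk delete requested but not available due to hadoop library too old");
    }
    return true;
  }
}
```

The point of the design is that a user who asks for bulk delete never ends up on the slow path without being told why.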


steveloughran · 2026-02-24T19:09:51Z

There's something else to consider here: do we need full reflection, given that the method is available at compile time? Instead, we could use the operations only if enabled, catch link failures and report them better.

Then there'd be Spark tests where 4.0 and 4.1 verify that the operation is there, and 3.x expects a failure when it is requested.

Uses the API directly in iceberg-core, which is compiled against Hadoop 3.4.3. But this is isolated to one class, org.apache.iceberg.hadoop.BulkDeleter, which is only loaded when bulk delete is enabled with "iceberg.hadoop.bulk.delete.enabled". There's no attempt at a graceful fallback: if it is enabled and not found, bulk delete will fail.
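For discussion, "catch link failures and report better" could look roughly like the sketch below. It assumes the failure surfaces as a `LinkageError` (for example `NoClassDefFoundError` or `NoSuchMethodError`) at the first call into the isolated class; `LinkFailureSketch`, `callLinked` and the wording of the message are hypothetical and not what the PR does.

```java
import java.util.function.Supplier;

// Hypothetical wrapper for illustration; not the PR's BulkDeleter class.
final class LinkFailureSketch {
  private LinkFailureSketch() {}

  /**
   * Runs a call that links against an optional API. NoClassDefFoundError and
   * NoSuchMethodError are both subclasses of LinkageError, so one catch clause
   * covers a missing class as well as a missing method.
   */
  static <T> T callLinked(String feature, Supplier<T> call) {
    try {
      return call.get();
    } catch (LinkageError e) {
      // Translate the low-level error into one that names the feature.
      throw new UnsupportedOperationException(
          feature + " requested but not available; is the Hadoop library too old?", e);
    }
  }
}
```

The original error is kept as the cause, so the missing class or method name is still visible in the stack trace.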

Core: Support Hadoop bulk delete API.

80e9d47

Reflection-based use of the Hadoop 3.4.1+ BulkDelete API so that S3 object deletions can be done in pages of objects, rather than one at a time.
* Configuration option "iceberg.hadoop.bulk.delete.enabled" to switch to bulk deletes.

steveloughran marked this pull request as draft February 24, 2026 19:06

github-actions bot added core docs labels Feb 24, 2026

steveloughran wants to merge 2 commits into apache:main from steveloughran:pr/12055-bulk-delete-2026