Core: Support Hadoop bulk delete API. #15436
Draft
steveloughran wants to merge 2 commits into apache:main from
Reflection-based use of the Hadoop 3.4.1+ BulkDelete API so that S3 object deletions can be done in pages of objects, rather than one at a time.
* Configuration option "iceberg.hadoop.bulk.delete.enabled" to switch to bulk deletes.
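To make "pages of objects" concrete, here is a minimal sketch of the paging step only. It is not the code in this PR: `PagedDeleteSketch`, `deleteInPages` and the `deletePage` callback are hypothetical names, and the page size is a plain parameter here, whereas in a real integration it would presumably be whatever the filesystem's bulk delete support reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper for illustration; not part of Iceberg or Hadoop. */
final class PagedDeleteSketch {
  private PagedDeleteSketch() {}

  /**
   * Splits {@code paths} into pages of at most {@code pageSize} entries and
   * hands each page to {@code deletePage}, which stands in for one bulk
   * delete request. Returns the number of pages submitted.
   */
  static int deleteInPages(
      List<String> paths, int pageSize, Consumer<List<String>> deletePage) {
    if (pageSize < 1) {
      throw new IllegalArgumentException("pageSize must be >= 1, got " + pageSize);
    }
    int pages = 0;
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      // Copy the slice so the callback never holds a view of the caller's list.
      deletePage.accept(new ArrayList<>(paths.subList(start, end)));
      pages++;
    }
    return pages;
  }
}
```

With five paths and a page size of two, the callback is invoked three times, with pages of two, two and one path. A page size of one degenerates to deleting one object at a time.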
There's something else to consider here. Do we need full reflection, given the method is available at compile time? Instead, only use the operations if enabled, catch link failures and report them better. Then there'd be Spark tests where 4.0 and 4.1 verify the operation is there, and 3.x expects failure when it is requested.
Uses the API directly in iceberg-core, which is compiled against Hadoop 3.4.3. But this is isolated to one class, org.apache.iceberg.hadoop.BulkDeleter, which is only loaded when bulk delete is enabled with "iceberg.hadoop.bulk.delete.enabled". There's no attempt at a graceful fallback: if it is enabled and the API is not found, bulk delete will fail.
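The "only touch the class when enabled, and turn link failures into a clear error" idea could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `BulkDeleteGuardSketch` and `loadIfEnabled` are hypothetical names, the `loader` supplier stands in for whatever first touches the isolated class, and the error text is borrowed from the wording suggested in the PR description.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical guard for illustration; not part of Iceberg or Hadoop. */
final class BulkDeleteGuardSketch {
  private BulkDeleteGuardSketch() {}

  /**
   * Runs {@code loader} only when {@code enabled} is true, so the class it
   * touches is never loaded otherwise. Link failures from an older library
   * (NoClassDefFoundError and NoSuchMethodError are both LinkageErrors) are
   * rethrown as a single descriptive exception. There is no fallback.
   */
  static <T> Optional<T> loadIfEnabled(boolean enabled, Supplier<T> loader) {
    if (!enabled) {
      // Disabled: never invoke the loader, so nothing is linked.
      return Optional.empty();
    }
    try {
      return Optional.of(loader.get());
    } catch (LinkageError e) {
      throw new UnsupportedOperationException(
          "bulk delete requested but not available due to hadoop library too old", e);
    }
  }
}
```

A caller would pass the configuration flag as `enabled` and a supplier that constructs the isolated class. On an older Hadoop, the first use then fails fast with the message above instead of a bare linkage error.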
The "iceberg.hadoop.bulk.delete.enabled" switch is on by default to help test through the Spark versions and verify fallback.
In production it might be best if it were not only off by default, but the code were also changed so that if bulk delete isn't available there is no fallback, just an error: "bulk delete requested but not available due to hadoop library too old".