-
Notifications
You must be signed in to change notification settings - Fork 80
feat: add FastAppend #516
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add FastAppend #516
Conversation
| /// When committing, these changes will be applied to the latest table snapshot. Commit | ||
| /// conflicts will be resolved by applying the changes to the new latest snapshot and | ||
| /// reattempting the commit. | ||
| class ICEBERG_EXPORT AppendFiles { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, we don't need this class. The Java impl's AppendFiles extends SnapshotUpdate so that one has more methods to call. We can just return FastAppend from NewFastAppend() and then return AppendFiles|MergeAppend from NewAppend().
| /// \return Reference to this for method chaining | ||
| FastAppend& Set(const std::string& property, const std::string& value); | ||
|
|
||
| Kind kind() const override { return Kind::kUpdateSnapshot; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be moved to the base SnapshotUpdate class?
| /// | ||
| /// \param branch The branch name | ||
| /// \return Reference to this for method chaining | ||
| FastAppend& ToBranch(const std::string& branch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify reuse SetTargetBranch or rename it to ToBranch if this is a better name?
| /// \param property The property name | ||
| /// \param value The property value | ||
| /// \return Reference to this for method chaining | ||
| FastAppend& Set(const std::string& property, const std::string& value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we directly define this as a virtual function in the SnapshotUpdate?
| Result<std::vector<ManifestFile>> WriteNewManifests(); | ||
|
|
||
| private: | ||
| struct DataFilePtrHash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move DataFilePtrHash and DataFilePtrEqual to manifest_entry.h (where DataFile is defined) or content_file_util.h (if DataFileSet is needed as below comment)?
| // Cleanup after committing is disabled for FastAppend unless there are | ||
| // rewritten_append_manifests because: | ||
| // 1.) Appended manifests are never rewritten | ||
| // 2.) Manifests which are written out as part of appendFile are already cleaned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // 2.) Manifests which are written out as part of appendFile are already cleaned | |
| // 2.) Manifests which are written out as part of AppendFile are already cleaned |
| new_manifests_.clear(); | ||
| } | ||
|
|
||
| // Clean up only rewritten_append_manifests as they are always owned by the table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Clean up only rewritten_append_manifests as they are always owned by the table | |
| // Clean up only rewritten append manifests as they are always owned by the table |
| } | ||
|
|
||
| // Clean up only rewritten_append_manifests as they are always owned by the table | ||
| // Don't clean up append_manifests as they are added to the manifest list and are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Don't clean up append_manifests as they are added to the manifest list and are | |
| // Don't clean up append manifests as they are added to the manifest list and are |
| std::vector<std::shared_ptr<DataFile>> files; | ||
| files.reserve(data_files.size()); | ||
| std::ranges::copy(data_files, std::back_inserter(files)); | ||
| ICEBERG_ASSIGN_OR_RAISE(auto written_manifests, WriteDataManifests(files, spec)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to revisit the signature of WriteDataManifests to use iterator begin and end of DataFile. We can add a TODO comment to WriteDataManifests for now.
| // If there are new files and manifests were already written, clean them up | ||
| if (has_new_files_ && !new_manifests_.empty()) { | ||
| for (const auto& manifest : new_manifests_) { | ||
| ICEBERG_RETURN_UNEXPECTED(DeleteFile(manifest.manifest_path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore the error?
| for (const auto& manifest : append_manifests_) { | ||
| ManifestFile updated = manifest; | ||
| updated.added_snapshot_id = snapshot_id; | ||
| manifests.push_back(updated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and std::move
| for (const auto& manifest : rewritten_append_manifests_) { | ||
| ManifestFile updated = manifest; | ||
| updated.added_snapshot_id = snapshot_id; | ||
| manifests.push_back(updated); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| // Clean up new manifests that were written but not committed | ||
| if (!new_manifests_.empty()) { | ||
| for (const auto& manifest : new_manifests_) { | ||
| if (committed.find(manifest.manifest_path) == committed.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (committed.find(manifest.manifest_path) == committed.end()) { | |
| if (!committed.contains(manifest.manifest_path))) { |
| // not compacted | ||
| if (!rewritten_append_manifests_.empty()) { | ||
| for (const auto& manifest : rewritten_append_manifests_) { | ||
| if (committed.find(manifest.manifest_path) == committed.end()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
No description provided.