feat: Wire V3 snapshot producer to row-lineage state#728
feat: Wire V3 snapshot producer to row-lineage state#728zeroshade merged 8 commits intoapache:mainfrom
Conversation
dttung2905
left a comment
There was a problem hiding this comment.
Oh no :sad. I didnt realise you have also put in effort for the similar issue #735. In my PR I have also try to implement full read part and partial write path, can you give it a read and review too? May be we can consolidate it to avoid doubling the effort :D
| sp.snapshotID, parentSnapshot, &nextSequence, 0, newManifests) | ||
| if err != nil { | ||
| return nil, nil, err | ||
| if sp.txn.meta.formatVersion == 3 { |
There was a problem hiding this comment.
Do you think we should result to use the single API WriteManifestList for all 3 version instead of branching out like this ?
There was a problem hiding this comment.
this would be aligned with what other drivers do (rust / java / python)
There was a problem hiding this comment.
For now let's keep things aligned with the other drivers and then we can do another pass for improving it later on
|
I think we can probably split this work and collaborate in a way that avoids overlap or conflicts. Here’s how I was thinking about sequencing row-lineage work in Go:
From what I can see, #735 seems to focus more on broader row-lineage work (read path + more complex write paths). Maybe a reasonable approach would be:
If that sounds reasonable, I’m happy to keep driving 1→3, and you (@dttung2905) could take 4 (and the other remaining pieces) once #728 lands. Open to your ideas (we can park this coordination in slack if you wish) |
|
@zeroshade could you please take a look? I’d also like to align on the plan with @twuebi, since he originally drove the v3 implementation epic. |
zeroshade
left a comment
There was a problem hiding this comment.
In general this seems pretty fine to me, is there any extra wiring up that needs to happen for the recently added Delete writing functionality for maintaining the lineage etc?
Also, what's the plan between this and #735? @dttung2905 is there anything in your PR that isn't covered by this or vice-versa? I think @laskoviymishka's plan is a good one for dividing and conquering the epic if you both want to work that out. I just want to confirm that we've hit as much of the edge cases as possible before merging this and determining whether we should close #735 or not.
|
@zeroshade the #735 intents to cover a broader case than wiring v3 snapshot producer. But I'm happy to get this PR 728 merged and then I can continue working on point 4 after rebasing my PR from main
|
zeroshade
left a comment
There was a problem hiding this comment.
Thanks for chiming in @dttung2905 Sounds good.
This looks good so I'll merge this and the work can be continued and extended in #735
Thanks both of you for this work!
…w-lineage Pointer is aliased and then mutated before encode. Start becomes end of range. With wiring snapshot lineage from writer delta (apache#728), this can cause overlaps/gaps between commits. Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
this PR closes the missing v3 commit-path wiring for row lineage in iceberg-go (part of #727):
snapshot lineage fields were validated downstream but not fully populated in snapshot producer commit flow.
What changed
v3 snapshot producer now:
Added tests for:
Correctness notes
This aligns with strict lineage advancement in metadata builder (next-row-id += snapshot.added-rows).
Behavior is aligned with Iceberg reference implementations’ commit-time row-lineage model (especially Java writer-delta semantics).