Skip to content

feat: Wire V3 snapshot producer to row-lineage state#728

Merged
zeroshade merged 8 commits intoapache:mainfrom
laskoviymishka:v3-row-lineage
Feb 18, 2026
Merged

feat: Wire V3 snapshot producer to row-lineage state#728
zeroshade merged 8 commits intoapache:mainfrom
laskoviymishka:v3-row-lineage

Conversation

@laskoviymishka
Copy link
Contributor

@laskoviymishka laskoviymishka commented Feb 15, 2026

this PR closes the missing v3 commit-path wiring for row lineage in iceberg-go (part of #727):
snapshot lineage fields were validated downstream but not fully populated in snapshot producer commit flow.

What changed

v3 snapshot producer now:

  • sets snapshot first-row-id from table next-row-id,
  • computes snapshot added-rows from manifest-list writer assigned row-id delta (writer.nextRowID - firstRowID),
  • keeps v1/v2 flow unchanged.

Added tests for:

  • v3 single-commit lineage
  • two sequential commits (monotonic, gap-free lineage)
  • merge path where assigned delta includes existing rows
  • manifest-list writer delta behavior including non-data manifest exclusion

Correctness notes

This aligns with strict lineage advancement in metadata builder (next-row-id += snapshot.added-rows).

Behavior is aligned with Iceberg reference implementations’ commit-time row-lineage model (especially Java writer-delta semantics).

@laskoviymishka laskoviymishka marked this pull request as draft February 15, 2026 22:35
@laskoviymishka laskoviymishka marked this pull request as ready for review February 15, 2026 23:28
Copy link
Contributor

@dttung2905 dttung2905 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no :sad. I didnt realise you have also put in effort for the similar issue #735. In my PR I have also try to implement full read part and partial write path, can you give it a read and review too? May be we can consolidate it to avoid doubling the effort :D

sp.snapshotID, parentSnapshot, &nextSequence, 0, newManifests)
if err != nil {
return nil, nil, err
if sp.txn.meta.formatVersion == 3 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should result to use the single API WriteManifestList for all 3 version instead of branching out like this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be aligned with what other drivers do (rust / java / python)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now let's keep things aligned with the other drivers and then we can do another pass for improving it later on

@laskoviymishka
Copy link
Contributor Author

I think we can probably split this work and collaborate in a way that avoids overlap or conflicts.

Here’s how I was thinking about sequencing row-lineage work in Go:

  1. Wire the V3 snapshot producer to the row-lineage state — currently in progress in feat: Wire V3 snapshot producer to row-lineage state #728 (foundation, append path).
  2. Enforce row-lineage invariants at the commit boundary — drafted and planned right after feat: Wire V3 snapshot producer to row-lineage state #728.
  3. Add integration tests for sequential commits — drafted, just not published yet.
  4. Extend support to overwrite/delete/rewrite paths — likely the trickiest part.
  5. CLI visibility + docs — follow-up polish.

From what I can see, #735 seems to focus more on broader row-lineage work (read path + more complex write paths). Maybe a reasonable approach would be:

  1. Merge feat: Wire V3 snapshot producer to row-lineage state #728 first as the base.
  2. Then rebase Add support for row lineage in v3 #735 on top of it and continue with overwrite/rewrite/delete paths and the remaining gaps.

If that sounds reasonable, I’m happy to keep driving 1→3, and you (@dttung2905) could take 4 (and the other remaining pieces) once #728 lands.

Open to your ideas (we can park this coordination in slack if you wish)

@laskoviymishka
Copy link
Contributor Author

@zeroshade could you please take a look?

I’d also like to align on the plan with @twuebi, since he originally drove the v3 implementation epic.

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this seems pretty fine to me, is there any extra wiring up that needs to happen for the recently added Delete writing functionality for maintaining the lineage etc?

Also, what's the plan between this and #735? @dttung2905 is there anything in your PR that isn't covered by this or vice-versa? I think @laskoviymishka's plan is a good one for dividing and conquering the epic if you both want to work that out. I just want to confirm that we've hit as much of the edge cases as possible before merging this and determining whether we should close #735 or not.

@dttung2905
Copy link
Contributor

@zeroshade the #735 intents to cover a broader case than wiring v3 snapshot producer. But I'm happy to get this PR 728 merged and then I can continue working on point 4 after rebasing my PR from main

I think we can probably split this work and collaborate in a way that avoids overlap or conflicts.

Here’s how I was thinking about sequencing row-lineage work in Go:

  1. Wire the V3 snapshot producer to the row-lineage state — currently in progress in feat: Wire V3 snapshot producer to row-lineage state #728 (foundation, append path).
  2. Enforce row-lineage invariants at the commit boundary — drafted and planned right after feat: Wire V3 snapshot producer to row-lineage state #728.
  3. Add integration tests for sequential commits — drafted, just not published yet.
  4. Extend support to overwrite/delete/rewrite paths — likely the trickiest part.
  5. CLI visibility + docs — follow-up polish.

From what I can see, #735 seems to focus more on broader row-lineage work (read path + more complex write paths). Maybe a reasonable approach would be:

  1. Merge feat: Wire V3 snapshot producer to row-lineage state #728 first as the base.
  2. Then rebase Add support for row lineage in v3 #735 on top of it and continue with overwrite/rewrite/delete paths and the remaining gaps.

If that sounds reasonable, I’m happy to keep driving 1→3, and you (@dttung2905) could take 4 (and the other remaining pieces) once #728 lands.

Open to your ideas (we can park this coordination in slack if you wish)

Copy link
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for chiming in @dttung2905 Sounds good.

This looks good so I'll merge this and the work can be continued and extended in #735

Thanks both of you for this work!

@zeroshade zeroshade merged commit 072c1ef into apache:main Feb 18, 2026
11 checks passed
ferhatelmas added a commit to ferhatelmas/apache-iceberg-go that referenced this pull request Feb 18, 2026
…w-lineage

Pointer is aliased and then mutated before encode.
Start becomes end of range. With wiring snapshot
lineage from writer delta (apache#728), this can cause
overlaps/gaps between commits.

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
zeroshade pushed a commit that referenced this pull request Feb 19, 2026
…w-lineage (#741)

Pointer is aliased and then mutated before encode. Start becomes end of
range. With wiring snapshot lineage from writer delta (#728), this can
cause overlaps/gaps between commits.

Signed-off-by: ferhat elmas <elmas.ferhat@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants