Add support for row lineage in v3 by dttung2905 · Pull Request #735 · apache/iceberg-go

dttung2905 · 2026-02-17T18:39:13Z

This should fully support the read path and partially support the write path.
Not yet supported on the write path:

Rewrite/compaction: When an overwrite or rewrite copies existing rows into new data files, existing non-null _row_id and _last_updated_sequence_number values are not copied into the new files. Row lineage is preserved for appends and for the metadata and manifest list; it is not yet preserved when rewriting data files.
Explicit null columns on append: New data files do not write _row_id/_last_updated_sequence_number as null columns (they are omitted). Omitting them is allowed by the spec, and writing them is out of scope for this PR.

A data file with only new rows for the table may omit the _last_updated_sequence_number and _row_id. If the columns are missing, readers should treat both columns as if they exist and are set to null for all rows.
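For illustration, the read-side rule can be sketched in a few lines of plain Go. This is a minimal sketch, not the PR's code: `fillRowIDs` and its parameters are hypothetical names, and it only models the rule stated in the doc comment on synthesizeRowLineageColumns (a null _row_id becomes first_row_id + row_position, a non-null value from the file is kept, and the row offset carries over between batches of the same file).

```go
package main

import "fmt"

// fillRowIDs returns the _row_id values for one batch (hypothetical helper).
// ids holds the values read from the file; nil stands for null, which is also
// how a missing column is treated. rowOffset is the 0-based position of the
// batch's first row within the file and is advanced so that the next batch
// continues where this one ended.
func fillRowIDs(ids []*int64, firstRowID int64, rowOffset *int64) []int64 {
	out := make([]int64, len(ids))
	for i, id := range ids {
		if id != nil {
			out[i] = *id // keep the value stored in the file
		} else {
			out[i] = firstRowID + *rowOffset + int64(i)
		}
	}
	*rowOffset += int64(len(ids))
	return out
}

func main() {
	var offset int64
	kept := int64(7)
	first := fillRowIDs([]*int64{nil, nil, nil}, 100, &offset)
	second := fillRowIDs([]*int64{nil, &kept}, 100, &offset)
	fmt.Println(first, second, offset) // prints "[100 101 102] [103 7] 5"
}
```

The second batch starts at 103 rather than 100 because the offset is shared across batches, which is the reason the PR threads a row offset through the scanner.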

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

laskoviymishka

The read path structure is solid and the Java alignment is largely correct — field IDs, doc strings, manifest list writer semantics, and the Arrow synthesis pipeline all check out.

Three issues need to be fixed before this merges.

First row ID inheritance diverges from Java spec (manifest.go ReadEntry). Java's idAssigner unconditionally executes nextRowId += file.recordCount() for every file — null or explicit. The Go implementation only advances nextFirstRowID when FirstRowIDField == nil, so a file with an explicit first_row_id silently resets the baseline for all subsequent null files in the same manifest, producing overlapping row ID ranges. The fix and the *int64 cleanup land together: initialize nextFirstRowID eagerly in NewManifestReader, then unconditionally advance after the conditional assign.
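The overlap described above can be reproduced with a small self-contained sketch. The `entry` type and `assignFirstRowIDs` function are hypothetical stand-ins rather than the iceberg-go API, and the `advanceAlways` flag exists only to compare the two behaviours side by side; the unconditional advance is modelled on the review's description of Java's idAssigner.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a data file in a manifest.
type entry struct {
	firstRowID  *int64 // nil means "inherit from the manifest"
	recordCount int64
}

// assignFirstRowIDs resolves first_row_id for each entry, starting from the
// manifest's first_row_id. With advanceAlways=true the counter moves past
// every file; with false it only moves past files that inherited their ID,
// which is the behaviour the review flags as a bug.
func assignFirstRowIDs(manifestFirstRowID int64, entries []entry, advanceAlways bool) []int64 {
	next := manifestFirstRowID
	out := make([]int64, len(entries))
	for i, e := range entries {
		inherited := e.firstRowID == nil
		if inherited {
			out[i] = next
		} else {
			out[i] = *e.firstRowID
		}
		if advanceAlways || inherited {
			next += e.recordCount
		}
	}
	return out
}

func main() {
	explicit := int64(1010)
	entries := []entry{
		{firstRowID: nil, recordCount: 10},
		{firstRowID: &explicit, recordCount: 5},
		{firstRowID: nil, recordCount: 7},
	}
	fmt.Println(assignFirstRowIDs(1000, entries, true))  // prints "[1000 1010 1015]"
	fmt.Println(assignFirstRowIDs(1000, entries, false)) // prints "[1000 1010 1010]"
}
```

In the second result the third file starts at 1010, the same ID as the explicit file before it, so their row ID ranges overlap.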

Wrong sequence number for DataSequenceNumber (scanner.go PlanFiles). e.SequenceNum() is the manifest entry's metadata sequence number; _last_updated_sequence_number per spec requires the data sequence number — entry.dataSequenceNumber() in Java, e.FileSequenceNum() in Go. These are identical for freshly ADDED entries but diverge for EXISTING entries carried across compacted manifests, where the bug silently inflates the reported sequence number.

ManifestFile.FirstRowId() must be FirstRowID() before this public interface is merged. The PR already correctly renames the struct field to FirstRowID; the exported method should follow the same Go acronym convention. Fixing a public interface post-merge requires a breaking change.

laskoviymishka · 2026-03-03T12:58:54Z

table/arrow_scanner.go

+	rowOffset *int64,
+	task FileScanTask,
+	batch arrow.RecordBatch,
+	_ *iceberg.Schema,


Why is this needed?

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

laskoviymishka

One more thing: there is a memory leak. Aside from that, all good.

Same root cause as #762 — NewArray() starts at refcount 1, NewRecordBatch retains to refcount 2, local refs are never dropped so memory is never freed. Two places: the production release loop in synthesizeRowLineageColumns and the test setup in TestSynthesizeRowLineageColumns. The test fix is as important as the production fix — NewCheckedAllocator would have caught this immediately and prevents regressions of the same class.

laskoviymishka · 2026-03-04T17:02:33Z

table/arrow_scanner.go

+// first_row_id and data_sequence_number; otherwise the value from the file is kept.
+// rowOffset is the 0-based row index within the current file and is updated so _row_id stays
+// correct across multiple batches from the same file (first_row_id + row_position).
+func synthesizeRowLineageColumns(


Same root cause as #762 — bldr.NewArray() starts at refcount=1, array.NewRecordBatch retains to refcount=2, but the local refs in newCols are never released. Fix needs a release loop after the batch is created:

rec := array.NewRecordBatch(schema, newCols, nrows)
for _, c := range newCols {
	c.Release()
}
return rec, nil

Thanks for pointing this out. I see that PR 762 has been approved and is waiting to be merged. Let me know once it lands in main so that I can rebase and apply the fix to this PR.

laskoviymishka · 2026-03-04T17:05:52Z

table/scanner_internal_test.go

+	defer seqBldr.Release()
+	seqBldr.AppendNulls(nrows)
+
+	batch := array.NewRecordBatch(schema, []arrow.Array{


Inline NewArray() calls go directly into NewRecordBatch with no way to release them afterward — same leak pattern. Fix by assigning to locals first and releasing after batch construction. Then add memory.NewCheckedAllocator + defer mem.AssertSize(t, 0) to make the class of leak self-enforcing going forward — same pattern used in #762 to catch exactly this.
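The refcount arithmetic behind both leaks can be followed in a toy model that uses only the standard library. Nothing below is the Arrow API: `tracker`, `column`, and `batch` are hypothetical types that imitate the behaviour described in the review (an array starts at refcount 1, the batch retains it, and a checked allocator reports whatever is still alive at the end).

```go
package main

import "fmt"

// tracker is a toy stand-in for a checked allocator: it counts live buffers.
type tracker struct{ live int }

// column is a toy reference-counted array, created with refcount 1.
type column struct {
	refs int
	t    *tracker
}

func newColumn(t *tracker) *column {
	t.live++
	return &column{refs: 1, t: t}
}

func (c *column) Retain() { c.refs++ }

func (c *column) Release() {
	c.refs--
	if c.refs == 0 {
		c.t.live-- // the buffer is freed only when the last reference is dropped
	}
}

// batch retains every column it receives and releases them when released.
type batch struct{ cols []*column }

func newBatch(cols []*column) *batch {
	for _, c := range cols {
		c.Retain()
	}
	return &batch{cols: cols}
}

func (b *batch) Release() {
	for _, c := range b.cols {
		c.Release()
	}
}

// buildBatch creates two columns and wraps them in a batch. With
// dropLocals=false the creator's references are never released (the leak).
func buildBatch(t *tracker, dropLocals bool) *batch {
	cols := []*column{newColumn(t), newColumn(t)}
	b := newBatch(cols)
	if dropLocals {
		for _, c := range cols {
			c.Release()
		}
	}
	return b
}

// liveAfter reports how many buffers remain once the batch itself is released.
func liveAfter(dropLocals bool) int {
	t := &tracker{}
	buildBatch(t, dropLocals).Release()
	return t.live
}

func main() {
	fmt.Println("without release loop:", liveAfter(false)) // prints "without release loop: 2"
	fmt.Println("with release loop:", liveAfter(true))     // prints "with release loop: 0"
}
```

Checking that the live count is zero at the end plays the role that asserting a size of 0 on a checked allocator plays in a test: the leaking variant fails immediately instead of passing silently.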

dttung2905 mentioned this pull request Feb 17, 2026

feat: Wire V3 snapshot producer to row-lineage state #728

Merged

dttung2905 force-pushed the row-lineage-v3 branch from 6af257b to 4da5bf5 Compare February 20, 2026 23:01

dttung2905 added 3 commits March 2, 2026 21:09

Add support for row lineage in v3

c991fc4

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix CI failure

9962821

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

Fix CI failure

9e227d2

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

laskoviymishka suggested changes Mar 3, 2026

Fixes from codereview

61787dd

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

dttung2905 force-pushed the row-lineage-v3 branch from 9256510 to 61787dd Compare March 3, 2026 23:01

Fixes from codereview

3b3c7e2

Signed-off-by: dttung2905 <ttdao.2015@accountancy.smu.edu.sg>

dttung2905 requested a review from laskoviymishka March 4, 2026 16:45

laskoviymishka suggested changes Mar 4, 2026

dttung2905 wants to merge 5 commits into apache:main from dttung2905:row-lineage-v3

2 participants
