Flink: Add passthroughRecords option to DynamicIcebergSink #15433
sqd wants to merge 1 commit into apache:main
Conversation
When enabled, records are forwarded directly from the record generator to the writer using a forward edge instead of a hash edge. This allows Flink to chain the two operators, avoiding serialization/deserialization overhead and drastically increasing throughput in high-volume pipelines.
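The gain comes from operator chaining: over a forward edge, Flink hands the record object downstream by reference, whereas a hash edge forces a serialize/deserialize round trip per record. The following self-contained sketch (plain Java serialization as a stand-in, not Flink's actual RowData serializers, and `Row` is a hypothetical record type) contrasts the two data paths:

```java
import java.io.*;

// Sketch: the per-record cost a hash edge adds versus a chained forward edge.
// A hash/rebalance edge serializes each record to bytes and rebuilds it in the
// downstream task; a chained forward edge hands over the same object reference.
public class SerdeDemo {
    // Hypothetical stand-in for a RowData-like record (illustration only).
    public static class Row implements Serializable {
        public final long id;
        public final String name;
        public Row(long id, String name) { this.id = id; this.name = name; }
    }

    // What a hash edge effectively does per record: bytes out, bytes in, new object.
    public static byte[] serialize(Row row) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(row);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static Row deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Row) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // What a chained forward edge effectively does per record: nothing.
    public static Row passthrough(Row row) {
        return row;
    }
}
```

The serialization path allocates a byte buffer and a fresh object per record; the passthrough path is a no-op, which is where the throughput difference in high-volume pipelines comes from.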
```java
if (passthroughRecords) {
  if (!immediateUpdate) {
    throw new UnsupportedOperationException(
        "Immediate update must be enabled to pass through records");
  }
  rowDataDataStreamSink = converted.sinkTo(sink).uid(prefixIfNotNull(uidPrefix, "-sink"));
} else {
```
This will ignore DistributionMode and partitioning in DynamicRecord. I saw that you listed this in the docs, but I'm not sure we should diverge too much from the normal mode of operation. I think what we can do is add a new chained side output with an extra DynamicWriter for this quick path.
It may be worth adding a new DistributionMode. Currently NONE does round-robin, which is slightly confusing; we could rename it to ROUND_ROBIN and use NONE for this direct path.
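The suggested renaming could look roughly like the enum below. This is a hypothetical sketch of the proposal, not the actual `org.apache.iceberg.DistributionMode` enum, which today defines NONE, HASH, and RANGE:

```java
// Hypothetical shape of the reviewer's suggestion (illustration only).
public enum SinkDistributionMode {
    NONE,         // proposed new meaning: direct/forward path, no redistribution
    ROUND_ROBIN,  // proposed rename of today's NONE, which rebalances round-robin
    HASH,         // shuffle by partition key, as today
    RANGE         // range-distribute by sort key, as today
}
```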
How do we handle DistributionMode in the normal Sink? We should be consistent.
@sqd Could you share a bit more about your use case? This approach might work if your input records are already correctly distributed, but any mistake there will lead to small files or skewed writes: fast for the writers, but potentially very costly for the readers.
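To make the small-files concern concrete: each (writer, partition) pair that receives at least one record produces at least one data file. A hash edge sends each partition's records to a single writer, while a forward edge on arbitrarily distributed input (round-robin, say) lets every writer see many partitions. The simulation below is an illustration of that effect, not sink code:

```java
import java.util.*;

// Illustration: why a forward edge on badly distributed input multiplies
// the number of data files. Each distinct (writer, partition) pair that
// receives a record yields at least one file.
public class SmallFilesDemo {
    // Hash edge: all records for a given partition land on one writer,
    // so the file count is bounded by the number of partitions.
    public static int filesWithHash(int writers, int partitions, int records) {
        Set<String> files = new HashSet<>();
        for (int i = 0; i < records; i++) {
            int partition = i % partitions;
            int writer = Math.floorMod(Integer.hashCode(partition), writers);
            files.add(writer + ":" + partition);
        }
        return files.size();
    }

    // Forward edge on round-robin input: partitions spread across writers,
    // so the file count can approach writers * partitions.
    public static int filesWithForward(int writers, int partitions, int records) {
        Set<String> files = new HashSet<>();
        for (int i = 0; i < records; i++) {
            int partition = i % partitions;
            int writer = i % writers; // whatever distribution the input happened to have
            files.add(writer + ":" + partition);
        }
        return files.size();
    }
}
```

With 8 writers and 5 partitions, the hash path produces 5 files while the forward path on round-robin input produces 40, each correspondingly smaller.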
Current topology:

Same pipeline, with the new change enabled:

Serdes of Flink RowData can be very expensive:
