Add Qwen3-VL model support + multi-image input support in Qwen VL family #2345

Open

hanbitmyths wants to merge 20 commits into microsoft:main from
Conversation
- graph_surgeries.py: add QwenVL-specific graph surgery passes for vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models, handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
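For illustration, the "redundant Cast chain" detection the new pass performs can be sketched as follows. This is a hedged sketch, not Olive's implementation: nodes are modeled as plain dicts here, whereas the real cast_chain_elimination.py pass operates on an ONNX GraphProto.

```python
# Enum values from onnx.TensorProto: FLOAT = 1, FLOAT16 = 10.
FP32, FP16 = 1, 10

def find_cast_chains(nodes):
    """Return (producer, consumer) index pairs where a Cast-to-fp16 node
    feeds a Cast-back-to-fp32 node, i.e. a fp32 -> fp16 -> fp32 chain."""
    # Map each Cast node's output name to its index.
    by_output = {n["output"]: i for i, n in enumerate(nodes) if n["op"] == "Cast"}
    chains = []
    for j, node in enumerate(nodes):
        # Look for a Cast back to fp32 whose input is produced by a Cast to fp16.
        if node["op"] != "Cast" or node["to"] != FP32:
            continue
        i = by_output.get(node["input"])
        if i is not None and nodes[i]["to"] == FP16:
            chains.append((i, j))
    return chains
```

Whether a matched chain can actually be removed depends on downstream consumers and on whether the intermediate fp16 value is a graph output, which is why the real pass needs full graph context.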
…surgery passes
- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
…author (TD002), fix formatting
- Apply ruff format to 4 files (cast_chain_elimination.py, rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with axis-aware _pack_int4_along_axis that correctly packs zero_point when k_blocks is small (e.g. 1), avoiding a ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized always uses quantize_axis=data_rank-1, not pass_config['axis']
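An axis-aware int4 packing helper along the lines described might look like this. A minimal sketch only: the function name mirrors the commit message, but the body and padding behavior are assumptions, not Olive's actual code.

```python
import numpy as np

def pack_int4_along_axis(values: np.ndarray, axis: int = -1) -> np.ndarray:
    """Pack pairs of int4 values (stored one-per-uint8) into single bytes
    along `axis`. Packing pairwise along a chosen axis, instead of a global
    flatten + reshape, avoids the ValueError when the packed dimension is
    small (e.g. zero_point with k_blocks == 1)."""
    values = np.asarray(values, dtype=np.uint8)
    axis = axis % values.ndim
    # Pad the packing axis to an even length so pairs always line up.
    if values.shape[axis] % 2:
        pad = [(0, 0)] * values.ndim
        pad[axis] = (0, 1)
        values = np.pad(values, pad)
    low = np.take(values, np.arange(0, values.shape[axis], 2), axis=axis)
    high = np.take(values, np.arange(1, values.shape[axis], 2), axis=axis)
    # Low nibble from the even element, high nibble from the odd element.
    return (low & 0x0F) | (high << 4)
```

With `axis=-1`, `[[1, 2, 3]]` packs to `[[0x21, 0x03]]`, and a length-1 axis pads to one byte instead of raising on reshape.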
The upstream tuning_strategies.md page no longer exists, causing the Sphinx linkcheck to fail with -W (warnings-as-errors).
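One way to keep a `-W` linkcheck build green in this situation is Sphinx's `linkcheck_ignore` option; the regex below is purely illustrative (an assumption), and the actual fix may simply remove the dead link from the docs.

```python
# docs conf.py -- skip the removed upstream page during `make linkcheck`.
# The URL pattern is an assumed placeholder, not the PR's actual change.
linkcheck_ignore = [
    r".*tuning_strategies.*",
]
```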
devang-ml reviewed Mar 13, 2026
Address PR review feedback from @devang-ml and @justinchuby: use onnxscript.optimizer.optimize() instead of an ORT InferenceSession with session.enable_cast_chain_elimination to eliminate redundant Cast chains.
- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D, 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix from PR microsoft#2355 (ByteSize EncodeError handling).
justinchuby reviewed Mar 13, 2026
Comment on lines +92 to +98
```python
except Exception as e:
    if "TypeInferenceError" in str(e):
        logger.info(
            "onnxscript optimizer failed with %s. Rerunning with shape inference disabled.",
            str(e),
        )
        onnx_model = onnxscript.optimizer.optimize(onnx_model, onnx_shape_inference=False)
```
Contributor
Why does this happen? Seems over complicated
justinchuby reviewed Mar 13, 2026
```python
# The pass should produce a valid, runnable model.
# Actual cast elimination depends on the ORT version; at minimum the
# output graph must not have *more* nodes than the input.
# The onnxscript optimizer should fold the redundant fp32→fp16→fp32
```
Contributor
Wait: fp32→fp16→fp32 onnxscript optimizer doesn't fold (I don't think), because it is actually clamping the precision with the cast. Instead, I think you can create a rewrite rule for it
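The reviewer's point can be checked numerically: a fp32→fp16→fp32 round trip clamps precision, so it is not an identity that a generic optimizer could safely fold away. A small standalone check (not from the PR):

```python
import numpy as np

# fp32 -> fp16 -> fp32 is not the identity: the fp16 hop rounds the
# mantissa to 10 bits, so eliminating the Cast chain changes numerics.
x = np.float32(0.1)
roundtrip = np.float32(np.float16(x))
print(x == roundtrip)  # False: precision was clamped by the fp16 cast
```

This is why removing such a chain needs a deliberate rewrite rule (with an explicit decision to accept the numeric change) rather than relying on a constant-folding optimizer.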
This PR adds support for exporting and optimizing Qwen3-VL (and Qwen2.5-VL) vision-language models through Olive, including new ONNX graph surgery passes, 8-bit quantization enhancements, and a cast chain elimination pass.