Add Qwen3-VL model support + multi-image input support in Qwen VL family #2345

Open

hanbitmyths wants to merge 20 commits into microsoft:main from
Conversation
- graph_surgeries.py: add QwenVL-specific graph surgery passes for vision embedding merge and positional encoding fixup
- rtn_quantization.py: extend RTN quantization for multimodal models, handle vision encoder exclusion patterns
- cast_chain_elimination.py: new pass to eliminate redundant Cast chains in Dynamo-exported models (fp32->fp16->fp32 patterns)
- olive_config.json: register new passes
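For illustration, the "redundant Cast chain" detection the new pass performs can be sketched as follows. This is a hedged sketch, not Olive's implementation: nodes are modeled as plain dicts here, whereas the real cast_chain_elimination.py pass operates on an ONNX GraphProto.

```python
# Enum values from onnx.TensorProto: FLOAT = 1, FLOAT16 = 10.
FP32, FP16 = 1, 10

def find_cast_chains(nodes):
    """Return (producer, consumer) index pairs where a Cast-to-fp16 node
    feeds a Cast-back-to-fp32 node, i.e. a fp32 -> fp16 -> fp32 chain."""
    # Map each Cast node's output name to its index.
    by_output = {n["output"]: i for i, n in enumerate(nodes) if n["op"] == "Cast"}
    chains = []
    for j, node in enumerate(nodes):
        # Look for a Cast back to fp32 whose input is produced by a Cast to fp16.
        if node["op"] != "Cast" or node["to"] != FP32:
            continue
        i = by_output.get(node["input"])
        if i is not None and nodes[i]["to"] == FP16:
            chains.append((i, j))
    return chains
```

Whether a matched chain can actually be removed depends on downstream consumers and on whether the intermediate fp16 value is a graph output, which is why the real pass needs full graph context.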
…surgery passes
- rtn_quantization.py: Parameterize bits through quantization methods to support 8-bit Gather
- common.py: Fix ByteSize() crash for >2GB models, fix FOLDED_FROM_KEY import
- graph_surgeries.py: Add ReciprocalMulToDiv, DeduplicateSubgraphInitializers, DeduplicateNodes
…author (TD002), fix formatting
- Apply ruff format to 4 files (cast_chain_elimination.py, rtn_quantization.py, test_graph_surgeries.py, test_rtn_quantization.py)
- Fix _pack_int8_to_int4 reshape bug: replace global flatten+pack with axis-aware _pack_int4_along_axis that correctly packs zero_point when k_blocks is small (e.g. 1), avoiding a ValueError on reshape
- Fix test_rtn_quantization_pass_gather assertion: GatherBlockQuantized always uses quantize_axis=data_rank-1, not pass_config['axis']
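An axis-aware int4 packing helper along the lines described might look like this. A minimal sketch only: the function name mirrors the commit message, but the body and padding behavior are assumptions, not Olive's actual code.

```python
import numpy as np

def pack_int4_along_axis(values: np.ndarray, axis: int = -1) -> np.ndarray:
    """Pack pairs of int4 values (stored one-per-uint8) into single bytes
    along `axis`. Packing pairwise along a chosen axis, instead of a global
    flatten + reshape, avoids the ValueError when the packed dimension is
    small (e.g. zero_point with k_blocks == 1)."""
    values = np.asarray(values, dtype=np.uint8)
    axis = axis % values.ndim
    # Pad the packing axis to an even length so pairs always line up.
    if values.shape[axis] % 2:
        pad = [(0, 0)] * values.ndim
        pad[axis] = (0, 1)
        values = np.pad(values, pad)
    low = np.take(values, np.arange(0, values.shape[axis], 2), axis=axis)
    high = np.take(values, np.arange(1, values.shape[axis], 2), axis=axis)
    # Low nibble from the even element, high nibble from the odd element.
    return (low & 0x0F) | (high << 4)
```

With `axis=-1`, `[[1, 2, 3]]` packs to `[[0x21, 0x03]]`, and a length-1 axis pads to one byte instead of raising on reshape.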
The upstream tuning_strategies.md page no longer exists, causing the Sphinx linkcheck to fail with -W (warnings-as-errors).
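One way to keep a `-W` linkcheck build green in this situation is Sphinx's `linkcheck_ignore` option; the regex below is purely illustrative (an assumption), and the actual fix may simply remove the dead link from the docs.

```python
# docs conf.py -- skip the removed upstream page during `make linkcheck`.
# The URL pattern is an assumed placeholder, not the PR's actual change.
linkcheck_ignore = [
    r".*tuning_strategies.*",
]
```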
devang-ml reviewed Mar 13, 2026
Address PR review feedback from @devang-ml and @justinchuby: use onnxscript.optimizer.optimize() instead of an ORT InferenceSession with session.enable_cast_chain_elimination to eliminate redundant Cast chains.
- Remove onnxruntime dependency from cast_chain_elimination pass
- Use onnxscript.optimizer.optimize() with TypeInferenceError fallback (same pattern as OnnxPeepholeOptimizer)
- Update test comment to reflect onnxscript optimizer
- Verified: numerically identical outputs (0.00 max abs diff)
- Verified: no eval regression (69% on AI2D, 100 samples)
Resolve conflict in olive/passes/onnx/common.py: take upstream fix from PR microsoft#2355 (ByteSize EncodeError handling).
justinchuby reviewed Mar 13, 2026
Comment on lines +92 to +98
```python
except Exception as e:
    if "TypeInferenceError" in str(e):
        logger.info(
            "onnxscript optimizer failed with %s. Rerunning with shape inference disabled.",
            str(e),
        )
        onnx_model = onnxscript.optimizer.optimize(onnx_model, onnx_shape_inference=False)
```
Contributor
Why does this happen? Seems over complicated
justinchuby reviewed Mar 13, 2026
```python
# The pass should produce a valid, runnable model.
# Actual cast elimination depends on the ORT version; at minimum the
# output graph must not have *more* nodes than the input.
# The onnxscript optimizer should fold the redundant fp32→fp16→fp32
```
Contributor
Wait: fp32→fp16→fp32 onnxscript optimizer doesn't fold (I don't think), because it is actually clamping the precision with the cast. Instead, I think you can create a rewrite rule for it
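The reviewer's point can be checked numerically: a fp32→fp16→fp32 round trip clamps precision, so it is not an identity that a generic optimizer could safely fold away. A small standalone check (not from the PR):

```python
import numpy as np

# fp32 -> fp16 -> fp32 is not the identity: the fp16 hop rounds the
# mantissa to 10 bits, so eliminating the Cast chain changes numerics.
x = np.float32(0.1)
roundtrip = np.float32(np.float16(x))
print(x == roundtrip)  # False: precision was clamped by the fp16 cast
```

This is why removing such a chain needs a deliberate rewrite rule (with an explicit decision to accept the numeric change) rather than relying on a constant-folding optimizer.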
This PR adds support for exporting and optimizing Qwen3-VL (and Qwen2.5-VL) vision-language models through Olive, including new ONNX graph surgery passes, 8-bit quantization enhancements, and a cast chain elimination pass.