Labels: bug (Something isn't working)
Bug Description
Running supported models via run_llm.py fails with the following error:
```
Traceback (most recent call last):
  File "/home/zewenl/Documents/pytorch/TensorRT/tools/llm/run_llm.py", line 333, in <module>
    trt_model = compile_torchtrt(model, input_ids, args)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zewenl/Documents/pytorch/TensorRT/tools/llm/run_llm.py", line 134, in compile_torchtrt
    trt_model = torch_tensorrt.dynamo.compile(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zewenl/Documents/pytorch/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 798, in compile
    trt_gm = compile_module(
             ^^^^^^^^^^^^^^^
  File "/home/zewenl/Documents/pytorch/TensorRT/py/torch_tensorrt/dynamo/_compiler.py", line 1044, in compile_module
    trt_module = convert_module(
                 ^^^^^^^^^^^^^^^
  File "/home/zewenl/Documents/pytorch/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 343, in convert_module
    serialized_interpreter_result = interpret_module_to_result(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zewenl/Documents/pytorch/TensorRT/py/torch_tensorrt/dynamo/conversion/_conversion.py", line 215, in interpret_module_to_result
    raise RuntimeError(
RuntimeError: Failed to extract symbolic shape expressions from source FX graph partition
```
To Reproduce
```shell
python run_llm.py --model Qwen/Qwen2.5-0.5B-Instruct --prompt "What is parallel programming?" --model_precision FP16 --num_tokens 128 --cache static_v2 --benchmark
```
or
```shell
python run_llm.py --model gpt2 --prompt "What is parallel programming?" --model_precision FP16 --num_tokens 128 --cache static_v2 --benchmark
```
Expected behavior
The model compiles successfully with Torch-TensorRT and generates correct outputs.
Environment
Build information about Torch-TensorRT can be found by turning on debug messages.
- Torch-TensorRT Version (e.g. 1.0.0):
- PyTorch Version (e.g. 1.0):
- CPU Architecture:
- OS (e.g., Linux):
- How you installed PyTorch (conda, pip, libtorch, source):
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version:
- GPU models and configuration:
- Any other relevant information:
Additional context