Labels: bug
Description
OS
Linux
GPU Library
CUDA 12.x
Python version
3.12
Describe the bug
When prompting Qwen VL models with a prompt longer than 4096 tokens (the default max_seq_len), the call fails with the following error:
Dec 06 23:53:16 ailab llama-swap[3174452]: models-local/qwen3-vl-32b-instruct-exl3. Skipping inline model load.
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.673 INFO: Received chat completion request 96b5accf70144d28907c306816d5513e
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR: Traceback (most recent call last):
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:   File "/home/user1/projects/tabbyAPI/endpoints/OAI/utils/chat_completion.py", line 437, in generate_chat_completion
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:     generations = await asyncio.gather(*gen_tasks)
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:   File "/home/user1/projects/tabbyAPI/backends/exllamav3/model.py", line 692, in generate
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:     async for generation in self.stream_generate(
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:   File "/home/user1/projects/tabbyAPI/backends/exllamav3/model.py", line 779, in stream_generate
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:     async for generation_chunk in self.generate_gen(
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:   File "/home/user1/projects/tabbyAPI/backends/exllamav3/model.py", line 968, in generate_gen
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR:     raise ValueError(
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.712 ERROR: ValueError: Prompt length 10083 is greater than max_seq_len 4096
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.715 ERROR: Sent to request: Chat completion 96b5accf70144d28907c306816d5513e aborted. Maybe the model was unloaded? Please check the server console.
Dec 06 23:53:16 ailab llama-swap[3174452]: 2025-12-06 23:53:16.716 INFO: 192.168.10.45:0 - "POST /v1/chat/completions HTTP/1.1" 503
Dec 06 23:53:16 ailab llama-swap[3174452]: [WARN] metrics skipped, HTTP status=503, path=/v1/chat/completions
I believe max_seq_len is not being assigned correctly from the model configuration, so the request fails at this check:
tabbyAPI/backends/exllamav3/model.py, line 967 (commit 8b6b793):
if context_len > self.max_seq_len:
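My guess at the mechanism, as a sketch only: this is what I would expect the max_seq_len resolution to look like, not the actual tabbyAPI code, and the text_config nesting is an assumption based on how multimodal HF configs are commonly laid out.

```python
# Hypothetical sketch, not tabbyAPI code: how I'd expect max_seq_len to be
# resolved from the model directory's config.json.
import json
from pathlib import Path

def resolve_max_seq_len(model_dir: str, override: int | None = None) -> int:
    config = json.loads((Path(model_dir) / "config.json").read_text())
    # Assumption: VL models like Qwen3-VL may nest the context length under
    # "text_config" instead of exposing "max_position_embeddings" at the top
    # level, so a top-level-only lookup would miss it and fall back to 4096.
    text_config = config.get("text_config", config)
    model_limit = text_config.get("max_position_embeddings")
    return override or model_limit or 4096

print(resolve_max_seq_len("models-local/qwen3-vl-32b-instruct-exl3"))
```

If the loader only looks at the top-level key, that would explain why the server ends up with the 4096 default while the model itself supports a much longer context.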
Please let me know if you need more information.
Reproduction steps
Download a version of turboderp/Qwen3-VL-32B-Instruct-exl3 and send a chat completion request with an image and a text prompt longer than 4096 tokens (the default max_seq_len), for example:
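A minimal reproduction script, assuming tabbyAPI is listening on its default port 5000; the API key, model name, and image URL are placeholders, and the filler text only needs to tokenize to more than 4096 tokens.

```python
import requests

# Placeholders: host/port, API key, model name, and image URL are assumptions.
payload = {
    "model": "qwen3-vl-32b-instruct-exl3",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/test.png"}},
                # Long enough to exceed the 4096-token limit the server falls back to.
                {"type": "text", "text": "Describe this image in detail. " + "filler " * 6000},
            ],
        }
    ],
}

resp = requests.post(
    "http://localhost:5000/v1/chat/completions",
    headers={"Authorization": "Bearer <api-key>"},
    json=payload,
    timeout=300,
)
print(resp.status_code)  # 503
print(resp.text)         # "... Prompt length ... is greater than max_seq_len 4096"
```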
Expected behavior
The API call should respect the context length declared in the model's config.json rather than falling back to the 4096 default.
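For now I can work around it by setting the limit explicitly at load time instead of relying on config.json detection. A sketch, under the assumption that the admin model-load endpoint accepts a max_seq_len override; the field names and auth header may differ from the actual ModelLoadRequest schema.

```python
import requests

# Hypothetical request shape; check the actual tabbyAPI admin API before using.
resp = requests.post(
    "http://localhost:5000/v1/model/load",
    headers={"Authorization": "Bearer <admin-key>"},
    json={
        "model_name": "qwen3-vl-32b-instruct-exl3",
        "max_seq_len": 32768,  # whatever context length the model actually supports
    },
    timeout=600,
)
print(resp.status_code, resp.text)
```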
Logs
No response
Additional context
No response
Acknowledgements
- I have looked for similar issues before submitting this one.
- I have read the disclaimer, and this issue is related to a code bug. If I have a question, I will use the Discord server.
- I understand that the developers have lives and my issue will be answered when possible.
- I understand the developers of this program are human, and I will ask my questions politely.