[BugFix][main] Adapted Qwen3-Next-MTP to chunked prefill #4770
base: main
Conversation
Signed-off-by: drslark <[email protected]>
Code Review
This pull request adapts the Qwen3-Next-MTP model for chunked prefill, primarily by adding a fallback implementation for causal convolution and enabling the corresponding tests. The changes also include patches to work around limitations in torch_npu and vLLM's utility functions.
My review focuses on the maintainability and risks associated with these workarounds. While the changes appear to be functional for the immediate goal, they introduce significant technical debt:
- A large block of code for causal convolution has been duplicated from upstream, which will be difficult to maintain.
- Two separate monkey patches are introduced for bind_kv_cache and torch.argsort. These are risky and can lead to future breakages.
I've added comments with high severity for these issues, recommending the creation of tracking issues to address the underlying problems and eventually remove these temporary solutions. These measures are crucial for the long-term health of the codebase.
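As an illustration of the containment the review asks for, here is a minimal sketch, not code from this PR, of a guarded monkey patch that keeps the workaround in one place and points at a tracking issue. It assumes bind_kv_cache is the utility exposed by vllm.v1.worker.utils that the patched import below refers to; the patched body simply delegates and is a placeholder.

```python
# Illustrative sketch only; names and the tracking-issue reference are placeholders.
import vllm.v1.worker.utils as worker_utils

_original_bind_kv_cache = worker_utils.bind_kv_cache  # keep a handle to the original


def _patched_bind_kv_cache(*args, **kwargs):
    # TODO(vllm-ascend#<tracking issue>): remove once the upstream utility
    # covers the Ascend-specific case; the real patch adds NPU handling here.
    return _original_bind_kv_cache(*args, **kwargs)


def apply_patches() -> None:
    """Install all temporary patches from a single, easy-to-audit place."""
    worker_utils.bind_kv_cache = _patched_bind_kv_cache
```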
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
from vllm.v1.worker.utils import defaultdict, extract_layer_index
# Discussed with @MengqingCao, we can patch just for now.
Remove "Discussed with @MengqingCao" from the comment.
ACL_FORMAT = ACL_FORMAT_FRACTAL_ND
# Discussed with @zzzzwwjj, we can patch argsort just for now.
Move this to the patch module.
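A minimal sketch of what moving this into the patch module could look like: the torch.argsort override lives in a dedicated patch file with an explicit install function. The module path and names are illustrative, and the workaround body below just delegates; the real fix in this PR may differ.

```python
# e.g. vllm_ascend/patch/patch_argsort.py -- path and names are illustrative.
import torch

_original_argsort = torch.argsort  # keep a handle to the original


def _npu_safe_argsort(input, *args, **kwargs):
    # Placeholder body; the actual NPU-specific workaround would go here.
    return _original_argsort(input, *args, **kwargs)


def patch_argsort() -> None:
    """Install the temporary torch.argsort workaround from one known place."""
    torch.argsort = _npu_safe_argsort
```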
@@ -0,0 +1,52 @@
import torch
Rename to patch_qwen3_next.
What this PR does / why we need it?
The pad -1 modification is from vllm-project/vllm#25743. It still has bugs for batched chunked prefill, so for now we only ensure the end-to-end results are correct.
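For context, a minimal sketch of the "pad with -1" idea for chunked prefill, assuming a query-start-location style tensor; the actual change ported from vllm-project/vllm#25743 may differ in detail.

```python
import torch
import torch.nn.functional as F

# Hypothetical chunked-prefill metadata: cumulative query start locations.
query_start_loc = torch.tensor([0, 3, 7, 12], dtype=torch.int32)
max_batch_size = 8  # illustrative padded batch size

# Pad the tail with -1 so kernels can distinguish real entries from padding.
padded = F.pad(query_start_loc,
               (0, max_batch_size + 1 - query_start_loc.numel()),
               value=-1)
# padded -> tensor([ 0,  3,  7, 12, -1, -1, -1, -1, -1], dtype=torch.int32)
```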
Outputs:
Does this PR introduce any user-facing change?
N/A
How was this patch tested?