
Conversation

@drslark
Contributor

@drslark drslark commented Dec 8, 2025

What this PR does / why we need it?

The pad -1 modification comes from vllm-project/vllm#25743.

It still has bugs with batched chunked prefill.

So, for now, this PR only makes the e2e test pass:

pytest -s tests/e2e/multicard/test_qwen3_next.py::test_models_distributed_Qwen3_NEXT_MTP_TP4_SIMILARITY

Outputs:

========================================================================================================================== warnings summary ===========================================================================================================================
../../usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/dynamo/torchair/__init__.py:8
  /usr/local/python3.11.10/lib/python3.11/site-packages/torch_npu/dynamo/torchair/__init__.py:8: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:241
  <frozen importlib._bootstrap>:241: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================== 1 passed, 3 warnings in 172.17s (0:02:52) ==============================================================================================================

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adapts the Qwen3-Next-MTP model for chunked prefill, primarily by adding a fallback implementation for causal convolution and enabling the corresponding tests. The changes also include patches to work around limitations in torch_npu and vLLM's utility functions.
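As a rough illustration of why a causal-convolution fallback has to handle chunked prefill carefully, here is a minimal pure-Python sketch. It is not the code from this PR or from upstream vLLM; the function name `causal_conv1d`, its signature, and the single-channel list representation are assumptions made for the example. It shows that carrying the last K-1 inputs as state between chunks gives the same output as processing the whole sequence at once.

```python
from typing import List, Optional, Tuple


def causal_conv1d(
    x: List[float],
    weight: List[float],
    state: Optional[List[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Single-channel causal 1-D convolution (illustrative only).

    Output position t depends on x[t] and the K-1 inputs before it.
    Inputs that precede this chunk are taken from `state`
    (zeros when no state is given, i.e. at the start of a sequence).
    Returns the outputs and the new state (the last K-1 inputs seen).
    """
    k = len(weight)
    if state is None:
        state = [0.0] * (k - 1)
    # Prepend the carried-over history so every window is full.
    padded = list(state) + list(x)
    out = [
        sum(weight[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]
    # Keep the trailing K-1 inputs for the next chunk.
    new_state = padded[len(x):]
    return out, new_state


# Whole-sequence prefill versus two chunks with carried state.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 2.0, 3.0]
full, _ = causal_conv1d(x, w)
first, conv_state = causal_conv1d(x[:2], w)
second, _ = causal_conv1d(x[2:], w, conv_state)
```

If the state were dropped or misaligned between chunks, `first + second` would differ from `full`, which is the kind of mismatch a similarity-based e2e test can catch.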

My review focuses on the maintainability and risks associated with these workarounds. While the changes appear to be functional for the immediate goal, they introduce significant technical debt:

  1. A large block of code for causal convolution has been duplicated from upstream, which will be difficult to maintain.
  2. Two separate monkey patches are introduced for bind_kv_cache and torch.argsort. These are risky and can lead to future breakages.

I've added comments with high severity for these issues, recommending the creation of tracking issues to address the underlying problems and eventually remove these temporary solutions. These measures are crucial for the long-term health of the codebase.
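To make the monkey-patching concern concrete, the following sketch shows one common way to keep such a patch scoped and reversible. It uses only the standard library and a stand-in namespace; `patched`, `fake_lib`, and both argsort functions are hypothetical names for this example and do not reflect how `bind_kv_cache` or `torch.argsort` are actually patched in this PR.

```python
import contextlib
import types


@contextlib.contextmanager
def patched(owner, name, replacement):
    """Temporarily replace owner.<name>, restoring the original on exit."""
    original = getattr(owner, name)
    setattr(owner, name, replacement)
    try:
        yield original
    finally:
        # Always restore, even if the body raised.
        setattr(owner, name, original)


def default_argsort(values):
    """Stand-in for a library function (ascending argsort)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def workaround_argsort(values):
    """Stand-in for a temporary replacement (descending argsort)."""
    return sorted(range(len(values)), key=lambda i: -values[i])


# Hypothetical module-like object that owns the function being patched.
fake_lib = types.SimpleNamespace(argsort=default_argsort)

with patched(fake_lib, "argsort", workaround_argsort):
    inside = fake_lib.argsort([3, 1, 2])

after = fake_lib.argsort([3, 1, 2])
```

A patch applied at import time, by contrast, stays in effect for the whole process, which is why keeping such workarounds in a dedicated patch module with a tracking issue makes them easier to find and remove later.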

@github-actions

github-actions bot commented Dec 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

from vllm.v1.worker.utils import defaultdict, extract_layer_index


# Discussed with @MengqingCao, we can patch just for now.
Collaborator

Remove "Discussed with @MengqingCao".

ACL_FORMAT = ACL_FORMAT_FRACTAL_ND


# Discussed with @zzzzwwjj, we can patch argsort just for now.
Collaborator

Move this to the patch module.

@@ -0,0 +1,52 @@
import torch
Collaborator

Rename this to patch_qwen3_next.
