
Conversation

@zhangxinyuehfad (Contributor) commented Dec 8, 2025

When retrieving the quantization method for MoE experts (e.g., the quantization file of DeepSeek v3.2 exp does not match the model's layer naming convention in eager mode), a KeyError is raised: "model.layers.3.mlp.experts.weight not in self.quant_description". However, the quantization file contains entries like:

  "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_offset": "W8A8_DYNAMIC",

Co-Authored-By: yangqinghao-cmss [email protected]

…experts (e.g., DeepSeek_v3.2_exp w8a8)

Signed-off-by: hfadzxy <[email protected]>
Co-authored-by: yangqinghao-cmss <[email protected]>

@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a bug in retrieving quantization methods for MLP experts in MoE models. The changes in vllm_ascend/quantization/quant_config.py and vllm_ascend/quantization/utils.py add special handling for layer prefixes containing 'experts'.

My review identifies a critical issue in the implementation of this fix. The use of prefix in layer to identify expert sub-layers is not robust and can lead to incorrect layer grouping when one layer's prefix is a substring of another's (e.g., 'layer.1' and 'layer.10'). This could result in applying the wrong quantization configuration, which is a critical correctness bug. I have provided suggestions to use layer.startswith(prefix + '.') for a more precise match. The suggestions also improve efficiency by iterating over dictionary items directly and using more compact expressions.
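
To illustrate the substring pitfall with a generic example (the names below are illustrative, not taken from the PR's diff):

```python
# Generic illustration (not the PR's code) of the substring pitfall:
# a plain "prefix in key" check groups the "layer.10" weight under the
# "layer.1" prefix, while startswith(prefix + '.') matches only true
# sub-layers of "layer.1".
prefix = "layer.1"
keys = ["layer.1.gate_proj.weight", "layer.10.gate_proj.weight"]

loose = [k for k in keys if prefix in k]
strict = [k for k in keys if k.startswith(prefix + ".")]

print(loose)   # ['layer.1.gate_proj.weight', 'layer.10.gate_proj.weight']  (wrong grouping)
print(strict)  # ['layer.1.gate_proj.weight']
```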

@github-actions bot commented Dec 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wangxiyuan wangxiyuan merged commit 0d09453 into vllm-project:v0.11.0-dev Dec 9, 2025
13 checks passed