
Conversation

@zhangxinyuehfad (Contributor) commented Dec 8, 2025

When retrieving the quantization method for MoE experts (e.g., the quantization file of DeepSeek v3.2 exp does not match the model's layer naming convention in eager mode), a KeyError is raised: "model.layers.3.mlp.experts.weight not in self.quant_description". However, the quantization file contains entries like:

  "model.layers.3.mlp.experts.255.gate_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.gate_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.down_proj.weight_offset": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_scale": "W8A8_DYNAMIC",
  "model.layers.3.mlp.experts.255.up_proj.weight_offset": "W8A8_DYNAMIC",

Co-Authored-By: yangqinghao-cmss [email protected]

…experts (e.g., DeepSeek_v3.2_exp w8a8)

Signed-off-by: hfadzxy <[email protected]>
Co-authored-by: yangqinghao-cmss <[email protected]>

@gemini-code-assist bot left a comment

Code Review

This pull request aims to fix a bug in retrieving quantization methods for MLP experts in MoE models. The changes in vllm_ascend/quantization/quant_config.py and vllm_ascend/quantization/utils.py add special handling for layer prefixes containing 'experts'.

My review identifies a critical issue in the implementation of this fix. The use of prefix in layer to identify expert sub-layers is not robust and can lead to incorrect layer grouping when one layer's prefix is a substring of another's (e.g., 'layer.1' and 'layer.10'). This could result in applying the wrong quantization configuration, which is a critical correctness bug. I have provided suggestions to use layer.startswith(prefix + '.') for a more precise match. The suggestions also improve efficiency by iterating over dictionary items directly and using more compact expressions.
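
To illustrate the substring pitfall with a generic example (the names below are illustrative, not taken from the PR's diff):

```python
# Generic illustration (not the PR's code) of the substring pitfall:
# a plain "prefix in key" check groups the "layer.10" weight under the
# "layer.1" prefix, while startswith(prefix + '.') matches only true
# sub-layers of "layer.1".
prefix = "layer.1"
keys = ["layer.1.gate_proj.weight", "layer.10.gate_proj.weight"]

loose = [k for k in keys if prefix in k]
strict = [k for k in keys if k.startswith(prefix + ".")]

print(loose)   # ['layer.1.gate_proj.weight', 'layer.10.gate_proj.weight']  (wrong grouping)
print(strict)  # ['layer.1.gate_proj.weight']
```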

@github-actions bot commented Dec 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write a clear commit message and fill in the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wangxiyuan wangxiyuan merged commit 0d09453 into vllm-project:v0.11.0-dev Dec 9, 2025
13 checks passed