
Conversation

@leo-pony leo-pony (Collaborator) commented Dec 8, 2025

What this PR does / why we need it?

There is currently no accuracy test for the Qwen3-235B-A22B model; this PR adds one.

Test result:
dataset | version | metric   | mode | vllm-api-general-chat
gsm8k   | 7cd45e  | accuracy | gen  | 96.29

Test case running time: about 30 minutes.

Does this PR introduce any user-facing change?

How was this patch tested?

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request adds a gsm8k accuracy test for the Qwen3-235B-A22B model in a multi-node setup. It also increases max-num-seqs from 16 to 32 for the server deployments. My review identifies a potential performance issue in the new benchmark configuration. The batch_size for the accuracy test is set to 512, while the vLLM server is configured to handle a maximum of 32 sequences. This mismatch could lead to inefficient test execution or timeouts. I've suggested aligning these values. On a side note, there seems to be a discrepancy between the configuration filename (Qwen3-235B-A3B.yaml) and the model being tested (Qwen3-235B-A22B). While not part of this PR's changes, it would be good to correct this in the future for clarity.

request_conf: vllm_api_general_chat
dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
max_out_len: 7680
batch_size: 512

Severity: high

The benchmark batch_size is set to 512, which is significantly larger than the server's configured max-num-seqs of 32 (as seen on lines 26 and 43).

This large discrepancy can lead to:

  • Request queuing: The server will queue the excess requests, as it can only process 32 at a time.
  • Increased test time: The PR notes a 30-minute test time, which might be partly due to this queuing.
  • Potential timeouts: The client might time out waiting for responses.

To ensure efficient testing and avoid potential stability issues, it's recommended to align the batch_size with the server's capacity.

    batch_size: 32
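
For illustration, the aligned benchmark entry would then read roughly as follows (a sketch only, assuming the surrounding keys stay exactly as shown in the snippet above):

    request_conf: vllm_api_general_chat
    dataset_conf: gsm8k/gsm8k_gen_0_shot_cot_chat_prompt
    max_out_len: 7680
    # keep batch_size at or below the server's max-num-seqs (32) so requests are not queued client-side
    batch_size: 32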

@github-actions github-actions bot commented Dec 8, 2025

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@wangxiyuan (Collaborator) commented:
@Potabk @zhangxinyuehfad


acc:
  case_type: accuracy
  dataset_path: vllm-ascend/gsm8k

Suggested change:
- dataset_path: vllm-ascend/gsm8k
+ dataset_path: vllm-ascend/gsm8k-lite

gsm8k-lite is enough
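
For reference, the acc block with this suggestion applied would look roughly like the following (a sketch; any other keys in the block are assumed unchanged):

    acc:
      case_type: accuracy
      # lighter gsm8k variant the reviewers consider sufficient for this check
      dataset_path: vllm-ascend/gsm8k-lite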
