deepseek v3.2 prefill throughput decrease

I ran sglang 0.5.6 in a TP8 colocated deployment on H20, but prefill throughput decreased unexpectedly compared to 0.5.5.
Command line:
python3 -m sglang.launch_server --model-path /models/DeepSeek-V3.2 --dp 8 --enable-dp-attention --trust-remote-code --attention-backend nsa --nsa-prefill flashmla_sparse --nsa-decode flashmla_sparse --max-total-tokens 128000 --enable-metrics --mem-fraction-static 0.9 --max-running-requests 8 --enable-cache-report --page-size 64 --tp-size 8 --skip-server-warmup --disable-overlap-schedule --decode-log-interval 1 --speculative-algorithm EAGLE --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 --chunked-prefill-size 16384 --disable-chunked-prefix-cache --disable-radix-cache


deepseek v3.2 prefill throughput decrease #14621
