Description
I ran sglang 0.5.6 with TP8 colocated on H20, but throughput decreased unexpectedly compared to 0.5.5.
Command line:
python3 -m sglang.launch_server \
  --model-path /models/DeepSeek-V3.2 \
  --dp 8 --enable-dp-attention \
  --trust-remote-code \
  --attention-backend nsa --nsa-prefill flashmla_sparse --nsa-decode flashmla_sparse \
  --max-total-tokens 128000 \
  --enable-metrics \
  --mem-fraction-static 0.9 \
  --max-running-requests 8 \
  --enable-cache-report \
  --page-size 64 \
  --tp-size 8 \
  --skip-server-warmup \
  --disable-overlap-schedule \
  --decode-log-interval 1 \
  --speculative-algorithm EAGLE --speculative-num-steps 2 --speculative-eagle-topk 1 --speculative-num-draft-tokens 3 \
  --chunked-prefill-size 16384 \
  --disable-chunked-prefix-cache \
  --disable-radix-cache