Labels
- AutoDeploy&lt;NV&gt;: AutoDeploy Backend
- Customized kernels&lt;NV&gt;: Specialized/modified CUDA kernels in TRTLLM for LLM ops, beyond standard TRT. Dev & perf.
- bug: Something isn't working
- triaged: Issue has been triaged by maintainers
Description
System Info
H100
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
```shell
build_and_run_ad.py --model THUDM/GLM-4-9B-0414 \
  --args.model-factory AutoModelForCausalLM \
  '--args.model-kwargs={}' \
  --args.tokenizer null \
  --args.world-size 2 \
  --args.compile-backend torch-compile \
  --args.attn-backend flashinfer \
  --args.runtime trtllm \
  --args.skip-loading-weights False \
  --args.transforms.detect-sharding.simple-shard-only False \
  --args.max-seq-len 512 \
  --benchmark.enabled True \
  --benchmark.results-path /jet/logs/basic/auto-deploy-model-coverage_ab-flashinfer_b-true_cb-torch-compile_m-thudm-glm-4-9b-0414_mf-automodelforcausallm_mk--_msl-512_r-trtllm_sso-false_sw-false_t-null_ws-2/extra.json \
  --benchmark.store-results true
```
Expected behavior
Should pass
Actual behavior
```text
0: File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/transform/interface.py", line 358, in __call__
0:   mod, info_apply = self._apply_per_gm_or_whole_model(mod, cm, factory, shared_config)
0:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0: File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/transform/interface.py", line 417, in _apply_per_gm_or_whole_model
0:   graph_sub, info_apply = self._apply(graph_sub, cm, factory, shared_config)
0:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0: File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py", line 243, in _apply
0:   info += detect_sharding_from_config(gm, transform_container, ShardingSource.FACTORY)
0:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0: File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py", line 666, in detect_sharding_from_config
0:   _process_column_sharding(
0: File "/opt/tensorrt-llm/tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py", line 486, in _process_column_sharding
0:   fused_weight_dims = [s.args[3] - s.args[2] for s in linear_node.users]
0:                        ~~~~~~^^^
0: IndexError: tuple index out of range
```
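The failing line assumes every user of `linear_node` is a slice-shaped node whose `args` tuple has at least four entries (input, dim, start, end), so it can compute `end - start`. If any consumer of the linear node has a shorter `args` tuple, the index `[3]` raises. The sketch below illustrates the failure mode with a hypothetical `FakeNode` stand-in for `torch.fx.Node` (the names and arg layout are assumptions for illustration, not the actual TRT-LLM implementation):

```python
from dataclasses import dataclass, field

@dataclass
class FakeNode:
    # Mimics only the .args tuple of a torch.fx.Node.
    args: tuple = field(default_factory=tuple)

# When every user is slice-shaped with (input, dim, start, end) args,
# the comprehension from sharding.py line 486 works:
slice_users = [FakeNode(args=(None, 0, 0, 4096)),
               FakeNode(args=(None, 0, 4096, 8192))]
fused_weight_dims = [s.args[3] - s.args[2] for s in slice_users]

# But if any user is not a slice (e.g. a consumer taking a single
# positional arg), args has fewer than 4 entries and indexing raises
# the IndexError seen in the traceback:
mixed_users = slice_users + [FakeNode(args=(None,))]
try:
    [s.args[3] - s.args[2] for s in mixed_users]
except IndexError as e:
    print(e)  # tuple index out of range

# One possible guard: only consider users with 4-entry args.
guarded = [s.args[3] - s.args[2] for s in mixed_users if len(s.args) >= 4]
```

This suggests GLM-4's graph contains a linear-node consumer that is not a 4-arg slice, which the column-sharding detection does not anticipate.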
Additional notes
NA
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.