
Conversation

@RezaYazdaniAminabadi
Contributor

After refactoring the run scripts, we missed passing the local_rank argument that the Transformer kernel requires to run on multiple GPUs. I added it to the transformer_kernel configuration. Also, torch.distributed needs to be initialized before the model is created in nvidia_run_squad_deepspeed.py; otherwise, the baseline run fails. The rest of the changes are due to formatting.
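A minimal sketch of the two fixes described above, assuming the standard torch.distributed launcher supplies --local_rank; the transformer_kernel_config dict and its keys here are illustrative placeholders, not the exact code in this PR:

```python
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1,
                    help="Local rank passed by the distributed launcher")
args, _ = parser.parse_known_args()

# Initialize torch.distributed *before* constructing the model, so the
# baseline (non-DeepSpeed) path also sees an initialized process group.
if args.local_rank != -1:
    torch.cuda.set_device(args.local_rank)
    dist.init_process_group(backend="nccl")

# Hypothetical transformer-kernel config: the point from the PR is that
# local_rank must be forwarded so the kernel runs on the correct GPU.
transformer_kernel_config = {
    "local_rank": args.local_rank,
    # ... other kernel settings (hidden size, attention heads, etc.) ...
}
```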
