TJ Solergibert
Multinode Training
Training Transformers on supercomputers with Hugging Face 🤗 and Slurm
In this post we will analyze how training a transformer with PyTorch’s DistributedDataParallel strategy scales across up to 4 nodes, using Hugging Face 🤗 and Slurm.
Antoni-Joan Solergibert
Last updated on Nov 29, 2023