Accelerate

Transformers training in supercomputers with Hugging Face 🤗 and Slurm

In this post we analyze how the training of a transformer scales across up to 4 nodes, using PyTorch’s DistributedDataParallel strategy with Hugging Face 🤗 and Slurm.

Antoni-Joan Solergibert

Last updated on Nov 29, 2023 · 10 min read
