TJ Solergibert
Transformers
A Deep Dive into 3D Parallelism with Nanotron⚡️
In this post, we present 3D parallelism, the technology behind large-scale LLM training. We delve into the core details of pipeline, tensor, and data parallelism, with code snippets from Nanotron⚡️, a 3D-parallel trainer from Hugging Face🤗.
Antoni-Joan Solergibert
Jun 10, 2024
13 min read
Transformers training in supercomputers with Hugging Face 🤗 and Slurm
In this post, we analyze the scalability of transformer training with PyTorch's DistributedDataParallel strategy on up to 4 nodes, using Hugging Face 🤗 and Slurm.
Antoni-Joan Solergibert
Last updated on Nov 29, 2023
10 min read