TJ Solergibert
Transformers
A Deep Dive into 3D Parallelism with Nanotron⚡️
In this post, we present 3D parallelism, the technology behind large-scale LLM training. We delve into the core details of pipeline, tensor, and data parallelism, with code snippets from Nanotron⚡️, a 3D-parallel trainer from Hugging Face🤗.
Antoni-Joan Solergibert
Jun 10, 2024
13 min read
Transformers training in supercomputers with Hugging Face 🤗 and Slurm
In this post, we analyze the scalability of transformer training with PyTorch's DistributedDataParallel strategy on up to 4 nodes, using Hugging Face 🤗 and Slurm.
Antoni-Joan Solergibert
Last updated on Nov 29, 2023
10 min read