Distributed PyTorch

A Deep Dive into 3D Parallelism with Nanotron⚡️

In this post, we will present 3D parallelism, the technology behind large-scale LLM training. We will delve into the core details of pipeline, tensor, and data parallelism with code snippets from Nanotron⚡️, a 3D-parallel trainer from Hugging Face 🤗.

Antoni-Joan Solergibert

Jun 10, 2024 · 13 min read

Creating PyTorch Datasets and DataLoaders from Scratch: A Beginner's Guide

In this post, we will address the fundamental aspects of PyTorch’s Datasets and DataLoaders in an environment with data, pipeline, and tensor parallelism, including support for resuming training after an interruption. We will present Nanosets, a custom dataset for LLM training at scale with Nanotron.

Antoni-Joan Solergibert

May 11, 2024 · 16 min read

Transformers training in supercomputers with Hugging Face 🤗 and Slurm

In this post, we will analyze how training a transformer with PyTorch’s DistributedDataParallel strategy scales across up to 4 nodes, using Hugging Face 🤗 and Slurm.

Antoni-Joan Solergibert

Last updated on Nov 29, 2023 · 10 min read