TJ Solergibert
Distributed PyTorch
A Deep Dive into 3D Parallelism with Nanotron⚡️
In this post, we will present 3D parallelism, the technology behind large-scale LLM training. We will delve into the core details of pipeline, tensor, and data parallelism with code snippets from Nanotron⚡️, a 3D parallel trainer from Hugging Face🤗
Antoni-Joan Solergibert
Jun 10, 2024
13 min read
Creating PyTorch Datasets and DataLoaders from Scratch: A Beginner's Guide
In this post, we will address the fundamental aspects of Torch’s Datasets and DataLoaders, considering an environment with Data, Pipeline and Tensor Parallelism and including functionalities to resume training after an interruption. We will present Nanosets, a custom dataset for LLM training at scale with Nanotron
Antoni-Joan Solergibert
May 11, 2024
16 min read
Transformers training in supercomputers with Hugging Face 🤗 and Slurm
In this post, we will analyze how the training of a transformer with PyTorch’s DistributedDataParallel strategy scales across up to 4 nodes, using Hugging Face 🤗 and Slurm
Antoni-Joan Solergibert
Last updated on Nov 29, 2023
10 min read