Train Your Large Model on Multiple GPUs with Tensor Parallelism - MachineLearningMastery.com



Source: MachineLearningMastery.com

Tensor parallelism is a model-parallelism technique that shards a tensor along a specific dimension, distributing its computation across multiple devices with minimal communication overhead. It is suited to models with very large parameter tensors, where even a single matrix multiplication is too large to fit on one GPU.
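The idea can be sketched in a few lines. The example below is a minimal single-process simulation (not the article's code, and not a real multi-GPU setup): it shards the weight matrix of a linear layer column-wise across a hypothetical number of devices, lets each "device" compute its partial matmul independently, then concatenates the partial outputs, which plays the role of the all-gather communication step in a true distributed implementation.

```python
import numpy as np

def column_parallel_matmul(x, weight, n_devices):
    """Simulate column-wise tensor parallelism of a linear layer.

    `weight` is split along its output (column) dimension; each shard's
    matmul runs independently, so no communication is needed until the
    final concatenation (an all-gather on real hardware).
    """
    shards = np.array_split(weight, n_devices, axis=1)  # one shard per device
    partials = [x @ w_shard for w_shard in shards]      # independent matmuls
    return np.concatenate(partials, axis=1)             # all-gather step

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of activations
W = rng.standard_normal((8, 16))   # full weight matrix

out_parallel = column_parallel_matmul(x, W, n_devices=4)
out_single = x @ W                 # unsharded reference
assert np.allclose(out_parallel, out_single)
```

Because each shard produces a disjoint slice of the output, the sharded result matches the unsharded matmul exactly; the per-device memory for the weight drops by roughly a factor of `n_devices`.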