Notes on Google's Pathways

Table of Contents

Why Pathways
System Architecture
Discussions
  Why is it designed this way?
  My two cents on Sparsity
  Controversy

Pathways1 is a distributed machine learning system recently released by Google. With the help of Pathways, Google successfully scaled a language model to 540 billion parameters (PaLM)2. James Bradbury, lead of the JAX team, claims3 that PaLM was trained on 6144 TPU v4 chips across two pods and achieved 46.2% end-to-end FLOPs utilization, a remarkable record.
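
As a quick sanity check on that number: the PaLM paper reports utilization as model FLOPs utilization (MFU), i.e. the FLOPs the model usefully spends per token relative to the hardware's peak. The sketch below is my own back-of-the-envelope illustration, not anything from the Pathways or PaLM codebases; it assumes a TPU v4 peak of roughly 275 bf16 TFLOP/s per chip and the common 6 x (parameter count) approximation for training FLOPs per token (which ignores attention FLOPs), then backs out the training throughput implied by 46.2% utilization.

  # Back-of-the-envelope check of the quoted 46.2% utilization.
  # My own sketch; the per-chip peak and the 6N FLOPs-per-token rule are assumptions.
  N_PARAMS = 540e9               # PaLM parameter count
  N_CHIPS = 6144                 # two TPU v4 pods
  PEAK_FLOPS_PER_CHIP = 275e12   # assumed TPU v4 bf16 peak, FLOP/s
  MFU = 0.462                    # quoted end-to-end FLOPs utilization

  # ~6 FLOPs per parameter per token (forward + backward), ignoring attention.
  flops_per_token = 6 * N_PARAMS

  implied_tokens_per_sec = MFU * N_CHIPS * PEAK_FLOPS_PER_CHIP / flops_per_token
  print(f"implied throughput ~ {implied_tokens_per_sec / 1e3:.0f}k tokens/s")
  # prints roughly 241k tokens/s under these assumptions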

In this article, I'll go through the paper and briefly introduce the design highlights of Pathways.

Why Pathways

System Architecture

Discussions

Why is it designed this way?

The short answer is that the underlying hardware dictates the system design.

TODO My two cents on Sparsity

I'm working on compilers for sparse workloads in deep learning.

Controversy

Jinhui Yuan has written an article comparing OneFlow and Pathways: https://hub.baai.ac.cn/view/16314

My digest:

Footnotes:

Author: expye (Zihao Ye)

Email: expye@outlook.com

Date:

Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0