Dorylus @ OSDI 2021

Paper link: https://www.usenix.org/system/files/osdi21-thorpe.pdf

Background (AWS Lambda)

Lambda is AWS's serverless computing service: users upload their code, and AWS allocates resources and manages the servers that run it.
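To make the programming model concrete, here is a minimal sketch of a Python Lambda function. The `(event, context)` handler signature is the standard one for Python on Lambda; the payload shape (a `"values"` list) and the summing logic are hypothetical, chosen only for illustration, and have nothing to do with Dorylus itself.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda calls once per invocation.

    `event` carries the request payload; `context` carries runtime metadata
    and is unused here.
    """
    # Hypothetical payload: a list of numbers under the key "values".
    values = event.get("values", [])
    total = sum(values)
    # Return a JSON-serializable result; no server state is kept between calls.
    return {"statusCode": 200, "body": json.dumps({"sum": total})}
```

The function is stateless, so the provider is free to run as many copies as there are pending requests and to bill only for the time each copy actually runs.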

Motivation

GPUs are not a good choice for GNN training:

  • GPU-based GNN training is expensive and under-utilizes GPU resources (not affordable).
  • GPUs' limited memory hinders scalability (not scalable).

Other approaches to reducing training cost have drawbacks:

  • CPUs provide limited parallelism.
  • Sampling-based training harms accuracy (NOTE: I don't agree with this argument).

Proposal: CPU servers + serverless computing.

  • affordable: pay only for what you use.
  • scalable: larger memory than GPUs.

Challenges

  • limited compute resources per Lambda
  • restricted network resources per Lambda

TODO Solution

My review

What appeals to me about this paper is its use of serverless computing. When cloud providers do not offer hardware that fits your workload (e.g. an instance comes with too many GPUs, too much CPU, or too much network bandwidth for what you need), serverless is a good idea, provided you can decompose the workload into "atomic components" that each fully utilize the resources you pay for.
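As a rough illustration of what "atomic components" could look like, the sketch below splits a batch of feature rows into chunks and hands each chunk to an independent, stateless task. This is not the paper's implementation: the names `transform_chunk`, `split`, and `run_decomposed` are hypothetical, and a local thread pool merely stands in for serverless invocations.

```python
from concurrent.futures import ThreadPoolExecutor


def transform_chunk(rows, weight):
    """Stateless task: multiply each feature row by a weight matrix."""
    # zip(*weight) iterates over the columns of the weight matrix.
    return [
        [sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
        for row in rows
    ]


def split(rows, chunk_size):
    """Cut the batch into independent chunks of at most chunk_size rows."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def run_decomposed(rows, weight, chunk_size=2, workers=4):
    """Run every chunk as its own task and reassemble the results in order."""
    chunks = split(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each call depends only on its arguments, so in principle it could
        # run as a separate serverless invocation (assumption for illustration).
        results = pool.map(lambda chunk: transform_chunk(chunk, weight), chunks)
        return [row for chunk in results for row in chunk]
```

Choosing `chunk_size` is where the "fully utilize what you paid for" trade-off shows up: chunks that are too small waste invocation overhead, while chunks that are too large exceed what a single small worker can handle.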

p.s. Junru mentions that AWS Lambda used to suffer from performance fluctuations.

Author: expye(Zihao Ye)

Email: expye@outlook.com

Date: 2022-07-28 Thu 00:00

Last modified: 2022-12-04 Sun 02:08

Licensed under CC BY-NC 4.0