Dorylus @ OSDI 2021
Paper link: https://www.usenix.org/system/files/osdi21-thorpe.pdf
Background (AWS Lambda)
Lambda is AWS's serverless computing service: users upload their code, and AWS allocates resources and manages the servers that run it.
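To make "upload their code" concrete, here is a minimal sketch of a Python Lambda function. The `(event, context)` handler signature is Lambda's standard Python entry point; the `values` field in the event and the summing logic are made up for illustration and have nothing to do with Dorylus itself.

```python
import json


def lambda_handler(event, context):
    """Entry point that the Lambda runtime calls once per invocation."""
    # `event` carries the request payload as a dict.
    # The "values" key is a hypothetical field chosen for this sketch.
    values = event.get("values", [])

    # Do a small unit of work and return a JSON-serializable response.
    return {
        "statusCode": 200,
        "body": json.dumps({"sum": sum(values)}),
    }
```

The function holds no server state: each invocation receives its input in `event` and returns its result, which is what lets AWS decide where and when to run it.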
Motivation
GPUs are not a good choice for GNN training:
- GPU-based GNN training is expensive and under-utilizes GPU resources (not affordable).
- GPU's limited memory hinders scalability (not scalable).
Other approaches to reduce training costs have some drawbacks:
- CPUs provide limited parallelism.
- Sampling-based training harms accuracy (NOTE: I don't agree with this argument).
Proposal: CPU servers + serverless computing.
- affordable: pay only for what you use.
- scalable: larger memory
Challenges
- limited compute resources
- restricted network resources
TODO Solution
My review
What appeals to me about this paper is its use of serverless computing: when cloud providers do not offer hardware that fits your workload (e.g. the available instances come with too many GPUs, too much CPU, or too much network capacity), serverless is a good option, provided you can decompose the workload into "atomic components" that each fully utilize the resources you pay for.
P.S. Junru mentions that AWS Lambda used to suffer from performance fluctuations.
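The "atomic components" idea from the review can be sketched as follows. This is my own toy illustration, not the paper's implementation: a dense-layer multiply is split into independent row chunks, and a local thread pool stands in for serverless invocations. All function names and the chunk size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(rows, weights):
    """One 'atomic' task: multiply a chunk of feature rows by the weights."""
    cols = list(zip(*weights))  # columns of the weight matrix
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in rows]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_decomposed(features, weights, chunk_size=2, workers=4):
    """Run each chunk as an independent task and stitch the results together.

    The thread pool is only a local stand-in for serverless workers.
    """
    chunks = chunked(features, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        partials = list(pool.map(lambda c: matmul_rows(c, weights), chunks))
    return [row for part in partials for row in part]


if __name__ == "__main__":
    features = [[1, 2], [3, 4], [5, 6]]
    weights = [[1, 2], [3, 4]]
    print(run_decomposed(features, weights))
```

Each chunk depends only on its own rows and the shared weights, so tasks can be sized to whatever a single worker can handle and billed only for the time they actually run.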