SOSP 2021 Overview


Compared to recent OSDI editions, the number of papers related to machine learning dropped significantly. The three Best Papers are mostly about storage.

Byzantine fault-tolerance

Finding bugs

iGUARD

This paper presents a binary instrumentation tool to detect possible race conditions.

Rudra

Graphs

FlashMob

The same team published KnightKing at SOSP two years ago. Graph random walks are notorious for random memory accesses that use the cache poorly.

Instead of sampling the next neighbor for many individual nodes in random order, FlashMob partitions nodes into batches and decouples sampling into a sample stage and a shuffle stage:

sample stage
Sample neighbors for a batch of nodes.
shuffle stage
Send the sampled nodes to their corresponding batches.
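The two stages above can be sketched as follows. This is a hypothetical minimal illustration of the idea, not FlashMob's actual implementation; the toy cycle graph, the modulo partitioning scheme, and the function names are all assumptions for the example.

```python
import random
from collections import defaultdict

def partition_of(node, num_parts):
    return node % num_parts  # toy partitioning scheme

def walk_step(adj, walkers, num_parts):
    """Advance every walker by one hop using sample + shuffle stages."""
    # Group walkers by the batch that owns their current node.
    batches = defaultdict(list)
    for walker_id, node in walkers.items():
        batches[partition_of(node, num_parts)].append(walker_id)

    next_pos = {}
    for part in sorted(batches):
        # Sample stage: process one batch at a time, so memory accesses
        # stay within one partition's neighbor lists (cache-friendly).
        sampled = []
        for walker_id in batches[part]:
            node = walkers[walker_id]
            sampled.append((walker_id, random.choice(adj[node])))
        # Shuffle stage: route each sampled result to the batch that
        # owns the destination node, ready for the next iteration.
        for walker_id, dest in sampled:
            next_pos[walker_id] = dest
    return next_pos

# Toy graph: 6 nodes on a cycle.
adj = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
walkers = {w: w % 6 for w in range(4)}
walkers = walk_step(adj, walkers, num_parts=2)
```

The point of the batching is that all random accesses within one sample stage land in a single partition's adjacency data, which can fit in cache.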

The paper also proposes pre-sampling high-degree nodes:

  • Such nodes are processed one at a time.
  • The edge list is loaded into the L1 (or L2) cache.
  • The pre-sampled neighbors are stored for later use.

When we need to sample neighbors for a high-degree node, we just look up the pre-sampled table, which is a sequential rather than a random memory access.
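The pre-sampling trick might look like this in miniature. The degree threshold, the number of pre-drawn samples, and all names are assumptions made for the sketch, not values from the paper.

```python
import random

DEGREE_THRESHOLD = 3   # assumed cutoff for "high degree"
PRESAMPLE_COUNT = 8    # assumed number of pre-drawn samples per hot node

def build_presample_table(adj):
    """Pre-draw neighbor samples for every high-degree node."""
    table = {}
    for node, neighbors in adj.items():
        if len(neighbors) >= DEGREE_THRESHOLD:
            # Process one hot node at a time: its edge list stays in
            # cache while all of its pre-samples are drawn.
            table[node] = [random.choice(neighbors)
                           for _ in range(PRESAMPLE_COUNT)]
    return table

def sample_neighbor(adj, table, node, cursor):
    """Return a sampled neighbor, consuming pre-samples sequentially."""
    pre = table.get(node)
    if pre is not None:
        idx = cursor[node] = (cursor.get(node, -1) + 1) % len(pre)
        return pre[idx]              # sequential read of a small table
    return random.choice(adj[node])  # cold node: sample on the fly

# Toy star graph: node 0 is the single high-degree node.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
table = build_presample_table(adj)
cursor = {}
n = sample_neighbor(adj, table, 0, cursor)
```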

The paper also proposes grouping nodes with the same (small) degree together and using a packed layout rather than CSR, so that we don't need to look up the index-pointer table.
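To see why the packed layout removes the index-pointer lookup, here is a small sketch under the assumption that every node in a group shares the same degree d; the offset of a node's neighbor list is then pure arithmetic (slot * d) rather than a read from a CSR `indptr` array. The helper names are invented for this example.

```python
def pack_group(adj, group, degree):
    """Concatenate the neighbor lists of same-degree nodes back-to-back."""
    packed = []
    pos = {}  # node -> slot within the group
    for slot, node in enumerate(group):
        assert len(adj[node]) == degree
        pos[node] = slot
        packed.extend(adj[node])
    return packed, pos

def neighbors(packed, pos, degree, node):
    start = pos[node] * degree        # computed offset: no indptr lookup
    return packed[start:start + degree]

# Three degree-2 nodes packed into one flat array.
adj = {0: [5, 6], 1: [7, 8], 2: [9, 10]}
packed, pos = pack_group(adj, [0, 1, 2], degree=2)
```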

Learning

HiPress

Though gradient compression greatly reduces communication between nodes in distributed training, the benefit is often not reflected in end-to-end performance because compression/decompression itself has significant overhead.

This paper proposes HiPress, which is composed of CASync as the scheduling module and CompLL as the runtime, and provides a user-friendly interface for writing efficient gradient-compression code in distributed training.

CASync
A scheduling framework that overlaps I/O with compression/decompression and maximizes bandwidth utilization. It also includes a cost model that decides how to partition compressed gradients.
CompLL
A DSL for generating gradient-compression CUDA kernels.
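The overlap idea behind CASync can be sketched with a toy producer/consumer pipeline: a background thread "sends" already-compressed chunks while the main thread keeps compressing, instead of strictly alternating compress then send. This is only an illustration of the scheduling principle; the zero-dropping "compression" and all names are invented for the example.

```python
import queue
import threading

def compress(chunk):
    return [x for x in chunk if x != 0]  # toy compression: drop zeros

def pipeline(chunks):
    q = queue.Queue()
    sent = []

    def sender():
        # Stand-in for the network: drain compressed chunks as they arrive.
        while True:
            item = q.get()
            if item is None:
                break
            sent.append(item)

    t = threading.Thread(target=sender)
    t.start()
    for chunk in chunks:
        q.put(compress(chunk))  # compression overlaps with sending
    q.put(None)                 # sentinel: no more chunks
    t.join()
    return sent

out = pipeline([[1, 0, 2], [0, 0, 3]])
```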

Flash storage

Kangaroo

This paper investigates how to cache massive numbers of tiny objects, and it received a Best Paper award this year.

Resource Allocation

MIND

Prerequisite

It's good to understand LegoOS first.

Smart NICs

LineFS

Another Best Paper award winner.


Author: expye(Zihao Ye)

Email: expye@outlook.com

Date: 2021-10-27 Wed 00:00

Last modified: 2021-11-27 Sat 00:50

Licensed under CC BY-NC 4.0