# SpArch @ HPCA 2020

## Table of Contents

Paper link: https://arxiv.org/pdf/2002.08947.pdf

## Motivation and key idea

- SpGeMM on CPU/GPU is slow
- inner product: poor input reuse, good output reuse
- outer product: poor output reuse, good input reuse
- SpArch jointly optimizes input reuse and output reuse

### OuterSPACE

OuterSPACE is a outer-product based SpGEMM accelerator.

The partial sparse matrices \(A[:,i] \times B[i, :]\) need to be stored in DRAM before mering, incurs extensive DRAM access.

### Proposed solution

- pipelined merger: multiply and merge
- slow, number of partial matrices exceeds merger's parallelism

- codensed matrix representation for \(A\)
- huffman tree scheduler
- reduce partial matrix DRAM access

- row prefetecher
- reduce input matrix DRAM access

## Better Output Reuse

### Pipelined Multiply and Merge

#### Merger Design

Instead of merging two sparse vectors in a serial way, SpArch propose to parallelize merging in space (Comparator Array).

All the results are generated in single clock cycle, because there is no input data dependency.

#### Hierarchical Merger

### Matrix Codensing

Matrix Codensing moves non-zero elements to the left. Reducing the number of partial matrices.