SpArch @ HPCA 2020
Table of Contents
Paper link: https://arxiv.org/pdf/2002.08947.pdf
Motivation and key idea
- SpGeMM on CPU/GPU is slow
- inner product: poor input reuse, good output reuse
- outer product: poor output reuse, good input reuse
- SpArch jointly optimizes input reuse and output reuse
OuterSPACE
OuterSPACE is a outer-product based SpGEMM accelerator.
The partial sparse matrices \(A[:,i] \times B[i, :]\) need to be stored in DRAM before mering, incurs extensive DRAM access.
Proposed solution
- pipelined merger: multiply and merge
- slow, number of partial matrices exceeds merger's parallelism
- codensed matrix representation for \(A\)
- huffman tree scheduler
- reduce partial matrix DRAM access
- row prefetecher
- reduce input matrix DRAM access
Better Output Reuse
Pipelined Multiply and Merge
Merger Design
Instead of merging two sparse vectors in a serial way, SpArch propose to parallelize merging in space (Comparator Array).
All the results are generated in single clock cycle, because there is no input data dependency.
Hierarchical Merger
Matrix Codensing
Matrix Codensing moves non-zero elements to the left. Reducing the number of partial matrices.