SpArch @ HPCA 2020

Table of Contents

Paper link:

Motivation and key idea

  • SpGeMM on CPU/GPU is slow
  • inner product: poor input reuse, good output reuse
  • outer product: poor output reuse, good input reuse
  • SpArch jointly optimizes input reuse and output reuse


OuterSPACE is a outer-product based SpGEMM accelerator.

The partial sparse matrices \(A[:,i] \times B[i, :]\) need to be stored in DRAM before mering, incurs extensive DRAM access.

Proposed solution

  • pipelined merger: multiply and merge
    • slow, number of partial matrices exceeds merger's parallelism
  • codensed matrix representation for \(A\)
  • huffman tree scheduler
    • reduce partial matrix DRAM access
  • row prefetecher
    • reduce input matrix DRAM access

Better Output Reuse

Pipelined Multiply and Merge

Merger Design

Instead of merging two sparse vectors in a serial way, SpArch propose to parallelize merging in space (Comparator Array).


All the results are generated in single clock cycle, because there is no input data dependency.

Hierarchical Merger

Matrix Codensing

Matrix Codensing moves non-zero elements to the left. Reducing the number of partial matrices.


Better Input Reuse

Author: expye(Zihao Ye)


Date: 2022-07-31 Sun 00:00

Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0