SpArch @ HPCA 2020

Table of Contents

Paper link: https://arxiv.org/pdf/2002.08947.pdf

Motivation and key idea

  • SpGeMM on CPU/GPU is slow
  • inner product: poor input reuse, good output reuse
  • outer product: poor output reuse, good input reuse
  • SpArch jointly optimizes input reuse and output reuse

OuterSPACE

OuterSPACE is a outer-product based SpGEMM accelerator.

The partial sparse matrices \(A[:,i] \times B[i, :]\) need to be stored in DRAM before mering, incurs extensive DRAM access.

Proposed solution

  • pipelined merger: multiply and merge
    • slow, number of partial matrices exceeds merger's parallelism
  • codensed matrix representation for \(A\)
  • huffman tree scheduler
    • reduce partial matrix DRAM access
  • row prefetecher
    • reduce input matrix DRAM access

Better Output Reuse

Pipelined Multiply and Merge

Merger Design

Instead of merging two sparse vectors in a serial way, SpArch propose to parallelize merging in space (Comparator Array).

2022-07-31_18-15-29_screenshot.png

All the results are generated in single clock cycle, because there is no input data dependency.

Hierarchical Merger

Matrix Codensing

Matrix Codensing moves non-zero elements to the left. Reducing the number of partial matrices.

2022-07-31_18-18-39_screenshot.png

Better Input Reuse

Author: expye(Zihao Ye)

Email: expye@outlook.com

Date: 2022-07-31 Sun 00:00

Last modified: 2024-07-04 Thu 10:35

Licensed under CC BY-NC 4.0