OuterSPACE @ HPCA 2018

Table of Contents

OuterSPACE is a SpGEMM accelerator that exploits:

  1. outer-product based matrix multiplication
    1. multiply and merge
  2. software controlled scratchpad memory, non-coherent.

Advantage of Outer-Product

  1. No index-matching.
  2. Maximize the re-use of non-zeros.
  3. Minimize loads of a column and row.

Multiply Phase

Below is a figure illustrating the outer-product of two sparse matrices, the input matrix \(\mathbb{A}\) is stored in CSC format while \(\mathbb{B}\) is stored in CSR format, the output format could either be CSR or CSC.

2022-12-12_09-20-53_screenshot.png

The intermediate partial products are stored as a set of linked lists (see \(R_i\) / \(C_i\) in the figure).

Merge Phase

In this phase the processing elements in OuterSPACE walked through and linked list and merging values. The paper claims that merging is rare for highly sparse matrices.

Matrix Format Conversion

OuterSPACE also supports CSR/CSC conversion, by partition the conversion operator to conversion-load and conversion-merge phases, this is equivalent to multiplying the input matrix \(mathbb{A}\) with an identity matrix \(\mathbb{I}\):

\[\mathbb{I}_{CC}\times\mathbb{A}_{CR} \rightarrow \mathbb{A}_{CC} \]

OuterSPACE Architecture

2022-12-12_16-39-54_screenshot.png

Processing Element
Data-streaming and compute engine: ALU, scratchpad memory, a control unit, a work queue and outstanding request queue. PEs are grouped into processing tiles.
Local Control Processor
coordinates the PEs inside a tile, streaming instructions for the PE to execute.
Central Control Processor
power-efficient core that is responsible for scheduling work and allocating memory for intermediate data structures.
High-speed crossbars and coalescing caches
can be reconfigured into scratchpad memories.
HBM
stores input matrices and intermediate partial products.

Author: expye(Zihao YE)

Email: expye@outlook.com

Date: 2022-12-01 Thu 00:00

Last modified: 2024-07-04 Thu 10:35

Licensed under CC BY-NC 4.0