# OuterSPACE @ HPCA 2018

## Table of Contents

OuterSPACE is a SpGEMM accelerator that exploits:

- outer-product based matrix multiplication
**multiply**and**merge**

- software controlled scratchpad memory, non-coherent.

## Advantage of Outer-Product

- No index-matching.
- Maximize the re-use of non-zeros.
- Minimize loads of a column and row.

## Multiply Phase

Below is a figure illustrating the outer-product of two sparse matrices, the input matrix \(\mathbb{A}\) is stored in CSC format while \(\mathbb{B}\) is stored in CSR format, the output format could either be CSR or CSC.

The intermediate partial products are stored as a set of linked lists (see \(R_i\) / \(C_i\) in the figure).

## Merge Phase

In this phase the processing elements in OuterSPACE walked through and linked list and merging values. The paper claims that merging is rare for highly sparse matrices.

## Matrix Format Conversion

OuterSPACE also supports CSR/CSC conversion, by partition the conversion operator to **conversion-load** and **conversion-merge** phases, this is equivalent to multiplying the input matrix \(mathbb{A}\) with an identity matrix \(\mathbb{I}\):

\[\mathbb{I}_{CC}\times\mathbb{A}_{CR} \rightarrow \mathbb{A}_{CC} \]

## OuterSPACE Architecture

- Processing Element
- Data-streaming and compute engine: ALU, scratchpad memory, a control unit, a work queue and outstanding request queue. PEs are grouped into processing tiles.
- Local Control Processor
- coordinates the PEs inside a tile, streaming instructions for the PE to execute.
- Central Control Processor
- power-efficient core that is responsible for scheduling work and allocating memory for intermediate data structures.
- High-speed crossbars and coalescing caches
- can be reconfigured into scratchpad memories.
- HBM
- stores input matrices and intermediate partial products.