TL;DR: This paper proposes a new functional unit, K-ADD, to replace the K-LUT in FPGAs and accelerate sparse integer matrix multiplication.

## Motivation

### Previous Work

Note that sparsity in this work only helps save energy, not area or time.

### LUTs and Clusters

Suppose the FPGA LUT size (number of inputs) is $$K$$:

• A smaller $$K$$ is more area-efficient.
• A larger $$K$$ reduces the delay of the routable interconnect, which improves performance.
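
To make the area side of this trade-off concrete: a $$K$$-input LUT is commonly modeled as a truth table with $$2^K$$ configuration bits, so storage grows exponentially with $$K$$. Below is a minimal Python sketch of that model. The helper names `make_lut` and `eval_lut` are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from itertools import product


def make_lut(func, k):
    """Build the truth table (2**k configuration bits) of a k-input Boolean function.

    Entries are ordered so that the first input is the most significant address bit.
    """
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]


def eval_lut(table, inputs):
    """Evaluate the LUT: the inputs form an address into the table (MSB first)."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# Example: 3-input XOR, i.e. the sum bit of a full adder.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)

# Configuration bits needed for a few LUT sizes (illustrative only).
config_bits = {k: 2 ** k for k in (3, 4, 5, 6)}
```

Going from $$K = 4$$ to $$K = 6$$ in this model quadruples the table from 16 to 64 bits, which is one intuition for why a smaller $$K$$ tends to be more area-efficient.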

#### Background

• cluster size: the number of LUTs inside a cluster (block).
• UltraScale CLB: https://docs.xilinx.com/v/u/D5oEbljIJnFtiM166itDbA

Large cluster sizes of earlier FPGAs may not be optimal.