Boqueria - Next Generation At-Memory Inference Acceleration Device with 1000+ RISC-V Cores @ HotChips 2022

  • Data movement is the costliest part of inference, accounting for about 90% of energy consumption.
  • Optimizing the compute architecture to minimize the distance data travels leads to inference-specific AI accelerators.
  • Strike the right balance between coarse-grained and fine-grained approaches (WHAT IS THAT?)
  • Use the most efficient data types.

Boqueria proposes "at-memory computation" to reduce the distance between processing elements and memory. This differs from in-memory computation, where the compute units reside in the memory itself.

Overall Architecture

  • 729 memory banks in total, each with a dual RISC-V processor.
  • 1.35 GHz, TSMC 7nm
  • 30 TFLOPS/W (I have no idea how good it is)
  • 238MB on-chip SRAM (328KB per memory bank)
  • 1 PB/s SRAM bandwidth
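
As a quick sanity check on the figures above, the per-bank numbers can be multiplied out to chip-level totals. This is only a back-of-the-envelope sketch: it assumes "dual RISC-V processor" means two cores per bank and that KB/MB are decimal units, and the function name `chip_totals` is made up for illustration.

```python
# Figures quoted in the notes above.
BANKS = 729
CORES_PER_BANK = 2          # assumption: "dual RISC-V processor" = 2 cores per bank
SRAM_PER_BANK_KB = 328
TOTAL_SRAM_BW_PB_S = 1.0    # aggregate SRAM bandwidth

def chip_totals():
    """Return (total cores, total SRAM in MB, SRAM bandwidth per bank in TB/s)."""
    cores = BANKS * CORES_PER_BANK
    sram_mb = BANKS * SRAM_PER_BANK_KB / 1000            # decimal MB (assumption)
    bw_per_bank_tb_s = TOTAL_SRAM_BW_PB_S * 1000 / BANKS
    return cores, sram_mb, bw_per_bank_tb_s

cores, sram_mb, bw = chip_totals()
print(f"{cores} cores, {sram_mb:.1f} MB SRAM, {bw:.2f} TB/s per bank")
```

Under these assumptions the chip has 1458 cores, which matches the "1000+" in the title, and about 239 MB of SRAM, close to the quoted 238MB. The aggregate 1 PB/s works out to roughly 1.4 TB/s per bank.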

Memory Bank Design

  • Each RISC-V processor manages 4 row controllers.
  • Row controllers operate independently. (64 SIMD PEs)
  • Rotator cuff moves activations between nearest-neighbor PEs.
  • 8 E/W NOC (7GB/s, bi-directional)
  • 1 N/S NOC (70GB/s, bi-directional)
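
To get a feel for why nearest-neighbor movement of activations is useful, here is a toy software model of a matrix-vector product on a ring of PEs. It is a sketch under my own assumptions, not a description of the actual hardware: each PE is assumed to keep one row of weights in its local memory, and activations are assumed to rotate one position per step. The name `ring_matvec` is hypothetical.

```python
def ring_matvec(W, x):
    """Compute y = W @ x on a simulated ring of len(x) PEs.

    PE i keeps row W[i] locally. Activations hop to the neighboring PE
    once per step, so weights never leave their PE.
    """
    n = len(x)
    held = list(x)          # held[i]: activation currently sitting in PE i
    src = list(range(n))    # src[i]: original index of that activation
    acc = [0] * n           # acc[i]: partial dot product in PE i
    for _ in range(n):
        for pe in range(n):
            acc[pe] += W[pe][src[pe]] * held[pe]
        # Rotate activations (and their indices) one PE to the right.
        held = held[-1:] + held[:-1]
        src = src[-1:] + src[:-1]
    return acc

W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, -1]
print(ring_matvec(W, x))  # [-2, -2, -2]
```

After n rotations every PE has seen every activation exactly once, and the only data that moved was a single value per PE per step over a short link.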

SRAM Array & Processing Element Design


  • Low Power SRAM Array (0.4V datapath operation)
  • Processing elements support int4/8, fp8 and bf16, detect zeros to save power, and include structured sparsity support and dedicated circuitry for softmax/layernorm.
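
The zero-detection idea can be illustrated in software: a multiply whose operand is zero contributes nothing to the sum, so it can be skipped. The sketch below only counts how many multiplies are actually needed. How the hardware gates the datapath is not covered in these notes, and `zero_skipping_dot` is a hypothetical name.

```python
def zero_skipping_dot(weights, activations):
    """Dot product that skips any term with a zero operand.

    Returns (result, number of multiplies actually performed).
    """
    total = 0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # zero detected: the product is 0, so skip the multiply
        total += w * a
        multiplies += 1
    return total, multiplies

result, multiplies = zero_skipping_dot([3, 0, -2, 0], [1, 5, 0, 7])
print(result, multiplies)  # 3 1
```

In this example only one of the four multiplies is performed, and the result is unchanged.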

Custom RISC-V Processor

  • Standard RV32EMC instruction set + some custom instructions
  • Each processor has 6KB of memory, a 32-bit ALU, a 32-bit multiplier, a 16-entry register file and 4-way context switching.

High-bandwidth I/O

  • I/O Ring NOC (141GB/s in both clockwise and counter-clockwise direction).
  • 1.5 TB/s E/W throughput and 1.9 TB/s N/S throughput
  • x16 PCIe Gen5 for host connectivity (63 GB/s)
  • x8 PCIe Gen5 for intra-chip connectivity (31.5 GB/s)
  • 4MB scratchpad for data manipulation (why?) and 32GB of external LPDDR5 (>100GB/s)
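
The two PCIe figures above follow from the link parameters. PCIe Gen5 runs at 32 GT/s per lane with 128b/130b encoding, so the usable bandwidth per direction can be worked out as below. The function name `pcie_gen5_gb_per_s` is made up for illustration, and protocol overhead beyond line encoding is ignored.

```python
GEN5_GT_PER_S = 32      # raw transfer rate per lane, in GT/s
ENCODING = 128 / 130    # 128b/130b line encoding efficiency

def pcie_gen5_gb_per_s(lanes):
    """Per-direction bandwidth in GB/s, counting only line-encoding overhead."""
    return GEN5_GT_PER_S * ENCODING * lanes / 8  # 8 bits per byte

print(f"x16: {pcie_gen5_gb_per_s(16):.1f} GB/s")  # x16: 63.0 GB/s
print(f"x8:  {pcie_gen5_gb_per_s(8):.1f} GB/s")   # x8:  31.5 GB/s
```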

Author: expye(Zihao YE)

Email: expye@ou

Date: 2022-11-15 Tue 00:00

Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0