Notes on CAAQA 6th Edition

Chapter 4: Vector Architecture


for (i = 0; i < n; ++i)
  A[K[i]] = A[K[i]] + C[M[i]];

Although indexed loads and stores (gather and scatter) can be pipelined, they typically run much more slowly than unit-stride accesses because the memory access pattern is unpredictable.
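One way to picture how a vector machine handles the loop above is as a gather, an element-wise add, and a scatter over fixed-size strips. The C sketch below models that in scalar code; it is an illustration, not how any particular compiler or ISA implements it. The names gather, scatter, sparse_add and the strip length STRIP are hypothetical. The sketch assumes the indices in K are distinct within a strip, since duplicates would make the gathered version differ from the original loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strip length, standing in for a maximum vector length. */
#define STRIP ((size_t)64)

/* Gather: dst[i] = src[idx[i]]. The addresses read depend on idx,
   so they are not known until run time. */
static void gather(double *dst, const double *src, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];
}

/* Scatter: dst[idx[i]] = src[i]. */
static void scatter(double *dst, const double *src, const size_t *idx, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];
}

/* Computes A[K[i]] = A[K[i]] + C[M[i]] for i in [0, n), one strip at a time.
   Assumption: K holds no duplicate index within a strip. */
void sparse_add(double *A, const double *C, const size_t *K, const size_t *M, size_t n) {
    assert(n == 0 || (A && C && K && M));
    double va[STRIP], vc[STRIP];
    for (size_t base = 0; base < n; base += STRIP) {
        size_t len = (n - base < STRIP) ? (n - base) : STRIP;
        gather(va, A, K + base, len);   /* indexed load of A */
        gather(vc, C, M + base, len);   /* indexed load of C */
        for (size_t i = 0; i < len; ++i)
            va[i] += vc[i];             /* dense element-wise add */
        scatter(A, va, K + base, len);  /* indexed store back into A */
    }
}
```

Only the two gathers and the scatter touch memory at data-dependent addresses; the add in the middle works on dense temporaries.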

A carefully designed memory system can deliver better performance for these accesses by spending more hardware resources on them.

On GPUs, programmers need to ensure that all addresses in a gather or scatter fall in adjacent locations, so that memory can be accessed with efficient unit-stride transfers.
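To see why adjacency matters, the following C sketch counts how many fixed-size memory segments a group of accesses touches. It is a toy model, not a GPU API: SEGMENT_BYTES, GROUP_SIZE and segments_touched are hypothetical names, and the values 128 and 32 are assumptions chosen only for illustration. Elements are assumed to be aligned so that each one lies within a single segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters, for illustration only. */
#define SEGMENT_BYTES ((size_t)128)  /* size of one memory transaction */
#define GROUP_SIZE    ((size_t)32)   /* accesses issued together */

/* Returns the number of distinct SEGMENT_BYTES-sized segments touched by
   the element indices idx[0..n-1], where each element is elem_bytes wide. */
size_t segments_touched(const size_t *idx, size_t n, size_t elem_bytes) {
    assert(n <= GROUP_SIZE);
    assert(elem_bytes > 0 && SEGMENT_BYTES % elem_bytes == 0);
    size_t seen[GROUP_SIZE];
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t seg = idx[i] * elem_bytes / SEGMENT_BYTES;
        size_t j = 0;
        while (j < count && seen[j] != seg)
            ++j;                      /* linear search: the group is small */
        if (j == count)
            seen[count++] = seg;      /* first access to this segment */
    }
    return count;
}
```

Under these assumptions, 32 adjacent 4-byte elements fall in one segment, while 32 elements spaced 32 indices apart each fall in a different segment, so the same amount of useful data costs 32 times as many transactions in this model.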

Chapter 7: Domain-Specific Architectures

The challenge for a DSA is to find a target application whose demand is large enough to justify a custom design.

The nonrecurring engineering (NRE) cost of a custom chip and its supporting software is amortized over the number of chips manufactured. It is not reasonable to design a DSA with only 1000 users.
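The amortization argument can be written as a one-line formula: the effective cost per chip is the NRE divided by the volume, plus the per-unit manufacturing cost. The C sketch below encodes that; cost_per_chip and its parameters are hypothetical names, and any numbers plugged into it are placeholders rather than real cost data.

```c
#include <assert.h>

/* Effective cost of one chip when a one-time NRE cost is spread evenly
   over `volume` chips, each of which also costs `unit_cost` to build.
   All arguments are in the same (arbitrary) currency unit. */
double cost_per_chip(double nre, double unit_cost, double volume) {
    assert(volume > 0.0);
    return nre / volume + unit_cost;
}
```

With a placeholder NRE of 1,000,000 and a unit cost of 10, a volume of 1000 chips gives 1010 per chip, while a volume of 1,000,000 gives 11 per chip. At small volumes the NRE share dominates the cost.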

An FPGA has lower NRE than an ASIC; however, its hardware is not as efficient as an ASIC's.


Author: Zihao Ye


Date: 2021-11-09 Tue 00:00

Last modified: 2022-12-27 Tue 07:18

Licensed under CC BY-NC 4.0