SparseLNR @ ICS 2022

Table of Contents

paper link:


TACO schedule is sub-optimal for some of the (fused) sparse operators.

SparseLNR proposes Branch Iteration Graph that breakdown the computation into smaller ones and then fuse outermost dimensions of the loop nests, while the innermost dimensions are distributed.

SparseLNR can describe the scheduling space of FusedMMa operator.

Prerequisite (FusedMM)

FusedMM is a template of fusing SpMM and SDDMM, and use vectorized instructions to accelerate the computation.

The author conduct experiments on CPU, and obtained significant speedup, this is somehow desirable because I haven't implemented any optimizations on CPU for DGL.

My Question

How to implement the back propagation FusedMM operator? (especially, GAT is more than fused SDDMM and SpMM, there is also sparse softmax).

I don't think this paper address the problem properly and they do not give any example of GAT.

Motivating Example

Below figure showing the different way to schedule a single sparse operator.


In the language of TVM, these schedules can be implemented with rfactor primitive.

Core Idea

Extend the Iteration Graph to enable branches.

The schedules in SparseLNR are basically rfactor and compute_at~/~reverse_compute_at in TVM. In SparseTIR, we can also support them in Stage-I if we want the created temporary buffer to be sparse.

My Digest

I had a chat with SparseLNR group.

Author: expye(Zihao Ye)


Date: 2022-08-10 Wed 00:00

Last modified: 2022-09-23 Fri 00:04

Licensed under CC BY-NC 4.0