Optimizing a WebGPU Matmul Kernel for 1 TFLOP

(zanussbaum.substack.com)

171 points | by zanussbaum 4 days ago

86 comments