Abstract: A wide variety of graph algorithms expressed as linear algebra operations, e.g., triangle counting, k-truss analysis, breadth-first search, and betweenness centrality, depend on the masked sparse ...
These days, large language models can handle increasingly complex tasks, writing intricate code and engaging in sophisticated ...
Abstract: In this paper, a table lookup-based computing technique is proposed to perform convolutional neural network (CNN) inference without multiplication, and its FPGA implementation is ...
CUDA-L2 is a system that combines large language models (LLMs) and reinforcement learning (RL) to automatically optimize Half-precision General Matrix Multiply (HGEMM) CUDA kernels. CUDA-L2 ...