AI-Native from Scratch: Co-Designing an AI Accelerator and Its Programming Model
Extracting an AI accelerator's full performance requires low-level programming so demanding that most developers stay in high-level frameworks and give that performance up. The underlying cause is an abstraction gap: software reasons in tensors, while hardware reasons in threads and memory. Bridging the two sacrifices both ease of use and performance, and it hides a workload's true cost until silicon exists and changes are expensive.
Our approach closes the gap by building both the silicon and its programming model around tensors: the accelerator architecture and the language express the same tensor-level operations. This shared abstraction allows a single kernel to be both easy to write and fully optimizable, and it already runs on a current-generation AI accelerator.
The same model now guides the next architecture. By simulating today's kernels, we predict their cost accurately and evaluate datapath choices before tape-out. One design thus serves multiple chip generations and their programming models, from the current accelerator to what comes next.
Speaker Bio
Tenure-Track Associate Professor at the KAIST School of Computing
Principal Investigator at the Concurrency and Parallelism Laboratory
fearless.systems/jeehoon.kang