https://www.modular.com/blog/software-pipelining-for-gpu-kernels-part-1-the-pipeline-problem
Modular: Software Pipelining for GPU Kernels: Part 1 - The Pipeline Problem
Mar 31, 2026 - Flash Attention is a simple algorithm: tiled back-to-back matmuls with an online softmax algorithm in between. The algorithm fits in a few dozen lines of...
https://www.rightnowai.co/editor
RightNow AI - The Best All-in-One AI Code Editor for GPU Kernels | 2025 | RightNow AI
Jan 15, 2025 - RightNow AI is the best and only all-in-one AI-powered code editor for NVIDIA GPU kernel developers. Features custom agents, skills and MCP support, clear...
https://docs.jax.dev/en/latest/notebooks/cute_dsl_jax.html
Writing High-Performance GPU Kernels with CuTe DSL and JAX — JAX documentation
https://www.modular.com/blog/tiletensor-part-1-safer-more-efficient-gpu-kernels
Modular: TileTensor Part 1 - Safer, More Efficient GPU Kernels
Apr 17, 2026 - Suppose you want to load a 2D tile of a matrix, where the tile is stored in shared memory in a specific interleaved layout to avoid bank conflicts. This...