https://openreview.net/forum?id=OWmu3QOa0O
Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs...
Keywords: holistic approach, sparse, maximal update parameterization
https://openreview.net/forum?id=HddmvY8XvEt
Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while...
Keywords: sparse training, dynamic, structured sparsity
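The DST methods referenced above maintain a fixed overall sparsity while periodically rewiring the network: low-magnitude active weights are pruned and an equal number of inactive connections are regrown. A minimal sketch of one such update (magnitude pruning with random regrowth; the function name and parameters are illustrative, not from any of the papers linked here):

```python
import numpy as np

def prune_and_regrow(weights, mask, regrow_frac=0.1, rng=None):
    """One hypothetical dynamic-sparse-training update: drop the
    smallest-magnitude active weights, then regrow the same number
    of connections at random so total sparsity is unchanged."""
    rng = rng or np.random.default_rng(0)
    flat_mask = mask.copy().ravel()
    active = np.flatnonzero(flat_mask)
    n_swap = max(1, int(regrow_frac * active.size))
    # Prune: deactivate the n_swap active weights with smallest |w|.
    flat_abs = np.abs(weights.ravel())
    drop = active[np.argsort(flat_abs[active])[:n_swap]]
    flat_mask[drop] = 0
    # Regrow: activate n_swap random currently-inactive positions
    # (real DST methods often use gradient magnitude instead).
    inactive = np.flatnonzero(flat_mask == 0)
    grow = rng.choice(inactive, size=n_swap, replace=False)
    flat_mask[grow] = 1
    return flat_mask.reshape(mask.shape)
```

Because prune and regrow counts match, the number of active weights — and hence the sparsity level — is invariant across updates.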
https://openreview.net/forum?id=WDgV1BJEW0
Graph Neural Networks (GNNs) excel in various graph learning tasks but face computational challenges when applied to large-scale graphs. A promising solution...
Keywords: two heads, sparse training, better, one, boosting
https://openreview.net/forum?id=8abNCVJs2j
Training deep neural networks (DNNs) is costly. Fortunately, Nvidia Ampere and Hopper GPUs can accelerate matrix multiplications twice as fast as a dense...
Keywords: STE, continuous pruning function, efficient
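The hardware acceleration mentioned in this entry relies on 2:4 structured sparsity: within every contiguous group of four weights, at most two may be nonzero, which Ampere/Hopper tensor cores exploit to double matmul throughput. A minimal sketch of the standard 2:4 magnitude mask (this is an illustrative numpy version, not Nvidia's actual API or this paper's method):

```python
import numpy as np

def mask_2_to_4(weights):
    """Illustrative 2:4 structured-sparsity mask: in each contiguous
    group of 4 weights along the flattened array, keep the 2 entries
    with largest magnitude and zero out the other 2."""
    groups = weights.reshape(-1, 4)
    # Per group, indices of the 2 smallest-|w| entries to zero out.
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, smallest, 0.0, axis=1)
    return (groups * mask).reshape(weights.shape)
```

For example, `mask_2_to_4(np.array([1., -3., 0.5, 2., 4., 0., -1., 0.25]))` keeps the two largest-magnitude entries of each group of four, yielding `[0., -3., 0., 2., 4., 0., -1., 0.]`.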