Robuta

https://openreview.net/forum?id=2u1xaPgbnU&referrer=%5Bthe%20profile%20of%20Shane%20Bergsma%5D(%2Fprofile%3Fid%3D~Shane_Bergsma1)
Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically...
scaling laws, ptpp, aware, adaptation, predicting