https://openreview.net/forum?id=_E4NvIvrIL&referrer=%5Bthe%20profile%20of%20Humam%20Alwassel%5D(%2Fprofile%3Fid%3D~Humam_Alwassel1)
Video action detectors are usually trained using datasets with fully-supervised temporal annotations. Building such datasets is an expensive task. To alleviate...
iterative refinementsupervisedactionlocalization