SooHyun2i 2022. 5. 26. 23:18

Video Representation Learning Based on Self-Supervised Learning

 

Training datasets: UCF/HMDB (small-scale), the Kinetics series (medium-scale), AudioSet (large-scale)

 

Action recognition

- UCF101: ρBYOL (A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning, CVPR 2021), VideoMAE (VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training, 2022)

VideoMAE is similar to Masked Autoencoders Are Scalable Vision Learners (FAIR, 2021): that paper works on images, while VideoMAE is the video version.

- HMDB51: ρBYOL, VideoMAE

 

Audio Classification

- ESC-50: Broaden Your Views for Self-Supervised Video Learning (ICCV 2021)

 

Video retrieval

- UCF101, HMDB51: Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting (WACV 2022)