Can deep RL discover athletic high jump strategies, such as the Fosbury flop, Western roll, and more? Surprisingly, yes! Accepted to #SIGGRAPH2021. Very fun collaboration with Zhiqi Yin, Zeshi Yang, KangKang Yin (SFU). Paper + video: arpspoof.github.io/project/j… 1/2
Multiscale Vision Transformers
"We present Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models."
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures.