Author: Yiran Zhong, PhD

Yiran Zhong currently leads the Multimodal Research Team at SenseTime Research and is a principal investigator at Shanghai AI Laboratory. He received a Ph.D. in Engineering from The Australian National University, Canberra, Australia, in 2021; an M.Eng. with first-class honours in information and electronics engineering from the same university in 2014; and a B.E. from the University of Electronic Science and Technology of China in 2008. His research interests include self-supervised learning, visual geometry learning, multimodal learning, machine learning, and natural language processing. He won the ICIP Best Student Paper Award in 2014.

cosFormer: Rethinking Softmax In Attention

Vicinity Vision Transformer

Audio-Visual Segmentation