Vision Transformer (ViT) has attracted increasing attention in the computer vision community in recent years. Compared with traditional convolutional neural networks, ViT is better suited to capturing global relationships among image patches through its multi-head self-attention module.

Despite ViT's outstanding performance on many vision tasks, its high computational cost is a significant hurdle to deploying ViT in real-world applications where computing resources are limited.

To overcome this obstacle, many efficient ViT architectures have been developed. Meanwhile, other research concentrates on accelerating existing ViT models via dynamic network techniques, such as dynamic token pruning. In this seminar, I will first give an overview of the Vision Transformer architecture and its acceleration methods. I will then introduce two of our works that address this problem.


Xuwei Xu is a second-year PhD student in the Data Science group, ITEE. He received his Bachelor's degree from the Australian National University. He is currently working towards his PhD under the supervision of Dr Sen Wang and Dr Jiajun Liu. His research interests include Vision Transformers, Neural Architecture Search and efficient networks.


Dr Sen Wang

This session will be conducted via Zoom: https://uqz.zoom.us/j/89362232168

About Data Science Seminar

This seminar series runs as weekly sessions and is hosted by ITEE Data Science.