The Transformer was originally proposed for Natural Language Processing (NLP) tasks such as machine translation and sentiment analysis. The architecture has since become extremely successful in NLP, and this paper presents a new vision Transformer, called Swin Transformer, that serves as a general-purpose backbone for computer vision.

Swin Transformer builds hierarchical feature maps by merging image patches in deeper layers, and its computational complexity is linear in the input image size because self-attention is computed only within each local window. It can thus serve as a general-purpose backbone for both image classification and dense recognition tasks. In contrast, previous vision Transformers produce feature maps of a single low resolution and have computational complexity that is quadratic in the input image size because self-attention is computed globally.
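The difference between linear and quadratic cost can be made concrete with a small calculation. The sketch below uses the approximate cost formulas given in the Swin Transformer paper for global and window-based self-attention on an h x w map of tokens with C channels and window size M. The function names are hypothetical, and the values C = 96, M = 7 and a patch size of 4 are illustrative assumptions, not something this article specifies.

```python
def global_attention_cost(h, w, c):
    """Approximate cost of global self-attention on an h x w token map.

    The second term grows with (h * w) ** 2, i.e. quadratically in the
    number of tokens.
    """
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_attention_cost(h, w, c, m):
    """Approximate cost when attention is limited to m x m local windows.

    With m fixed, both terms grow linearly in the number of tokens h * w.
    """
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


if __name__ == "__main__":
    # Illustrative assumptions: 96 channels, 7 x 7 windows, 4 x 4 pixel patches.
    channels, window, patch = 96, 7, 4
    for side in (224, 448, 896):
        h = w = side // patch  # tokens per side after patch partitioning
        g = global_attention_cost(h, w, channels)
        l = window_attention_cost(h, w, channels, window)
        print(f"{side}x{side} image: global={g:.3e}  window={l:.3e}  ratio={g / l:.1f}")
```

Doubling the image side quadruples the number of tokens. Under these formulas the windowed cost then grows by exactly four times, while the global cost grows by considerably more, which is why global attention becomes impractical for high-resolution dense prediction.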

Adapting the Transformer from NLP to vision tasks poses several challenges, including the large variation in the scale of visual entities and the high resolution of pixels in images compared to words in text. The design of Swin Transformer makes it compatible with a broad range of vision tasks, including image classification (87.3% top-1 accuracy on ImageNet-1K) and dense prediction tasks such as object detection (58.7 box AP and 51.1 mask AP on COCO test-dev) and semantic segmentation (53.5 mIoU on ADE20K val).

Its performance surpasses the previous state of the art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the large potential of Transformer-based models in computer vision.

Transformers are complex networks that require a large amount of computing power. Business Systems International (BSI) is the largest Nvidia GPU & DELL server supplier in Europe, and we provide custom, complete AI and machine learning environments that enable the training of complex models such as the Transformer.

This article was provided by our AI researcher Bill Shao.
