AICVTG : Automatic Image Captioning

Published in North South University, 2023

The objective is to explore and implement image captioning using a combination of computer vision and natural language processing. The project aims to highlight the challenges in generating textual descriptions for images, emphasize the potential societal and economic benefits, propose a solution with image-based models (Vision Transformer) and language-based models (GPT-2), discuss the complexity compared to traditional computer vision tassks, detail the methodology using the MSCOCO dataset, and evaluate model performance through various image captioning metrics.

Project Image

Figure 1: Block Diagram of Our Approach

Project Image

Figure 2. Vision Transformer Complete Network Architecture

Project Image

Figure 3: Our End-to-End ViT-GPT Framework

Project Demo

Authors: S M Gazzali Arafat Nishan, Shafiul Bashar, Md. Imran Hossain
Supervisor: Dr. Shafin Rahman