VQA: Asking Questions About Image Content

Published in North South University, 2022

Visual Question Answering (VQA) has piqued the interest of researchers in deep learning, computer vision, and natural language processing, a relatively recent challenge in these fields. VQA requires an algorithm to respond to text-based inquiries about pictures. Because many open-ended answers comprise a few words or a closed set of options in a multiple choice format, VQA adapts itself to automated review. We used the combination of algorithms CNN (for the Image part) and LSTM/GRU/CNN (for the linguistic part), and a second algorithm which uses Vision Transformer (ViT)(for the Image part) and BERT(for the linguistic part).

Project Image

Project Demo
Download Capstone Poster here
Supervisor: Prof. Mohammad Ashrafuzzman Khan
Authors: S M Gazzali Arafat Nishan, Ashfaq Uddin Ahmed,Tashfia Tabassum

Share on

Twitter Facebook LinkedIn

S M G Arafat N.

Share on