Facebook research develops intelligent systems for AR/VR that can reason and answer questions about visual information

2021-11-01 This article was translated by software.

Visual Question Answering (VQA) aims to develop intelligent systems that can reason about visual information and answer questions about it. Early datasets for this problem used images as the visual input. More recently, numerous QA benchmarks have been proposed that extend the visual input from images to video. While image QA requires a system to learn cross-modal interactions, video QA goes further and also requires capturing visual information that varies over time. An orthogonal extension of the VQA problem, and a separate research direction, is to study image or video question answering in a dialogue setting.

In this setting, questions about a given video or image are posed over multiple dialogue turns. The question in each turn typically has some cross-turn relationship with questions from earlier turns, such as object co-reference or topic alignment. In the study titled "DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue," teams from Facebook and Singapore Management University examined this multi-turn form of visual question answering.


from: news.nweon.com/91102
