Knowledge Informed Sequential Scene Graph Verification Using VQA

Abstract

We propose a new task, non-localized scene graph verification, whose objective is to provide a justified account of the inconsistencies between the visual content of an image and its non-localized scene graph, in order to diagnose errors or anticipate corrections. We introduce a sequential algorithm that detects such inconsistencies and proposes plausible corrections, taking into account the information already present in the scene graph and exploiting knowledge priors. Instead of relying on object detection, which requires bounding-box annotations, we use a simple visual question answering (VQA) model as a proxy for visual content analysis. We show on the VG150 dataset that our strategy compares favorably with a baseline adapted from a caption editing approach. We also show that our algorithm can efficiently correct corrupted scene graphs.
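The abstract does not spell out the algorithm. The following is therefore only a minimal sketch of the general idea under our own assumptions, not the authors' method. Each (subject, predicate, object) triplet of a non-localized scene graph is turned into a yes/no question with a hypothetical template (`to_question`). A stub function stands in for the VQA model and returns P("yes"). The knowledge prior is a hypothetical lookup table. The two scores are mixed linearly with an assumed weight `alpha`. Triplets are processed one at a time, and a rejected triplet is replaced by the alternative predicate with the best combined score, skipping triplets already present in the partially corrected graph. All names, numbers, and the scoring rule are illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A non-localized scene graph: (subject, predicate, object) triplets, no boxes.
Triplet = Tuple[str, str, str]


def to_question(t: Triplet) -> str:
    """Hypothetical template turning a triplet into a yes/no VQA question."""
    s, p, o = t
    return f"Is there a {s} {p} a {o}?"


def verify_sequentially(
    graph: List[Triplet],
    vqa_yes: Callable[[str], float],  # assumed: returns P(answer == "yes")
    prior: Dict[Triplet, float],      # assumed knowledge prior, e.g. P(p | s, o)
    predicates: List[str],
    alpha: float = 0.5,
    threshold: float = 0.5,
) -> Tuple[List[Triplet], List[Tuple[Triplet, Optional[Triplet]]]]:
    """Check triplets in order; replace each rejected one with its best alternative."""

    def score(t: Triplet) -> float:
        # Weighted mix of visual evidence (VQA answer) and the knowledge prior.
        return alpha * vqa_yes(to_question(t)) + (1 - alpha) * prior.get(t, 0.0)

    corrected: List[Triplet] = []
    edits: List[Tuple[Triplet, Optional[Triplet]]] = []
    for t in graph:
        if score(t) >= threshold:
            corrected.append(t)  # consistent with the image: keep as-is
            continue
        s, p_old, o = t
        # Candidate fixes: other predicates not already in the corrected graph.
        candidates = [(s, p, o) for p in predicates
                      if p != p_old and (s, p, o) not in corrected]
        best = max(candidates, key=score, default=None)
        if best is not None:
            corrected.append(best)  # otherwise the rejected triplet is dropped
        edits.append((t, best))
    return corrected, edits


# Usage with a stub VQA model (toy numbers, not data from the paper).
answers = {
    "Is there a man riding a horse?": 0.9,
    "Is there a cup on a table?": 0.1,
    "Is there a cup near a table?": 0.8,
    "Is there a cup under a table?": 0.2,
}
prior = {("man", "riding", "horse"): 0.7, ("cup", "on", "table"): 0.6,
         ("cup", "near", "table"): 0.3, ("cup", "under", "table"): 0.1}
graph = [("man", "riding", "horse"), ("cup", "on", "table")]
fixed, edits = verify_sequentially(graph, lambda q: answers.get(q, 0.0), prior,
                                   ["riding", "on", "near", "under"])
print(fixed)  # [('man', 'riding', 'horse'), ('cup', 'near', 'table')]
print(edits)  # [(('cup', 'on', 'table'), ('cup', 'near', 'table'))]
```

Here "cup on table" scores 0.35 and is rejected, and "cup near table" (0.55) is proposed as the correction. A real system would replace the stub with an actual VQA model and a prior estimated from training scene graphs.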

Publication
Proceedings of the IEEE/CVF International Conference on Computer Vision