NL-EYE: ABDUCTIVE NLI FOR IMAGES

Technion - Israel Institute of Technology
Google Research

NL-Eye Examples. Each example represents a reasoning category and contains three images: a premise (left column), a plausible hypothesis (middle column), and an implausible hypothesis (right column). Plausible hypotheses are framed in green; implausible ones are framed in red. Explanations are provided below each sample.

Abstract

Will a Visual Language Model (VLM)-based bot warn us about slipping if it detects a wet floor? Recent VLMs have demonstrated impressive capabilities, yet their ability to infer outcomes and causes remains underexplored. To address this, we introduce NL-Eye, a benchmark designed to assess VLMs' visual abductive reasoning skills. NL-Eye adapts the abductive Natural Language Inference (NLI) task to the visual domain, requiring models to evaluate the plausibility of hypothesis images based on a premise image and explain their decisions. NL-Eye consists of 350 carefully curated triplet examples (1,050 images) spanning diverse reasoning categories: physical, functional, logical, emotional, cultural, and social. The data curation process involved two steps: writing textual descriptions and generating images using text-to-image models, both requiring substantial human involvement to ensure high-quality and challenging scenes. Our experiments show that VLMs struggle significantly on NL-Eye, often performing at random baseline levels, while humans excel in both plausibility prediction and explanation quality. This demonstrates a deficiency in the abductive reasoning capabilities of modern VLMs. NL-Eye represents a crucial step toward developing VLMs capable of robust multimodal reasoning for real-world applications, including accident-prevention bots and generated video verification.
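
To make the task format concrete, below is a minimal sketch in Python of how a single NL-Eye triplet and its plausibility evaluation could be represented. The field names, schema, and the query_vlm callable are hypothetical illustrations for this page, not the released dataset format or the paper's evaluation code.

from dataclasses import dataclass

@dataclass
class Triplet:
    """One NL-Eye example: a premise image plus two candidate hypotheses.

    Field names are hypothetical; the released dataset may use a
    different schema.
    """
    premise: str        # path to the premise image
    hypothesis_a: str   # path to one hypothesis image
    hypothesis_b: str   # path to the other hypothesis image
    plausible: str      # "a" or "b": which hypothesis is plausible
    category: str       # e.g. "physical", "social", "cultural"

def evaluate(triplets, query_vlm):
    """Measure how often a VLM picks the plausible hypothesis.

    `query_vlm` is a stand-in for any model call that takes the three
    images plus an instruction and returns "a" or "b".
    """
    prompt = ("Given the premise image, which hypothesis image depicts "
              "the more plausible outcome or cause: A or B? "
              "Answer with 'a' or 'b'.")
    correct = 0
    for t in triplets:
        answer = query_vlm([t.premise, t.hypothesis_a, t.hypothesis_b], prompt)
        correct += int(answer.strip().lower() == t.plausible)
    return correct / len(triplets)

# With two hypotheses per triplet, chance accuracy is 0.5, the random
# baseline level that many VLMs hover at according to the paper.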

BibTeX

@misc{ventura2024nleye,
  title={NL-Eye: Abductive NLI for Images},
  author={Mor Ventura and Michael Toker and Nitay Calderon and Zorik Gekhman and Yonatan Bitton and Roi Reichart},
  year={2024},
  eprint={2410.02613},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}