Friday, December 26, 2025

GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing


Editing images using natural language instructions has become a natural and expressive way to modify visual content; yet evaluating the performance of such models remains challenging. Existing evaluation approaches often rely on image-text similarity metrics such as CLIP, which lack precision. In this work, we introduce a new benchmark designed to evaluate text-guided image editing models in a more grounded manner, along two critical dimensions: (i) functional correctness, assessed via automatically generated multiple-choice questions that verify whether the intended change was successfully applied; and (ii) image content preservation, which ensures that non-targeted regions of the image remain visually consistent, using an object-aware masking technique and preservation scoring. The benchmark includes over 1000 high-quality editing examples across 20 diverse content categories, each annotated with detailed editing instructions, evaluation questions, and spatial object masks. We conduct a large-scale study comparing GPT-Image-1, the latest flagship in the text-guided image editing space, against several state-of-the-art editing models, and validate our automatic metrics against human ratings. Results show that GPT-Image-1 leads in instruction-following accuracy but often over-modifies irrelevant image regions, highlighting a key trade-off in current model behavior. GIE-Bench provides a scalable, reproducible framework for advancing more accurate evaluation of text-guided image editing.
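To make the two evaluation dimensions concrete, here is a minimal sketch in Python. It is not the benchmark's actual implementation: the abstract does not say which preservation score is used, so a masked mean squared error stands in for it, and the function names (`functional_correctness`, `masked_preservation_mse`), the grayscale list-of-lists image format, and the convention that `True` in the mask marks the editable region are all assumptions made for illustration.

```python
from typing import List, Sequence

Image = List[List[float]]  # assumed format: grayscale pixels in [0, 1], row-major
Mask = List[List[bool]]    # assumed convention: True marks the region the edit may change


def functional_correctness(predicted: Sequence[str], expected: Sequence[str]) -> float:
    """Fraction of multiple-choice questions answered with the expected option."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one question is required")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def masked_preservation_mse(original: Image, edited: Image, edit_mask: Mask) -> float:
    """Mean squared error over pixels OUTSIDE the edit mask (lower means better preserved).

    This is a stand-in metric chosen for the sketch, not the paper's scoring rule.
    """
    total, count = 0.0, 0
    for row_o, row_e, row_m in zip(original, edited, edit_mask):
        for o, e, m in zip(row_o, row_e, row_m):
            if not m:  # only score pixels the edit was not supposed to touch
                total += (o - e) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask covers the whole image; nothing to preserve")
    return total / count


# Toy 2x2 example: only the bottom-right pixel is the edit target.
original = [[0.0, 0.0], [1.0, 1.0]]
edited = [[0.5, 0.0], [1.0, 0.0]]
edit_mask = [[False, False], [False, True]]

accuracy = functional_correctness(["A", "C", "B"], ["A", "B", "B"])
preservation_error = masked_preservation_mse(original, edited, edit_mask)
```

In this toy case the model answers two of three questions correctly, and the only penalized change is the top-left pixel, which lies outside the mask. A model that follows instructions well but repaints untargeted regions would score high on the first measure and poorly on the second, which is the trade-off the abstract reports.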
