Uni-OVSeg: Weakly-Supervised Open-Vocabulary Segmentation with Cutting-Edge Performance

In conclusion, this paper proposes Uni-OVSeg, a framework for weakly-supervised open-vocabulary segmentation. By training on independent image-text and image-mask pairs, Uni-OVSeg reduces the dependency on labour-intensive image-mask-text triplets while still achieving strong segmentation performance in open-vocabulary settings. Using an LVLM to refine text descriptions and a multi-scale ensemble to improve the quality of region embeddings, we alleviate the noise in mask-text correspondences and obtain substantial performance improvements. Notably, Uni-OVSeg significantly outperforms previous state-of-the-art weakly-supervised methods and even surpasses the cutting-edge fully-supervised method on the challenging PASCAL Context-459 dataset. This result demonstrates the effectiveness of the proposed framework and paves the way for further research.

Table 4. Mask classification performance. We first pool the region features based on the provided ground-truth masks. These pooled features are then projected into the CLIP embedding space, where they are classified using text embeddings. We report the Top-1 accuracy (%) and time (sec. / sample).
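
The mask classification procedure described in the Table 4 caption (pool region features with ground-truth masks, project them into the CLIP embedding space, classify against text embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the use of mean pooling, the single linear `projection` matrix, and cosine similarity as the classification score are all assumptions made for illustration.

```python
import numpy as np

def mask_pool(features, masks):
    """Average the feature vectors that fall inside each binary mask.

    features: (H, W, C) dense feature map (hypothetical backbone output)
    masks:    (N, H, W) binary ground-truth masks
    returns:  (N, C) one pooled feature per mask
    """
    masks = masks.astype(features.dtype)
    summed = np.einsum("nhw,hwc->nc", masks, features)
    area = masks.sum(axis=(1, 2))[:, None]
    return summed / np.maximum(area, 1e-6)  # guard against empty masks

def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, 1e-12)

def classify_regions(features, masks, projection, text_embeddings):
    """Assign each mask the class whose text embedding is most similar.

    projection:      (C, D) assumed linear map into the shared embedding space
    text_embeddings: (K, D) one embedding per class name
    returns:         (N,) predicted class index per mask
    """
    pooled = mask_pool(features, masks)
    region_emb = l2_normalize(pooled @ projection)
    text_emb = l2_normalize(text_embeddings)
    similarity = region_emb @ text_emb.T  # (N, K) cosine similarities
    return similarity.argmax(axis=1)

def top1_accuracy(predictions, labels):
    """Top-1 accuracy in percent."""
    return float((predictions == labels).mean()) * 100.0
```

In practice, `features` would come from the segmentation model's image encoder and `text_embeddings` from a CLIP text encoder applied to the class names; both are left as plain arrays here so the sketch stays self-contained.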

Source: https://hackernoon.com/uni-ovseg-weakly-supervised-open-vocabulary-segmentation-with-cutting-edge-performance?source=rss