University of Waterloo

Student seminar

November 22nd, 2024 – 11:00 am – 12:00 pm, EC4-2101A

**11:00 am Presenter: Chang Liu**

Semantic segmentation tasks require expensive and time-consuming pixel-level annotations. Unsupervised domain adaptation (UDA) aims to transfer knowledge from a label-rich source domain to a target domain with no labels. Recently, vision-language models (VLMs) have shown promise for domain-adaptive classification, but they remain under-explored for domain-adaptive semantic segmentation (DASS). Existing language-guided DASS methods align pixel-level features with generic, class-wise prompts that require target-domain knowledge and fail to leverage the rich spatial relationships and object context encoded in language priors. In this work, we propose LangDA, the first domain-agnostic approach to explicitly induce context-awareness in language-driven DASS. LangDA aligns image features with VLM-generated, context-aware scene descriptions via a consistency objective. LangDA achieves state-of-the-art results on three adaptation benchmarks, outperforming existing methods by 3.9%, 2.6%, and 1.4%, respectively.
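
The abstract does not specify the exact form of the consistency objective, so the following is a minimal, hypothetical PyTorch sketch of one plausible instantiation: pooled image features are pulled toward the VLM text embedding of a per-image scene description via a cosine-similarity consistency loss. The function name `consistency_loss`, the feature shapes, and the toy inputs are illustrative assumptions, not details taken from the talk.

```python
import torch
import torch.nn.functional as F

def consistency_loss(image_features: torch.Tensor,
                     text_features: torch.Tensor) -> torch.Tensor:
    """Pull pooled image features toward the text embedding of a
    scene-level description (one description per image).

    Both inputs have shape (B, D). This is an assumed formulation:
    1 - cos(img, txt), averaged over the batch.
    """
    img = F.normalize(image_features, dim=-1)  # unit-norm image features
    txt = F.normalize(text_features, dim=-1)   # unit-norm text features
    return (1.0 - (img * txt).sum(dim=-1)).mean()

if __name__ == "__main__":
    # Toy check with random tensors standing in for encoder outputs:
    # in practice, `img` would come from the segmentation backbone and
    # `txt` from a VLM text encoder applied to scene descriptions.
    img = torch.randn(4, 512)
    txt = torch.randn(4, 512)
    print(consistency_loss(img, txt))  # scalar loss in [0, 2]
```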