In the domain of visual scene understanding, deep neural networks have significantly advanced core tasks such as segmentation, tracking, and detection. However, these approaches typically operate under a closed-set assumption, which restricts the model to recognizing only the categories present in the training dataset. Recent advances in vision-language pre-training have driven a shift towards open vocabulary settings, which aim to identify and categorize objects beyond the label space predefined in the training data. A survey is needed to systematically organize the main studies in open vocabulary segmentation and to suggest directions for future research. This paper investigates open vocabulary semantic segmentation, focusing on segmenting objects from arbitrary classes. We first provide a brief overview of the concept of open vocabulary segmentation and then classify its sub-methodologies. We then examine the detailed research directions within each methodology and conclude with a discussion of potential future research directions.