Named entity recognition (NER) technique can play a crucial role in extracting information from the web. While NER systems with relatively high performances have been developed based on careful manipulation of terms with a statistical model, term mism...
Named entity recognition (NER) technique can play a crucial role in extracting information from the web. While NER systems with relatively high performances have been developed based on careful manipulation of terms with a statistical model, term mismatches often degrade the performance of such systems because the strings of all the candidate entities are not known a priori. Despite the importance of lexical-level term mismatches for NER systems, however, most NER approaches developed to date utilize only the term string itself and simple term-level features, and do not exploit the semantic features of terms which can handle the variations of terms effectively. As a solution to this problem, here we propose to match the semantic concepts of term units in restaurant named entities (NEs), where these units are automatically generated from multiple resolutions of a semantic tree. As a test experiment, we applied our restaurant NER scheme to 49,153 nouns in Korean restaurant web pages. Our scheme achieved an average accuracy of 87.89% when applied to test data, which was considerably better than the 78.70% accuracy obtained using the baseline system.