This research aimed to improve how Korean language models handle Personal Identification Numbers (PINs) by creating a Korean Personally Identifiable Information (PII) annotation system and an accompanying training dataset. The study focused mainly on distinguishing complex numeric PINs such as Resident Registration Numbers (RRNs) and Alien Registration Numbers (ARNs). It found that transformer-based Korean Language Models (LMs) struggled to differentiate these PINs, while multilingual Large Language Models (LLMs) were more effective, particularly in inferring RRNs, nationality, and age. The findings underscore the importance of regularly updating the Korean PINs dataset and of developing specialized language models for more accurate PIN detection and inference.
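To illustrate why RRNs and ARNs are hard to tell apart and what can be inferred from them, the following is a minimal rule-based sketch, not the method used in this study. It assumes the commonly described layout YYMMDD-GNNNNNN, in which the seventh digit G signals both the birth century and whether the number was issued to a citizen (RRN) or a registered foreigner (ARN). The function name `classify_pin`, the `CODE_TABLE` mapping, and the sample numbers are hypothetical; the sketch does not validate checksums and ignores the codes for births in the 1800s.

```python
import re
from datetime import date

# Assumed layout: YYMMDD-GNNNNNN (hyphen optional), 13 digits in total.
PIN_PATTERN = re.compile(r"^(\d{2})(\d{2})(\d{2})-?(\d)\d{6}$")

# Assumed mapping of the 7th digit to (century base, PIN kind).
CODE_TABLE = {
    "1": (1900, "RRN"), "2": (1900, "RRN"),
    "3": (2000, "RRN"), "4": (2000, "RRN"),
    "5": (1900, "ARN"), "6": (1900, "ARN"),
    "7": (2000, "ARN"), "8": (2000, "ARN"),
}


def classify_pin(text, today):
    """Return kind, birth date, and age for a PIN-like string, or None."""
    match = PIN_PATTERN.match(text.strip())
    if match is None:
        return None
    yy, mm, dd, code = match.groups()
    if code not in CODE_TABLE:
        return None
    century, kind = CODE_TABLE[code]
    try:
        birth = date(century + int(yy), int(mm), int(dd))
    except ValueError:
        # Impossible calendar date, so not a plausible PIN.
        return None
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    return {"kind": kind, "birth_date": birth, "age": age}


if __name__ == "__main__":
    # Fictitious numbers, for illustration only.
    print(classify_pin("900101-1234567", date(2024, 1, 1)))
    print(classify_pin("050315-7123456", date(2024, 1, 1)))
```

Under these assumptions the two PIN types share the same surface form and differ only in a single digit, which suggests why a model needs to attend to that position rather than to the overall pattern.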