RISS Search - Domestic Academic Journal Article Details

Korean Abstract

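Deep voice technology is used across many industries for applications such as TTS (Text-to-Speech), voice cloning, and voice conversion. However, its potential for misuse, for example in spreading fake news or in voice phishing, can cause serious social problems. Various deep voice detection techniques have been studied to counter this, but detection models remain vulnerable to adversarial attacks. In this paper, we implement a RawNet3-based deep voice detection model using the ASVspoof 2021 and WaveFake datasets and analyze its detection performance under FGSM, PGDL2, and FAB adversarial attacks using the EER (Equal Error Rate) metric. We also propose an adaptive adversarial training technique to counter adversarial attacks. The proposed technique improves detection performance on adversarial samples while maintaining performance on the original data, and it outperforms conventional adversarial training, lowering the EER from 4.90% to 4.12%.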
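The EER is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). Below is a minimal threshold-sweep sketch, assuming detector scores where a higher value means "more likely bona fide". The function name `compute_eer` and the toy scores are hypothetical illustrations, not the paper's evaluation code, and official challenge evaluation tools may interpolate the ROC curve differently.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the Equal Error Rate (EER) of a binary spoofing detector.

    Assumes a higher score means "more likely bona fide". Every observed
    score is tried as a threshold; the EER is reported at the threshold
    where the false rejection rate (FRR) and false acceptance rate (FAR)
    are closest, as the mean of the two.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # FRR: fraction of bona fide utterances rejected (score below t)
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # FAR: fraction of spoofed utterances accepted (score at or above t)
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(frr - far) < best_gap:
            best_gap = abs(frr - far)
            eer = (frr + far) / 2
    return eer


# Hypothetical toy scores, not data from the paper
bona = [0.9, 0.6, 0.4, 0.8]
spoof = [0.5, 0.3, 0.7, 0.1]
print(f"EER = {compute_eer(bona, spoof):.2%}")  # EER = 25.00%
```

On this scale, the reported improvement from 4.90% to 4.12% corresponds to the returned fraction falling from 0.0490 to 0.0412. The quadratic sweep keeps the sketch readable; for large score sets a sorted, ROC-based computation is the usual choice.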

Multilingual Abstract

Deep voice technology is used in many industries through applications such as TTS (Text-to-Speech), voice cloning, and voice conversion, but its potential for misuse, such as spreading fake news and voice phishing, can cause serious social problems. Despite ongoing research into deep voice detection techniques, detection models remain vulnerable to adversarial attacks. In this paper, we implement a RawNet3-based deep voice detection model using the ASVspoof 2021 and WaveFake datasets, and we use the EER (Equal Error Rate) metric to analyze its detection performance against FGSM, PGDL2, and FAB adversarial attacks. Furthermore, we propose an adaptive adversarial training technique to counter adversarial attacks. The proposed technique improves detection performance on adversarial samples while maintaining performance on the original data, and it reduces the EER from 4.90% to 4.12% compared with the existing adversarial training method.
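FGSM, the simplest of the three attacks named above, perturbs an input with a single step x_adv = x + eps * sign(grad_x L(x, y)), where eps bounds the change to each input value. The sketch below applies it to a hypothetical toy logistic-regression detector, chosen because its input gradient has the closed form (p - y) * w, so no autodiff library is needed. The model, weights, and function names are illustrative assumptions; this is not the paper's RawNet3 setup, PGDL2 and FAB are not shown, and the paper's adaptive adversarial training scheme is not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Hypothetical toy detector: probability that feature vector x is bona fide."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(x, y, w, b):
    """Binary cross-entropy of the toy detector for label y (1 = bona fide, 0 = spoof)."""
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm(x, y, w, b, eps):
    """One-step FGSM: x_adv = x + eps * sign(dL/dx) under an L-infinity budget eps."""
    p = predict(x, w, b)
    # For logistic regression with BCE loss, dL/dx = (p - y) * w in closed form.
    grad = [(p - y) * wi for wi in w]
    # Move each feature by eps in the sign direction that increases the loss.
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


w, b = [2.0, 1.0], 0.0
x, y = [1.0, -0.5], 1  # a bona fide sample under the toy model
x_adv = fgsm(x, y, w, b, eps=0.1)
print(x_adv)
print(bce_loss(x, y, w, b) < bce_loss(x_adv, y, w, b))  # True
```

For a raw-waveform detector such as RawNet3, the same step would typically be computed with automatic differentiation on the audio samples. Adversarial training in its basic form adds samples generated this way to the training batches alongside the clean data.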
