RISS (Research Information Sharing Service)


      KCI-listed

      딥러닝 기반 한국 고전한문 표점 추론 자동화 모델의 구축과 활용 = Development and Application of a Deep Learning–Based Model for Automated Punctuation Inference in Korean Classical Chinese

      https://www.riss.kr/link?id=A110055995


      Multilingual Abstract

      This study compiles and refines collated and punctuated Classical Chinese texts accumulated through prior research and projects to construct a database of approximately 3.4 million items (≈420 million characters). Building on this resource, we develop a punctuation inference model specialized for Korean Classical Chinese by fine-tuning the pretrained deep learning language model Chinese-RoBERTa for multi-label token classification. The training corpus, which covers eight genres including annals, collected works, and diaries, was preprocessed and standardized to seven punctuation marks (, 。 · ? ! 《 》). The final model achieves an overall F1 score of 0.9050 on held-out validation data. On unseen corpora containing only traditional ring-dot punctuation (Hanguk Munjip Chonggan and Ilseongnok), the model attains F1 scores of 0.8784 and 0.9065, respectively, for punctuation-position matching. By punctuation type, question marks, commas, periods, and middle dots exhibit high performance, whereas book-title brackets (《》), which require long-range dependencies in paired structures, and exclamation marks, which are sparse in the data, show lower recall. We release an open-source integrated system (model weights, training data, source code, and GUI/CLI batch processing) to support records and information services and research workflows using natural-language analysis, such as text preprocessing, indexing and search, translation preprocessing, and OCR postprocessing. Future work includes a dual-path architecture for paired punctuation, genre-adaptive modules, and multi-task integration with sentence-structure analysis and named-entity recognition.
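To make the multi-label token classification framing and the punctuation-position score concrete, here is a minimal Python sketch. It is not the authors' released code. The function names (`encode`, `decode`, `position_f1`), the rule that 《 attaches to the following character while every other mark attaches to the preceding one, and the reading of "position matching" as "does any mark sit at this character" are all assumptions made for illustration. The mark glyphs are copied as they appear in the abstract.

```python
from typing import List, Set, Tuple

# The seven marks listed in the abstract (glyphs as printed there).
MARKS = [",", "。", "·", "?", "!", "《", "》"]
MARK_SET = set(MARKS)
# Assumed emission order for marks that follow a character.
TRAILING_ORDER = ["》", ",", "。", "·", "?", "!"]


def encode(punctuated: str) -> Tuple[str, List[Set[str]]]:
    """Split punctuated text into bare characters and per-character label sets.

    Assumption: 《 is attached to the character after it; all other marks
    are attached to the character before them. A character may therefore
    carry several labels at once, which is what makes the task multi-label.
    """
    chars: List[str] = []
    labels: List[Set[str]] = []
    pending: Set[str] = set()  # opening brackets waiting for the next character
    for ch in punctuated:
        if ch == "《":
            pending.add(ch)
        elif ch in MARK_SET:
            if labels:  # skip a trailing-type mark with no preceding character
                labels[-1].add(ch)
        else:
            chars.append(ch)
            labels.append(pending)
            pending = set()
    return "".join(chars), labels


def decode(chars: str, labels: List[Set[str]]) -> str:
    """Rebuild punctuated text from bare characters and label sets."""
    out: List[str] = []
    for ch, marks in zip(chars, labels):
        if "《" in marks:
            out.append("《")
        out.append(ch)
        for mark in TRAILING_ORDER:
            if mark in marks:
                out.append(mark)
    return "".join(out)


def position_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """F1 over character positions that carry at least one mark.

    Simplification: the mark type is ignored, as one might do when the
    reference text only has ring-dot breaks.
    """
    gold_pos = {i for i, marks in enumerate(gold) if marks}
    pred_pos = {i for i, marks in enumerate(pred) if marks}
    true_pos = len(gold_pos & pred_pos)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pos)
    recall = true_pos / len(gold_pos)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sample = "讀《論語》。學而時習之,不亦說乎?"
    bare, gold = encode(sample)
    print(bare)                # unpunctuated input a model would see
    print(decode(bare, gold))  # round-trips to the punctuated sample
    pred = [set(m) for m in gold]
    pred[7] = set()            # pretend the model missed the comma
    print(round(position_f1(gold, pred), 4))
```

In a real model each label set would become a multi-hot vector over the seven marks, with an independent sigmoid output per mark. That is how a single character can receive both 》 and 。, as 語 does in the sample.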
