RISS (Research Information Sharing Service)

      Model-based and data-driven techniques for environment-robust automatic speech recognition

      https://www.riss.kr/link?id=T13925182

      Additional Information

      Multilingual Abstract

      In this thesis, we propose model-based and data-driven techniques for environment-robust automatic speech recognition. The model-based technique is a feature enhancement method for reverberant noisy environments that improves the performance of the Gaussian mixture model-hidden Markov model (HMM) system. It is based on the interacting multiple model (IMM) framework, which was originally developed for the single-channel scenario. We extend the single-channel IMM algorithm so that it can handle multi-channel inputs under a Bayesian framework. The multi-channel IMM algorithm is capable of tracking time-varying room impulse responses and background noises by updating the relevant parameters in an on-line manner. A computationally efficient algorithm is also devised to reduce the computation as the number of microphones increases. The performance gain of the proposed method has been confirmed in various simulated and real environmental conditions.
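The on-line tracking idea behind the IMM can be illustrated with a minimal sketch: several Kalman filters with different process-noise assumptions run in parallel, their estimates are mixed, and the mixing probabilities are updated from each filter's likelihood. The two-model scalar setup, all parameter values, and the simulated signal below are illustrative assumptions, not the thesis configuration.

```python
import math, random

# Minimal scalar IMM sketch: two random-walk Kalman filters (slow drift
# vs. fast change) whose estimates are mixed by model probabilities
# updated from each filter's innovation likelihood.
Q = [1e-4, 1e-1]          # process-noise variance of each model (assumed)
R = 0.05                  # observation-noise variance (assumed)
P_TRANS = [[0.95, 0.05],
           [0.05, 0.95]]  # model transition probabilities (assumed)

def imm_step(x, p, mu, z):
    """One IMM cycle: mix, run each Kalman filter, update model probs."""
    # 1) mixing: predicted model probabilities and mixed states
    c = [sum(P_TRANS[i][j] * mu[i] for i in range(2)) for j in range(2)]
    x_mix, p_mix = [], []
    for j in range(2):
        w = [P_TRANS[i][j] * mu[i] / c[j] for i in range(2)]
        xm = sum(w[i] * x[i] for i in range(2))
        pm = sum(w[i] * (p[i] + (x[i] - xm) ** 2) for i in range(2))
        x_mix.append(xm); p_mix.append(pm)
    # 2) per-model Kalman predict/update, collecting likelihoods
    lik = []
    for j in range(2):
        pp = p_mix[j] + Q[j]              # predict (random walk)
        s = pp + R                        # innovation variance
        k = pp / s                        # Kalman gain
        r = z - x_mix[j]                  # innovation
        x[j] = x_mix[j] + k * r
        p[j] = (1 - k) * pp
        lik.append(math.exp(-0.5 * r * r / s) / math.sqrt(2 * math.pi * s))
    # 3) model-probability update and combined estimate
    mu = [lik[j] * c[j] for j in range(2)]
    tot = sum(mu) or 1e-300
    mu = [m / tot for m in mu]
    return x, p, mu, sum(mu[j] * x[j] for j in range(2))

random.seed(0)
x, p, mu = [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]
truth = 0.0
for t in range(200):
    truth += 0.5 if t == 100 else 0.0     # abrupt level change mid-stream
    z = truth + random.gauss(0.0, math.sqrt(R))
    x, p, mu, est = imm_step(x, p, mu, z)
```

After the abrupt change at frame 100, the fast-change model temporarily gains probability mass, letting the combined estimate re-converge to the new level, which is the behaviour that makes this family of trackers suitable for time-varying noise.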
      The data-driven techniques are based on the deep neural network (DNN)-HMM hybrid system. To enhance the performance of the DNN-HMM system in adverse environments, we propose three techniques. First, we propose a novel supervised pre-training technique for the DNN-HMM system to achieve robust speech recognition in adverse environments. In the proposed approach, we aim to initialize the DNN parameters so that they yield abstract features robust to acoustic environment variations. To achieve this, we first derive the abstract features from an early fine-tuned DNN model trained on a clean speech database. Using the derived abstract features as target values, the standard error back-propagation algorithm with stochastic gradient descent is applied to estimate the initial parameters of the DNN. The performance of the proposed algorithm was evaluated on the Aurora-4 DB, and better results were observed compared with a number of conventional pre-training methods.
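The pre-training step can be sketched as a regression: hidden activations of a clean-trained layer serve as targets when the same layer is re-trained on distorted inputs. The layer sizes, synthetic data, simulated distortion, and learning rate below are illustrative assumptions, not the thesis configuration.

```python
import numpy as np

# Sketch of supervised pre-training toward "abstract features":
# a clean-trained hidden layer provides regression targets for a new
# layer seeing distorted inputs, trained by plain back-propagation.
rng = np.random.default_rng(0)
D, H, N = 20, 16, 512
clean = rng.standard_normal((N, D))
noisy = clean + 0.2 * rng.standard_normal((N, D))   # simulated distortion

# Stand-in for the "early fine-tuned" clean model: a fixed hidden layer
# whose activations on clean speech act as the abstract-feature targets.
W_clean = rng.standard_normal((D, H)) / np.sqrt(D)
targets = np.tanh(clean @ W_clean)

# Pre-train a fresh layer on noisy inputs to reproduce those targets
# (full-batch gradient descent on squared error).
W = rng.standard_normal((D, H)) / np.sqrt(D)
lr = 0.05
for epoch in range(1000):
    h = np.tanh(noisy @ W)
    err = h - targets
    grad = noisy.T @ (err * (1 - h * h)) / N   # chain rule through tanh
    W -= lr * grad

mse = float(np.mean((np.tanh(noisy @ W) - targets) ** 2))
```

The resulting `W` would then initialize the corresponding DNN layer before ordinary fine-tuning; the point of the sketch is only that the layer learns to emit nearly the same features for distorted inputs as the clean model does for clean ones.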
      Second, new DNN-based robust speech recognition approaches that take advantage of noise estimates are proposed. The novel aspect of the proposed approaches is that time-varying noise estimates are fed to the DNN as additional inputs. For this, we extract the noise estimates frame by frame from the IMM algorithm, which is known to perform well in tracking slowly varying background noise. The performance of the proposed approaches was evaluated on the Aurora-4 DB, and better performance was observed compared with conventional DNN-based robust speech recognition algorithms.
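The noise-aware input can be sketched as simple frame-wise stacking: each feature frame is augmented with a per-frame noise estimate before entering the network. Here a first-order recursive smoother stands in for the IMM noise tracker; the shapes and the estimator itself are illustrative assumptions.

```python
import numpy as np

# Sketch of noise-aware DNN input: concatenate a frame-level noise
# estimate to each feature frame (frame-by-frame, as in the thesis).
rng = np.random.default_rng(1)
T, D = 100, 13                       # frames x feature dimension (assumed)
feats = rng.standard_normal((T, D))

def track_noise(frames, alpha=0.95):
    """Recursive first-order smoothing as a stand-in noise estimator."""
    est = np.zeros_like(frames)
    n = frames[0].copy()
    for t, f in enumerate(frames):
        n = alpha * n + (1 - alpha) * f   # slowly varying estimate
        est[t] = n
    return est

noise = track_noise(feats)
dnn_input = np.concatenate([feats, noise], axis=1)  # (T, 2*D) network input
```

In the actual system the second half of each input vector would come from the IMM tracker rather than this smoother; the acoustic model itself is unchanged apart from its widened input layer.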
      Finally, a new approach to DNN-based robust speech recognition using soft target labels is proposed. Soft target labeling means that each target value of the DNN output is not restricted to 0 or 1 but takes a value in (0, 1), with the values summing to 1. In this study, the soft target labels are obtained from the forward-backward algorithm well known in HMM training. The proposed method makes DNN training more robust in noisy and unseen conditions. The performance of the proposed approach was evaluated on the Aurora-4 DB and various mismatched noise test conditions, and was found to be better than the conventional hard target labeling method.
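The soft labels come from the state posteriors of the forward-backward algorithm, which by construction lie in (0, 1) and sum to 1 per frame. A toy two-state HMM makes this concrete; the transition matrix, priors, and per-frame observation likelihoods below are illustrative assumptions.

```python
import numpy as np

# Sketch of soft target labels from forward-backward on a toy 2-state HMM:
# the posteriors gamma_t(s) can replace 0/1 state labels in DNN training.
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # state-transition probabilities
pi = np.array([0.5, 0.5])             # initial state probabilities
B = np.array([[0.7, 0.3],             # per-frame observation likelihoods,
              [0.4, 0.6],             # shape (T, states) -- assumed values
              [0.1, 0.9],
              [0.2, 0.8]])
T = B.shape[0]

# forward pass
alpha = np.zeros((T, 2))
alpha[0] = pi * B[0]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[t]

# backward pass
beta = np.zeros((T, 2))
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[t + 1] * beta[t + 1])

# soft targets: per-frame state posteriors, normalized to sum to 1
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
```

Training then minimizes cross-entropy against `gamma` instead of one-hot labels, so frames whose state identity is ambiguous contribute graded rather than hard supervision.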
      Furthermore, within the data-driven approaches, an integrated technique combining the above three algorithms with the model-based technique is described, and its performance in matched and mismatched noise conditions is discussed. In matched noise conditions, the initialization method for the DNN was effective in enhancing recognition performance. In mismatched noise conditions, the combination of using the noise estimates as a DNN input together with soft target labels showed the best recognition results among all tested combinations of the proposed techniques.

      Table of Contents

      • Abstract i
      • Contents iv
      • List of Figures viii
      • List of Tables x
      • 1 Introduction 1
      • 2 Experimental Environments and Database 7
      • 2.1 ASR in Hands-Free Scenario and Feature Extraction 7
      • 2.2 Relationship between Clean and Distorted Speech in Feature Domain 10
      • 2.3 Database 12
      • 2.3.1 TI Digits Corpus 13
      • 2.3.2 Aurora-4 DB 15
      • 3 Previous Robust ASR Approaches 17
      • 3.1 IMM-Based Feature Compensation in Noise Environment 18
      • 3.2 Single-Channel Reverberation and Noise-Robust Feature Enhancement Based on IMM 24
      • 3.3 Multi-Channel Feature Enhancement for Robust Speech Recognition 26
      • 3.4 DNN-Based Robust Speech Recognition 27
      • 4 Multi-Channel IMM-Based Feature Enhancement for Robust Speech Recognition 31
      • 4.1 Introduction 31
      • 4.2 Observation Model in Multi-Channel Reverberant Noisy Environment 33
      • 4.3 Multi-Channel Feature Enhancement in a Bayesian Framework 35
      • 4.3.1 A Priori Clean Speech Model 37
      • 4.3.2 A Priori Model for RIR 38
      • 4.3.3 A Priori Model for Background Noise 39
      • 4.3.4 State Transition Formulation 40
      • 4.3.5 Function Linearization 41
      • 4.4 Feature Enhancement Algorithm 42
      • 4.5 Incremental State Estimation 48
      • 4.6 Experiments 52
      • 4.6.1 Simulation Data 52
      • 4.6.2 Live Recording Data 54
      • 4.6.3 Computational Complexity 55
      • 4.7 Summary 56
      • 5 Supervised Denoising Pre-Training for Robust ASR with DNN-HMM 59
      • 5.1 Introduction 59
      • 5.2 Deep Neural Networks 61
      • 5.3 Supervised Denoising Pre-Training 63
      • 5.4 Experiments 65
      • 5.4.1 Feature Extraction and GMM-HMM System 66
      • 5.4.2 DNN Structures 66
      • 5.4.3 Performance Evaluation 68
      • 5.5 Summary 69
      • 6 DNN-Based Frameworks for Robust Speech Recognition Using Noise Estimates 71
      • 6.1 Introduction 71
      • 6.2 DNN-Based Frameworks for Robust ASR 73
      • 6.2.1 Robust Feature Enhancement 74
      • 6.2.2 Robust Model Training 75
      • 6.3 IMM-Based Noise Estimation 77
      • 6.4 Experiments 78
      • 6.4.1 DNN Structures 78
      • 6.4.2 Performance Evaluations 79
      • 6.5 Summary 82
      • 7 DNN-Based Robust Speech Recognition Using Soft Target Labels 83
      • 7.1 Introduction 83
      • 7.2 DNN-HMM Hybrid System 85
      • 7.3 Soft Target Label Estimation 87
      • 7.4 Experiments 89
      • 7.4.1 DNN Structures 89
      • 7.4.2 Performance Evaluation 90
      • 7.4.3 Effects of Control Parameter ξ 91
      • 7.4.4 An Integration with SDPT and ESTN Methods 92
      • 7.4.5 Performance Evaluation on Various Noise Types 93
      • 7.4.6 DNN Training and Decoding Time 95
      • 7.5 Summary 96
      • 8 Conclusions 99
      • Bibliography 101
      • 요약 (Abstract in Korean) 108
