RISS Academic Research Information Service — Search Results

      • A Study on the Implementation of a Speaker-Adaptive Continuous Speech Recognition System (화자적응화 연속음성인식 시스템 구현에 관한 연구)

        김상범, 東亞大學校 大學院 (Dong-A University Graduate School), 1999, domestic doctoral dissertation

        Speech is the most natural means of human communication, and a speech-based man-machine interface is fast and requires no special training. With the rapid development of computer and telecommunication technology, speech recognition has become an important research subject, with active work on DP matching, HMMs, and neural networks. In continuous speech recognition, HMM-based methods dominate, and speaker adaptation, in which a small amount of additional adaptation data is trained to approach the recognition rate of environment- and speaker-specific models, has attracted particular attention. This study proposes a method for adapting to a speaker from a single uttered sentence using syllable-unit HMMs. After syllable-unit HMM models are built, syllable segmentation of the adaptation data (syllables and sentences) for the target environment and speaker is automated by concatenation training and the Viterbi algorithm, and the models are adapted by MAP (maximum a posteriori) estimation. In simulation, syllable-unit DDCHMM (duration-controlled HMM) models were trained and then adapted to continuous speech. On newspaper editorials, the recognition rate after adaptation was 71.8%, about a 37% improvement over the unadapted models. The recognition rate was largely insensitive to the adaptation weight value, and there was little difference between MAP estimation from ML-estimated parameters and MAP estimation from frames segmented by the Viterbi algorithm. A real-time recognition system was also built on a PC: microphone input is A/D converted and stored cyclically in a circular buffer; start and end points are detected so that only the speech region, with silence removed, is copied to a storage buffer; and a 10th-order mel cepstrum is computed in the analysis stage for use as the training and recognition parameters. Using the OPDP (One Pass DP) method, real-time recognition of car control sentences achieved a recognition rate of about 90% or higher. These results show that a model adapted with a small amount of speech from a new speaker can be built on top of existing models with a substantial gain in recognition rate, and that with further refinement the proposed method could support real-time speech recognition for online systems, dialogue systems, and automatic interpretation systems.
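
        The core of the adaptation step described above is MAP re-estimation of the HMM output distributions. The following is a minimal sketch of MAP adaptation of Gaussian state means, assuming diagonal-covariance models and frame-to-state occupancies already obtained from Viterbi segmentation; the function and variable names are hypothetical, and the thesis's exact update (which also covers variances) is not reproduced.

```python
import numpy as np

def map_adapt_means(prior_means, frames, posteriors, tau=10.0):
    """MAP re-estimation of Gaussian state means (sketch).

    prior_means : (n_states, dim) speaker-independent means (the prior).
    frames      : (n_frames, dim) adaptation feature vectors (e.g. mel cepstra).
    posteriors  : (n_frames, n_states) state occupancies from Viterbi or
                  forward-backward (hard 0/1 alignment in the Viterbi case).
    tau         : prior weight; larger values trust the prior more.
    """
    occ = posteriors.sum(axis=0)          # total occupancy per state
    acc = posteriors.T @ frames           # first-order statistics per state
    # Interpolate between the prior mean and the adaptation-data mean; states
    # with little or no adaptation data stay near the speaker-independent prior.
    return (tau * prior_means + acc) / (tau + occ)[:, None]
```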

      • A Study on the Implementation of a Real-Time Continuous Speech Recognition System Based on HMM (HMM에 의한 실시간 연속음성 인식시스템 구현에 관한 연구)

        이영재, 동아대학교 (Dong-A University), 1996, domestic doctoral dissertation

        This paper studies the construction of a real-time continuous speech recognition system for man-machine interfaces and examines its applicability to automatic control systems. To select an HMM model and a recognition algorithm, HMM models are classified into continuous-distribution HMM and discrete-duration-control HMM, and the recognition algorithms considered are the O(n)DP and One Pass DP methods. Simulations were run on 35 continuous speech samples of four connected spoken digits, with and without regression coefficients in the feature set. With the O(n)DP method, the continuous-distribution HMM achieved average recognition rates of 93.0% with regression coefficients and 80.5% without; the discrete-duration-control HMM achieved 93.4% and 84.4%, respectively. When the HMM does not include regression coefficients, the One Pass DP method improves the average recognition rate over O(n)DP by 12%. Based on the computing time and recognition rates observed in simulation, the final system combines the continuous-distribution HMM with the One Pass DP algorithm. It detects the start and end points of speech sampled at 10 kHz with 8-bit A/D conversion in real time, recognizes the speech by One Pass DP, displays the result on a PC monitor, and simultaneously sends control data to the interface. HMM models were trained on continuous speech samples of control words, area names, and digits. In experiments with the complete system, errors of one-syllable insertion, substitution, and deletion occurred, but the results indicate that the system can be applied to man-machine interfaces for automatic systems if post-processing is added to the recognition stage.
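
        The One Pass DP method used here evaluates all connected-word hypotheses in a single left-to-right sweep over the frames. Below is a schematic token-passing sketch of that idea, assuming left-to-right HMMs without skips, uniform (omitted) transition probabilities, and no backtracking; all names and shapes are hypothetical.

```python
import numpy as np

def one_pass_dp(log_emit):
    """Minimal one-pass (token-passing) connected-word Viterbi sketch.

    log_emit : (T, n_words, states_per_word) log emission scores per frame,
               word model, and state. Returns the best total log score;
               backpointers for recovering the word sequence are omitted.
    """
    T, n_words, n_states = log_emit.shape
    NEG = -np.inf
    score = np.full((n_words, n_states), NEG)
    score[:, 0] = log_emit[0, :, 0]          # every word may start at frame 0
    for t in range(1, T):
        best_exit = score[:, -1].max()       # best word-end token so far
        prev = score
        score = np.full_like(prev, NEG)
        # within-word transitions: stay in a state or advance by one
        score[:, 1:] = np.maximum(prev[:, 1:], prev[:, :-1])
        # between-word transition: any word may start after any word ends
        score[:, 0] = np.maximum(prev[:, 0], best_exit)
        score += log_emit[t]
    return score[:, -1].max()
```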

      • A Study on Speaker Adaptation of Noisy Speech Using Adaptive Filtering Techniques (적응필터링 기법을 이용한 잡음음성의 화자적응화에 관한 연구)

        이종연, 東亞大學校 大學院 (Dong-A University Graduate School), 1999, domestic master's thesis

        Current speech recognition research continues to explore DP matching, HMMs, and neural networks. In continuous speech recognition, HMM-based methods are actively studied, and speaker adaptation, in which a small amount of adaptation data is additionally trained to approach the recognition rate of environment- and speaker-specific models, has attracted attention. To use speech recognition in everyday environments, the noise that degrades the recognition rate must be reduced. In this thesis, an RLS adaptive filter is used to reduce noise, and the SGDS (Smoothed Group Delay Spectrum) is used as the feature parameter in place of the conventional mel cepstrum. A method is proposed for adapting to a speaker from a single uttered sentence using syllable-unit HMMs: after syllable-unit HMM models are built, syllable segmentation of the adaptation data (syllables and sentences) for the target environment and speaker is automated by concatenation training and the Viterbi algorithm, and the models are adapted by MAP (maximum a posteriori) estimation. Syllable-unit CHMM (continuous-density HMM) models were trained and then adapted to continuous speech. Simulations compared recognition with and without filtering of the noisy speech, for two adaptation variants: MAP estimation from ML-estimated parameters, and MAP estimation using frames extracted by the Viterbi algorithm as samples. The O(n)DP method was used for continuous speech recognition. With MAP estimation from ML-estimated parameters (adapting the mean, the variance, or both), adapting the variance alone gave the highest recognition rate: at SNRs of 10 dB, 5 dB, and 0 dB, the rates were 75.2%, 55.8%, and 46.1% before filtering and 75.7%, 74.5%, and 71.0% after filtering. With MAP estimation from Viterbi-extracted frames, adapting the mean and variance together was best: at the same SNRs, the rates were 68.5%, 62.8%, and 42.8% before filtering and 72.5%, 73.0%, and 75.9% after filtering.
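
        An RLS adaptive filter of the kind used for noise reduction here can be sketched as follows. This is a textbook noise-cancellation formulation, assuming a separate reference noise input; the thesis's exact filter configuration and order are not specified in the abstract.

```python
import numpy as np

def rls_filter(x, d, order=8, lam=0.99, delta=0.01):
    """Minimal recursive least squares (RLS) adaptive filter sketch.

    x : reference noise signal; d : noisy speech (desired signal).
    Returns the error sequence e, i.e. the noise-reduced speech estimate.
    """
    w = np.zeros(order)                 # filter weights
    P = np.eye(order) / delta           # inverse correlation matrix
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]        # regressor, most recent sample first
        k = P @ u / (lam + u @ P @ u)   # gain vector
        e[n] = d[n] - w @ u             # a priori error = cleaned sample
        w = w + k * e[n]                # weight update
        P = (P - np.outer(k, u @ P)) / lam
    return e
```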

      • A Computational Model of Prosodic Information for Korean Continuous Speech Recognition (한국어 연속 음성 인식에 있어 운율 정보의 계산적 모델)

        강평수, 전남대학교 대학원 (Chonnam National University Graduate School), 1999, domestic master's thesis

        This study proposes a new method for applying prosodic information to continuous speech recognition, based on prosodic boundary strength (PBS): PBS sequences predicted from the candidate text and PBS sequences estimated from the spoken utterance are compared statistically to select the most probable sentence among the n-best hypotheses. To build the model, 200 read sentences were labeled with PBS through listening tests. On this basis, methods for predicting PBS from a sentence were studied using the tree structure and surface structure of the sentence, and an algorithm was developed for estimating PBS from speech. The algorithm is an LDA-VQ hybrid model: duration information is handled with LDA, and eojeol-boundary pitch information is handled with VQ, introducing the notion of a tri-tone. The model was evaluated in continuous speech recognition experiments on ambiguous sentences, using the HTK toolkit for the recognition component and the modules obtained above for the prosody component. The recognition rate was 12% higher than that of a model that does not use prosodic information.
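
        The selection step, scoring each n-best hypothesis by how well its predicted PBS sequence agrees with the PBS estimated from the utterance, might look like the sketch below. The mean-absolute-difference agreement measure and the weight alpha are stand-ins; the thesis compares the PBS pairs statistically, in a way the abstract does not fully specify.

```python
import numpy as np

def rescore_nbest(nbest, pbs_from_speech, predict_pbs, alpha=0.5):
    """Rescore n-best hypotheses by prosodic-boundary agreement (sketch).

    nbest           : list of (sentence, acoustic_lm_score) pairs.
    pbs_from_speech : PBS sequence estimated from the utterance.
    predict_pbs     : function mapping a sentence to its predicted PBS sequence.
    alpha           : weight of the prosody term (hypothetical).
    """
    def agreement(a, b):
        n = min(len(a), len(b))
        if n == 0:
            return 0.0
        # higher (less negative) when predicted and observed PBS agree
        return -float(np.abs(np.array(a[:n]) - np.array(b[:n])).mean())

    return max(nbest,
               key=lambda h: h[1] + alpha * agreement(predict_pbs(h[0]),
                                                      pbs_from_speech))
```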

      • Continuous Speech Recognition Using a Tagged-Word Category-Based Trigram and Lattice Expansion (Tagged word 카테고리 기반의 트라이그램과 래티스 확장 방식을 사용한 연속 음성 인식)

        장준원, 서강대학교 대학원 (Sogang University Graduate School), 1999, domestic master's thesis

        In continuous speech recognition, the language model plays a major role in determining system performance, so there is a standing incentive to introduce more complex and more powerful language models. Applying such models directly, however, is inefficient: computation and memory usage grow so much that the cost of recognition becomes unacceptable. The usual alternative is a 2-pass strategy: the first pass generates a set of recognition hypotheses with a simple language model, and the second pass rescores that set with a more complex model, so that a better language model is applied at minimal extra cost. In this thesis, the first pass generates a bigram lattice, the lattice is expanded so that a trigram language model can be applied, and the second pass re-recognizes over the expanded lattice to produce the final result (the lattice expansion method). A category-based back-off trigram language model following a tagged-word classification was used and compared against a word-based back-off trigram. In experiments, the additional time needed to apply the word-based trigram was only about 20% of the time needed for bigram-only recognition, while sentence accuracy rose by 8.3% absolute. The category-based trigram with tagged-word classes improved sentence accuracy by a further 8% absolute over the word-based result. Unlike earlier work in which the dictionary was built from tagged words, the dictionary here consists of plain words and the language model itself predicts the category of each word, so the recognition network does not grow unnecessarily and recognition is performed more efficiently.
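
        The lattice expansion idea, duplicating lattice nodes so that every edge carries a full two-word history and an exact trigram score, can be sketched as follows, assuming word-labeled nodes from the first pass; the thesis's actual lattice format and its category-based back-off computation are not reproduced.

```python
from collections import defaultdict

def expand_lattice(nodes, edges, trigram_logp):
    """Expand a bigram lattice so a trigram LM becomes edge-local (sketch).

    nodes        : dict node_id -> word (each node is a word-end hypothesis).
    edges        : list of (src_id, dst_id) transitions from the first pass.
    trigram_logp : function (w1, w2, w3) -> log P(w3 | w1, w2).

    Each node is duplicated per predecessor word, so every expanded edge has
    a two-word history and can carry an exact trigram score.
    """
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)

    expanded = []
    for src, dst in edges:
        w2, w3 = nodes[src], nodes[dst]
        hist = {nodes[p] for p in preds[src]} or {"<s>"}
        for w1 in hist:
            # node copies are keyed by the word on the incoming edge
            expanded.append(((src, w1), (dst, w2), trigram_logp(w1, w2, w3)))
    return expanded
```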

      • A Study on Speaker Adaptation of a Continuous Speech Recognition System Using HMM (HMM을 이용한 연속음성인식 시스템의 화자적응화에 관한 연구)

        김창근, 東亞大學校 大學院 (Dong-A University Graduate School), 1998, domestic master's thesis

        Current speech recognition research continues to explore DP matching, HMMs, and neural networks. In continuous speech recognition, HMM-based methods are actively studied, and speaker adaptation, in which a small amount of adaptation data is additionally trained to approach the recognition rate of environment- and speaker-specific models, has attracted attention. This study proposes a method for adapting to a speaker from a single uttered sentence using syllable-unit HMMs. After syllable-unit DDCHMM models are built, syllable segmentation of the adaptation data (syllables and sentences) for the target environment and speaker is automated by concatenation training and the Viterbi algorithm, and the models are adapted by MAPE (maximum a posteriori probability estimation). In simulation, continuous speech was adapted by MAPE after training the syllable-unit DDCHMM models. On newspaper editorial speech, the recognition rate of the adapted HMM was 71.1%, approximately a 168% improvement over the unadapted HMM. Recognition rates differed little across adaptation weight values, and there was little difference between MAP estimation using Viterbi-segmented frame samples and MAP estimation using ML-estimated parameter samples. A real-time recognition system was then built on an IBM PC: it detects the start and end points of A/D-converted speech stored continuously in a circular buffer, removes the silent regions, computes the mel cepstrum, and recognizes by O(n)DP in real time. The adapted HMM again outperformed the unadapted HMM, and with the OPDP method the recognition rate for car control commands and digits reached 95.2%. These results show that speaker adaptation obtains good recognition rates from a small amount of additional adaptation data, and the approach should be useful for building voice online systems, dialogue systems, and automatic interpretation systems on top of real-time speech recognition.
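
        The front end described above, detecting start and end points so that only the voiced region of the circular buffer is analyzed, reduces to a simple energy test. A minimal sketch follows, with a hypothetical threshold and hangover scheme; the thesis's actual detector is not specified in the abstract.

```python
import numpy as np

def detect_endpoints(frame_energy, threshold=0.02, min_run=5):
    """Energy-based start/end point detection sketch.

    frame_energy : per-frame short-time energy of the A/D-converted input
                   (in the thesis, samples sit in a circular buffer and only
                   the speech region is copied out for mel-cepstrum analysis).
    threshold    : energy level separating speech from silence (hypothetical).
    min_run      : frames of sustained energy required to confirm speech.
    """
    above = np.asarray(frame_energy) > threshold
    start = end = None
    run = 0
    for i, a in enumerate(above):
        run = run + 1 if a else 0
        if start is None and run >= min_run:
            start = i - min_run + 1          # first frame of the sustained run
        if start is not None and a:
            end = i                          # last frame with speech energy
    return start, end
```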

      • A Study on Improved Decision-Tree-Based State Tying for Continuous Speech Recognition (연속음성인식을 위한 향상된 결정트리 기반 상태 공유 기법 연구)

        김동화, 부산대학교 일반대학원 (Pusan National University Graduate School), 1999, domestic doctoral dissertation

        Mixture Gaussian distributions and context-dependent phone models have been used to achieve high performance in many HMM-based continuous speech recognition systems, but the resulting large number of model parameters creates a trainability problem. Decision-tree-based state tying is widely used both to improve the robustness and accuracy of context-dependent acoustic modeling and to synthesize unseen triphones. The standard method of constructing the phonetic decision tree performs one-level pruning using only single-Gaussian triphone models. In the standard tree, the coarse state clusters that result from single-Gaussian models, the strict state-tying assumption of one-level pruning, and the mismatch in decision-tree conditions between the training and decoding stages can all harm the accuracy and robustness of the acoustic models and lower recognition performance. This paper proposes two improved approaches to address these problems: a two-level pruning decision tree, and a multi-mixture decision tree that uses both multi-mixture Gaussian and single-Gaussian models. The former performs two levels of pruning, for state tying and for mixture-weight tying, to make better use of limited training data; with the second level, tied states can take different mixture weights according to the similarity of their phonetic contexts. The increment in log likelihood used as the goodness-of-split criterion in the first level is also used in the second level, and the combination of two thresholds allows robust estimation of the Gaussians together with finer distinctions among states in the same cluster. The second approach proceeds as follows. A phonetic decision tree is first built from all seen single-Gaussian models with the traditional method, and the models are re-estimated with an adequate amount of data through the tree. The number of mixture components is then increased and the parameters are re-estimated again, after which the decision tree is rebuilt using the multi-mixture Gaussian models. The score of a node is approximated by the sum of the log likelihoods of the mixture distributions, each weighted by its accumulated state occupancy, computed from pooled statistics; the pooled variance and mean of each mixture are calculated from the variances, means, and accumulated occupancies of that mixture across all multi-mixture Gaussian distributions in the node. Re-estimation and tree rebuilding are repeated until the desired performance or a target number of mixture components is reached; during this process the number of tree leaves gradually decreases while the number of mixture components increases. In both approaches, one decision tree is built for each state of each monophone to cluster all states of that phone.
        The proposed approaches were evaluated on the BN-96 (HUB4), Wall Street Journal 20k, and Wall Street Journal 5k databases. For comparison with the standard decision tree, the WSJ5k system had approximately 4,000 tied states with at most 10 mixtures, and the HUB4 and WSJ20k systems had approximately 7,000 tied states with at most 24 mixtures. The recognition performance of the systems using the two-level decision tree built with single-Gaussian models and the multi-mixture decision tree built with 2-mixture Gaussian models was better than that of the system using the standard one-level single-Gaussian decision tree. Furthermore, by tuning the balance between first- and second-level tree nodes in the two-level decision tree, better performance can be obtained with even fewer parameters than the standard decision-tree-based approach.
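
        The goodness-of-split measure mentioned above, the increment in log likelihood from answering a phonetic question, has a closed form for single-Gaussian pooled statistics. Below is a minimal sketch of that standard criterion; the paper's extension additionally weights per-mixture terms by accumulated state occupancy.

```python
import numpy as np

def cluster_log_likelihood(occ, var):
    """Approximate log likelihood of the frames in a tied-state cluster,
    modeled by one diagonal-covariance Gaussian fitted to pooled statistics.

    occ : total state occupancy (frame count) of the cluster.
    var : (dim,) pooled variance of the cluster.
    """
    d = len(var)
    return -0.5 * occ * (d * np.log(2 * np.pi) + np.sum(np.log(var)) + d)

def split_gain(parent, yes, no):
    """Increase in log likelihood from splitting a tree node with a phonetic
    question; each argument is an (occupancy, pooled_variance) pair."""
    return (cluster_log_likelihood(*yes) + cluster_log_likelihood(*no)
            - cluster_log_likelihood(*parent))
```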

      • Implementation of a Continuous Speech Recognizer Using a New Speaker Normalization and Vowel Information (새로운 話者 定規化와 母音 情報를 利用한 連續音聲認識機의 具現)

        유일수, 成均館大學校 大學院 (Sungkyunkwan University Graduate School), 2003, domestic master's thesis

        Two goals are central to large-vocabulary continuous speech recognition: improving recognition accuracy, and achieving near-real-time performance without compromising that accuracy. This paper addresses both. First, vocal tract normalization (VTN) is an established way to improve recognition accuracy, and low-complexity, maximum-likelihood frequency warping procedures have been applied to speaker normalization. This paper proposes a new power spectrum warping method for speaker normalization, implemented by a simple modification of the filter-bank power spectrum in mel-frequency cepstrum (MFCC) analysis, and also proposes a hybrid VTN that combines power spectrum warping with frequency warping. Second, to shorten recognition time, the paper proposes reducing the set of recognition candidates using vowel information extracted from the first syllable of the continuous utterance. Both proposals were evaluated on the SKKU PBW DB. For recognition accuracy, power spectrum warping outperformed frequency warping, the two reducing the word error rate by 3.06% and 2.06%, respectively, relative to the baseline system, and the hybrid VTN reduced the word error rate by 4.07%. For recognition time, vowel information was extracted with 96.8% accuracy and the recognition process ran 4.4 times faster than the baseline system.
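
        Power spectrum warping as described, modifying the filter-bank power spectrum inside MFCC analysis, can be illustrated with a simple linear rescaling of the frequency axis. This sketch assumes a per-speaker warping factor chosen elsewhere (e.g. by maximum likelihood); the thesis's exact warping function is not given in the abstract.

```python
import numpy as np

def warp_power_spectrum(power, alpha=1.1):
    """Linear spectral warping applied before the mel filter bank (sketch).

    power : (n_bins,) power spectrum of one frame.
    alpha : per-speaker warping factor; values above 1 stretch and values
            below 1 compress the frequency axis.
    Interpolates the power spectrum at warped bin positions, one simple way
    to realize power spectrum warping inside MFCC analysis.
    """
    n = len(power)
    src = np.clip(np.arange(n) / alpha, 0, n - 1)   # warped source positions
    return np.interp(src, np.arange(n), power)
```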
