Tuesday, August 21, 2012

Ronanki: GSoC 2012 Pronunciation Evaluation week 9


This week, I finished my random phrase pronunciation evaluation system, which is now in the testing phase at http://talknicer.net/~ronanki/test/index.html

The system can score the pronunciation of any random sentence. It also gives word-level feedback on mispronunciation and on rate of duration. Please test the system and mail me any bugs you find. Please avoid proper nouns and punctuation marks while testing.

To do this, I processed the entire TIMIT dataset and computed statistics for each phone at three positions: Begin/Middle/End (0/1/2). The count in the last column is the number of times each phone occurred at that position. The statistics are at http://talknicer.net/~ronanki/phrase_data/statistics/TIMIT_statistics.txt
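
To illustrate the idea of per-phone, per-position statistics, here is a minimal Python sketch. It is not the code used to produce the TIMIT file above. It assumes that position means the phone's place within a word (first phone = 0, last phone = 2, anything in between = 1) and that the statistic of interest is duration; both are assumptions on my part, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def position_label(index, word_length):
    """Return 0 (begin), 1 (middle) or 2 (end) for a phone inside a word.

    Assumption: a single-phone word counts as 'begin'.
    """
    if index == 0:
        return 0
    if index == word_length - 1:
        return 2
    return 1


def phone_position_stats(words):
    """Collect mean, standard deviation and count per (phone, position).

    words: iterable of words, each a list of (phone, duration_in_seconds).
    """
    durations = defaultdict(list)
    for word in words:
        for i, (phone, duration) in enumerate(word):
            durations[(phone, position_label(i, len(word)))].append(duration)
    return {
        key: {"mean": mean(vals), "std": pstdev(vals), "count": len(vals)}
        for key, vals in durations.items()
    }


# Toy data (made up for illustration, not taken from TIMIT).
words = [
    [("k", 0.08), ("ae", 0.12), ("t", 0.06)],
    [("t", 0.05), ("ae", 0.10), ("k", 0.09)],
    [("ae", 0.14), ("t", 0.07)],
]

stats = phone_position_stats(words)
for (phone, pos), s in sorted(stats.items()):
    print(phone, pos, round(s["mean"], 3), round(s["std"], 3), s["count"])
```

Each printed row has the phone, its position (0/1/2), the statistics, and the count in the last column, which mirrors the layout described above.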

Next week, I am going to implement CART models so that each phone can be compared with its respective phone in a better context. Regarding features, I studied Power Normalized Cepstral Coefficients (PNCC), which are more robust for speech recognition, even in noisy environments. PNCC features are 13-dimensional; they are computationally more expensive than MFCC but perform better in speech recognition. I downloaded the available MATLAB code from http://www.cs.cmu.edu/~robust/archive/algorithms/PNCC_IEEETran/ and am running some experiments on the NTIMIT database. I also implemented phonological mapping from the current spectral features (MFCC) using an ANN. Currently, I am testing speech recognition using all these features.
