Tuesday, August 21, 2012

Ronanki: GSoC 2012 Pronunciation Evaluation week 12

This week I am trying to extend the TIMIT statistics to 5 or 6 sets per phoneme, depending on the phoneme's position in the syllable. Alternatively, I can use CART modelling to predict duration and acoustic score from the training data. I have done this to some extent using wagon from the Edinburgh Speech Tools.
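To make the position-dependent statistics concrete, here is a minimal sketch that groups durations by (phoneme, syllable position) and computes a mean and standard deviation for each group. The segment list, the position labels ("onset", "nucleus", "coda") and the function name `duration_stats` are placeholders for illustration only; the real categories and the real TIMIT alignments are not shown here.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical aligned segments: (phoneme, syllable position, duration in seconds).
# These values are made up for illustration and are not TIMIT measurements.
segments = [
    ("ae", "nucleus", 0.12),
    ("ae", "nucleus", 0.10),
    ("t", "onset", 0.05),
    ("t", "onset", 0.07),
    ("t", "coda", 0.09),
    ("t", "coda", 0.11),
]


def duration_stats(segments):
    """Return {(phoneme, position): (mean, standard deviation)} of durations."""
    groups = defaultdict(list)
    for phoneme, position, duration in segments:
        groups[(phoneme, position)].append(duration)

    stats = {}
    for key, durations in groups.items():
        # stdev needs at least two samples; fall back to 0.0 for singletons.
        spread = stdev(durations) if len(durations) > 1 else 0.0
        stats[key] = (mean(durations), spread)
    return stats


stats = duration_stats(segments)
```

With statistics keyed this way, the same phoneme gets a separate duration model for each position instead of one model overall.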

To measure mispronunciation detection accuracy, I collected data from 8 non-native speakers. Each speaker recorded 5 words 10 times each and 5 sentences 3-5 times each, in both correct and incorrect pronunciations. Here is the link to it: http://researchweb.iiit.ac.
database/ and the description of the database is at http://researchweb.iiit.ac.in/~srikanth.ronanki/GSoC/PE_database/description.txt

I need to split each speaker's data into individual files, which is tedious and is taking some time. So far I have finished one speaker's data, and the current text-independent system is doing well on it: with a common threshold for all words, 46 of the 50 correctly pronounced words are detected as good pronunciations and 42 of the 50 mispronounced words are detected as mispronunciations. It will take one or two more days to produce complete statistics.
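As a rough illustration of how these counts turn into rates, the sketch below applies one common threshold to a list of scores and reports the acceptance, rejection and overall accuracy. The function name `detection_rates`, the "higher score means better pronunciation" convention and the placeholder scores are assumptions; the scores are constructed only to reproduce the counts above (46 of 50 and 42 of 50) and are not the system's real output.

```python
def detection_rates(correct_scores, wrong_scores, threshold):
    """Apply one common threshold and summarise detection performance.

    Assumes a higher score means a better pronunciation, so a word is
    accepted as well pronounced when its score is at or above the threshold.
    """
    accepted = sum(1 for s in correct_scores if s >= threshold)
    rejected = sum(1 for s in wrong_scores if s < threshold)
    total = len(correct_scores) + len(wrong_scores)
    return {
        "correct_accepted": accepted / len(correct_scores),
        "wrong_rejected": rejected / len(wrong_scores),
        "overall": (accepted + rejected) / total,
    }


# Placeholder scores chosen only to match the reported counts.
correct_scores = [1.0] * 46 + [0.0] * 4   # 46 of 50 above the threshold
wrong_scores = [0.0] * 42 + [1.0] * 8     # 42 of 50 below the threshold

rates = detection_rates(correct_scores, wrong_scores, threshold=0.5)
```

For this one speaker that works out to 92% of correct words accepted, 84% of wrong words rejected, and 88 of 100 words classified correctly overall.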

In parallel, I finished the phonological features and trained acoustic models on the TIMIT database, because I had difficulty finding the complete set of wav files for the WSJ database. However, both decoding and forced alignment failed with the new models trained on phonological features. I also could not get usable models from the sphinx mfc features: even though the models were generated properly, I got no results from the forced-alignment or decode functions when I used them in place of the WSJ models. I will try to overcome these issues by next week.

