Tuesday, August 21, 2012

Ronanki: GSoC 2012 Pronunciation Evaluation week 7

Last week, I spent the first few days continuing work on spectral features, phonological features, and the mapping between them based on neural network training. Using forced-alignment labels (or manual labels, if they exist), the phonological features http://talknicer.net/~ronanki/phonological_features/feature_stream for each phone in a phrase are repeated against its spectral features. I am looking into the CSLU Toolkit, which uses a neural net for feature-to-diphone decoding, and I stopped at that point to pick it up again after the mid-evaluation.
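As a rough illustration of repeating a phone's phonological features against its spectral frames, here is a minimal Python sketch. It assumes "repeated" means one copy of the phone's feature vector per spectral frame in the aligned segment. The table `PHONE_FEATURES`, its values, and the function `frame_targets` are hypothetical and are not taken from the feature file linked above.

```python
from typing import Dict, List, Tuple

# Hypothetical phone -> phonological feature vector table.
# The values are invented for illustration only.
PHONE_FEATURES: Dict[str, List[int]] = {
    "SIL": [0, 0, 0],
    "AH": [1, 0, 1],
    "P": [0, 1, 0],
}


def frame_targets(alignment: List[Tuple[str, int, int]]) -> List[List[int]]:
    """Repeat each phone's feature vector once per frame it spans.

    `alignment` holds (phone, start_frame, end_frame) tuples, with
    end_frame inclusive, as a forced aligner or manual labels might give.
    """
    targets: List[List[int]] = []
    for phone, start, end in alignment:
        vector = PHONE_FEATURES[phone]
        # One copy of the vector for every frame in the segment.
        targets.extend(list(vector) for _ in range(end - start + 1))
    return targets


targets = frame_targets([("SIL", 0, 1), ("AH", 2, 4), ("P", 5, 5)])
```

Each row of `targets` could then be paired with the spectral feature vector of the same frame as a neural network training example.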

Later, I worked on integrating the acoustic/duration scores and edit-distance grammar decoding into the current website for exemplar outlier analysis.

I tried many test cases, such as:
1. silence
2. noisy speech
3. Junk speech
4. Random sentence
5. Actual sentence shortened in the end
6. Actual sentence skipped the beginning

In test cases 1-6, forced alignment did not reach the final state, so it failed to create the phone segmentation file and the label file, which contain the acoustic scores and phone labels respectively.

7. Actual sentence
8. Actual sentence with more silence both at beginning and end
9. Actual sentence with one small word skip in the middle of phrase
10. Similar-sounding sentences, for example:
Utterance: Approach the teaching of pronunciation with more confidence
Similar-sounding sentences tested:
a. Approach the teaching opponents the nation with over confidence
b. Approach the preaching opponents the nation with confidence

In test cases 7-10, forced alignment worked and generated the acoustic scores and phone labels. I then moved on to testing the accuracy of edit-distance grammar decoding on cases 7-10, so that I can set a threshold parameter to distinguish cases (7, 8, 9) from case (10).
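To make the idea of an edit-distance accuracy concrete, here is a minimal Python sketch. It is not the grammar decoder used in the project: it swaps in a plain word-level Levenshtein distance between the expected phrase and a decoded phrase, so its numbers will not match the decoder's. The function names and the normalization by reference length are assumptions made for illustration.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - (edit distance / reference length), clamped at 0."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


reference = "Approach the teaching of pronunciation with more confidence"
similar = "Approach the teaching opponents the nation with over confidence"
accuracy = word_accuracy(reference, similar)
```

A phone-level comparison would penalize similar-sounding substitutions differently from this word-level one, which is one reason the decoder's figures can differ.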

Earlier, I tested cases 7 and 8 with the phrase decoder in edit-distance mode and reported an accuracy of around 73%. The accuracy is below 40% for case 10, so I can easily set a threshold parameter such as accuracy = x > 0.4 ? T : F
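The threshold test above can be written as a small Python function. This is only a sketch: the name `is_acceptable` is hypothetical, and the default of 0.4 simply mirrors the value suggested by the accuracies reported for cases 7, 8, and 10.

```python
def is_acceptable(accuracy: float, threshold: float = 0.4) -> bool:
    """Return True when the decoding accuracy is strictly above the threshold."""
    return accuracy > threshold


accepted = is_acceptable(0.73)   # accuracy in the range reported for cases 7 and 8
rejected = is_acceptable(0.39)   # an accuracy below 40%, as in case 10
```

Keeping the threshold as a parameter makes it easy to adjust once more utterances have been tested.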

I also discussed with my mentor, James Salsman, giving weights to words based on their parts of speech for the phrase output score. Here is what he proposed; all the units are in dB, which represents relative loudness in English.
my (%wt, %pos); # scoring weights and names of parts of speech
$wt{'q'} = 1.0; $pos{'q'} = 'quantifier';
$wt{'n'} = 0.9; $pos{'n'} = 'noun';
$wt{'v'} = 0.9; $pos{'v'} = 'verb';
$wt{'-'} = 0.8; $pos{'-'} = 'negative';
$wt{'w'} = 0.8; $pos{'w'} = 'adverb';
$wt{'m'} = 0.8; $pos{'m'} = 'adjective';
$wt{'o'} = 0.7; $pos{'o'} = 'pronoun';
$wt{'s'} = 0.6; $pos{'s'} = 'possessive';
$wt{'p'} = 0.6; $pos{'p'} = 'preposition';
$wt{'c'} = 0.5; $pos{'c'} = 'conjunction';
$wt{'a'} = 0.4; $pos{'a'} = 'article';
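
One possible way to use these weights is sketched below in Python (rather than the Perl of the listing). It assumes the phrase score is a weighted average of per-word scores; the post does not say how the weights are combined, so the function `phrase_score` and that formula are hypothetical. Only the weight values come from the table above.

```python
from typing import Dict, List, Tuple

# Part-of-speech weights copied from the proposal above.
WEIGHTS: Dict[str, float] = {
    "q": 1.0,  # quantifier
    "n": 0.9,  # noun
    "v": 0.9,  # verb
    "-": 0.8,  # negative
    "w": 0.8,  # adverb
    "m": 0.8,  # adjective
    "o": 0.7,  # pronoun
    "s": 0.6,  # possessive
    "p": 0.6,  # preposition
    "c": 0.5,  # conjunction
    "a": 0.4,  # article
}


def phrase_score(words: List[Tuple[str, float]]) -> float:
    """Weighted average of word scores (an assumed combination rule).

    `words` holds (part_of_speech_code, word_score) pairs.
    """
    total_weight = sum(WEIGHTS[tag] for tag, _ in words)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(WEIGHTS[tag] * score for tag, score in words)
    return weighted_sum / total_weight


# Hypothetical per-word scores for an article, a noun, and a verb.
score = phrase_score([("a", 0.5), ("n", 0.8), ("v", 0.9)])
```

With this rule, a poorly pronounced article lowers the phrase score less than a poorly pronounced noun or verb.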

I hope to finish this and launch the site after integrating everything, before the mid-evaluation submission.
