Wednesday, July 4, 2012

Ronanki: GSoC 2012 Pronunciation Evaluation Week 4

The source code for the functions below has been uploaded to http://cmusphinx.svn.sourceforge.net/viewvc/cmusphinx/branches/speecheval/ronanki/scripts/
Here are some brief notes on how to use these programs:

Method 1: (phoneme decode)
Path:
neighborphones_decode/one_phoneme/
Steps To Run:
1. Use split_wav2phoneme.py to split a sample wav file into individual phoneme wav files
Usage: python split_wav2phoneme.py <input_phoneseg_file> <complete_phone_list> <input_wav_file> <out_split_dir>
2. Create the split.ctl file from the extracted split_wav directory
3. Run feature_extract.sh to extract features for the individual phoneme wav files
4. Java Speech Grammar Format (JSGF) files are already created in FSG_phoneme
5. Run jsgf2fsg.sh in FSG_phoneme to convert the JSGF files to FSG
6. Run decode_1phoneme.py to get the required output in output_decoded_phones.txt
Usage: python decode_1phoneme.py <input_split_ctl_file> <output_phone_file>
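
Step 2 above (creating split.ctl) can be sketched roughly as follows. This is only a minimal sketch, not the script from the repository: it assumes the control file lists one utterance ID per line, taken as the wav file name without its extension, which is the usual Sphinx convention. The helper names build_ctl_entries and write_ctl are hypothetical, and the exact format feature_extract.sh expects may differ.

```python
import os


def build_ctl_entries(filenames):
    """Return sorted utterance IDs (file names without .wav) for a control file.

    Assumption: one ID per line, no extension. Non-wav files are ignored.
    """
    return sorted(
        os.path.splitext(name)[0]
        for name in filenames
        if name.lower().endswith(".wav")
    )


def write_ctl(split_dir, ctl_path):
    """Write one utterance ID per line for every wav file in split_dir."""
    entries = build_ctl_entries(os.listdir(split_dir))
    with open(ctl_path, "w") as ctl:
        for entry in entries:
            ctl.write(entry + "\n")
    return entries
```

For example, write_ctl("split_wav", "split.ctl") would list every wav file found in the split_wav directory.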

Method 2: (Three phones decode)
Path: 
neighborphones_decode/three_phones/
Steps To Run:
1. Use split_wav2threephones.py to split a sample wav file into individual wav files, each consisting of three phones; the outer two serve as contextual information for the middle one.
Usage: python split_wav2threephones.py <input_phoneseg_file> <ngb_key_mapper> <input_wav_file> <out_split_dir>
2. Create the split.ctl file from the extracted split_wav directory
3. Run feature_extract.sh to extract features for the individual split wav files
4. Java Speech Grammar Format (JSGF) files are already created in FSG_phoneme
5. Run jsgf2fsg.sh in FSG_phoneme to convert the JSGF files to FSG
6. Run decode_3phones.py to get the required output in output_decoded_phones.txt
Usage: python decode_3phones.py <input_split_ctl_file> <output_phone_file>
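
To illustrate the three-phone idea in step 1, here is a rough sketch of how the windows could be derived from a phone segmentation. It is not the logic of split_wav2threephones.py itself. The input format (a time-ordered list of start frame, end frame, phone) and the function name three_phone_windows are assumptions, and skipping the first and last phones, which lack a neighbor on one side, is my own simplification.

```python
def three_phone_windows(segments):
    """Group a time-ordered phone segmentation into three-phone windows.

    segments: list of (start_frame, end_frame, phone) tuples (assumed format).
    Each window spans from the start of the previous phone to the end of
    the next phone, so the middle phone is decoded with context on both sides.
    """
    windows = []
    for i in range(1, len(segments) - 1):
        prev_seg, mid_seg, next_seg = segments[i - 1], segments[i], segments[i + 1]
        windows.append({
            "phones": (prev_seg[2], mid_seg[2], next_seg[2]),
            "start": prev_seg[0],   # window begins with the left neighbor
            "end": next_seg[1],     # window ends with the right neighbor
        })
    return windows
```

Each window's start and end frames would then be used to cut the corresponding piece out of the input wav file.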

Method 3: (Single/Batch phrase decode)
Path: 
neighborphones_decode/phrases/
Steps To Run:
1. Run decode.sh to get the required output in sample.out
2. Provide input arguments such as the grammar file, features, and acoustic models for the input test phrase
3. Construct the grammar file (JSGF) using my earlier scripts from phonemes2ngbphones, then use jsgf2fsg in sphinxbase to convert it from JSGF to FSG, which serves as the input language model to sphinx3_decode
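
As a rough illustration of step 3, the sketch below builds a small JSGF grammar in which each phone position may have alternative neighboring phones. It is not the phonemes2ngbphones code. The function name build_jsgf, the input layout, and the example alternatives are all hypothetical; only the JSGF header, rule, and alternative syntax follow the standard format.

```python
def build_jsgf(name, phone_alternatives):
    """Return JSGF text for a phone sequence with per-position alternatives.

    phone_alternatives: list of lists, one inner list per phone position.
    A position with several candidates becomes a JSGF alternative group.
    """
    parts = []
    for alts in phone_alternatives:
        if len(alts) == 1:
            parts.append(alts[0])
        else:
            parts.append("( " + " | ".join(alts) + " )")
    return "#JSGF V1.0;\ngrammar {0};\npublic <{0}> = {1};\n".format(
        name, " ".join(parts)
    )


if __name__ == "__main__":
    # Hypothetical example: the second phone may be HH or F.
    print(build_jsgf("phrase", [["SIL"], ["HH", "F"], ["AH"], ["SIL"]]))
```

The resulting text would be saved to a .jsgf file and then converted to FSG with jsgf2fsg as described above.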
