AI's Breakdown
The following is a description by the author who typed all this up. Bless her. ;O; Her site is lost in the 'net now, but I will make sure her words are not.
Ever wonder what it says in the background when Professor Hodgeson is talking about KIDS, while the KIDS prototype stuff is on the screen? Well, thanks to miraculous technology in combination with my extremely boring life, I have had time to copy all that down. And yes, I have it here for you. It's boring and odd, but it gives you an idea about how Hodgeson felt about the kids in KIDS and about the experiment itself.
"Our initial body of utterances was collected with a program that periodically called staff members and asked them to say 5 names selected at random from 64 full Japanese names (surname followed by first name). Using this program, 684 utterances were recorded from 47 native Japanese speakers (3/4 of whom were male) and tagged with the utterance transcription. The utterances were represented as Bark-scale power spectra of 0 ms speech frames, Hamming windowed at 5 ms shifts. The utterances were time synchronously phoneme labeled using their transcriptions in an automated process. The results were manually checked and adjusted to correct any missegmentations.
"From this data we generated our initial models as described above and used them to bring the automated attendant system online. The system, open to about 100 users, ran as described in Section 3 and after some months we had collected over 350 additional utterances. The newly collected utterances were briefly checked and a few mislabeled ones were deleted.
"Even with the new utterances, this is not a large data set (especially considering the task is multispeaker, and recorded over telephone lines), but we nonetheless performed the following experiments to assess the effects of incremental retraining. The 350 new utterances were added in 4 stages (preserving their temporal sequence) to the initial set of 684 (e.g. 684 + 87, 684 + 175, ...). At each stage one third of all the utterances were selected at random and held out for testing. The remaining two thirds became the training data, from which a new set of models was made using the three step procedure outlined above.
"At each stage we made 2 tests. The first checked basic recognition accuracy when new models were generated from the expanded training data and the new testing data was incorporated into the test set. The second used the new testing data but no new training data in order to check how well the original models generalized to unseen data. These two tests were conducted on both the models produced by embedded K-means and on the models after minimum error training (step 3 above). Results for these tests are shown in Figure 2."
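For anyone curious what the staged hold-out in the quoted text might look like in practice, here is a rough Python sketch. It is not the authors' code. The function name staged_splits, the seed, and the choice to redraw the random test third from scratch at every stage are all assumptions; the quoted text only gives the sizes (684 initial, 350 new, 4 stages, one third held out).

```python
import random


def staged_splits(initial, new, stages=4, seed=0):
    """Yield (stage, train, test) for each incremental retraining stage.

    Hypothetical helper: `new` is assumed to be in temporal order, and the
    test third is assumed to be redrawn at random at every stage.
    """
    rng = random.Random(seed)
    for stage in range(1, stages + 1):
        # Keep temporal order: take the earliest share of the new utterances.
        # Integer division gives 87, 175, 262, 350 for 350 items in 4 stages.
        cutoff = len(new) * stage // stages
        pool = list(initial) + list(new[:cutoff])
        rng.shuffle(pool)
        n_test = len(pool) // 3          # one third held out for testing
        yield stage, pool[n_test:], pool[:n_test]


if __name__ == "__main__":
    initial = [f"init{i}" for i in range(684)]
    new = [f"new{i}" for i in range(350)]
    for stage, train, test in staged_splits(initial, new):
        print(stage, len(train) + len(test), len(train), len(test))
```

With the sizes from the text, the first two stages come out to 684 + 87 and 684 + 175 utterances, matching the example in the quote.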