Window To My World

Now let us look at a dictation in this imaginary language. Suppose this dictation is
100 lines long and contains 1000 words of various lengths, and that the number of
syllables in the dictation is 2500, a nice mean of two and a half syllables per word.
Let us also assume that the dictionary has the full complement of 1364 distinct words
possible in this imaginary language.

If the dictator pauses between words, the computer simply takes each word and
compares it to its list to find the closest match. Because it is looking for the closest
match rather than an exact one, it has to go through the entire list for each word.
Therefore, this comparison will be done 1000 x 1364, or 1,364,000, times, not too
large a task for our microcomputer.
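
The paused case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names closest_match and toy_distance are hypothetical, and the distance measure is a crude stand-in for whatever acoustic comparison a real recognizer would use. The point it shows is that a closest-match search cannot stop early, so each spoken word costs one pass over the whole dictionary.

```python
# Figures taken from the text: 1000 dictated words, 1364 dictionary entries.
WORDS_IN_DICTATION = 1000
DICTIONARY_SIZE = 1364


def toy_distance(a, b):
    """Hypothetical stand-in for an acoustic distance:
    differing positions plus the difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def closest_match(heard, dictionary, distance):
    """Scan the entire dictionary and return (best entry, comparisons made).

    Every entry is examined, because a later entry may be a closer
    fit than any entry seen so far.
    """
    best_entry, best_score = None, None
    comparisons = 0
    for entry in dictionary:
        comparisons += 1
        score = distance(heard, entry)
        if best_score is None or score < best_score:
            best_entry, best_score = entry, score
    return best_entry, comparisons


# One full pass over the dictionary for each dictated word.
total_comparisons = WORDS_IN_DICTATION * DICTIONARY_SIZE
print(total_comparisons)  # 1364000
```

For instance, closest_match("kat", ["dog", "cat", "bird"], toy_distance) examines all three entries before settling on "cat".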

But suppose the program must decide where the words begin and end. It would then
have to look at each syllable to see if it were a word. It would then look at every run
of two consecutive syllables, every run of three consecutive syllables, etc., up to every
run of five consecutive syllables. This means that it would have to look at all potential
word combinations: there would be 2500 possible one-syllable words, 2499 possible
two-syllable words, 2498 possible three-syllable words, etc., for a grand total of
12,490 potential word combinations. Instead of going through the match-up process a
mere 1,364,000 times, the software would have to perform this task 12,490 x 1364, or
17,036,360, times. That is to say, such a task requires about twelve times as much
computing power. However, the task becomes geometrically more difficult when we
deal with the real world of English.
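
The arithmetic for the unsegmented case can be checked with a short Python sketch. The function name candidate_spans is hypothetical; the figures (2500 syllables, words of at most five syllables, 1364 dictionary entries) come from the text.

```python
# Figures taken from the text.
SYLLABLES = 2500
MAX_WORD_SYLLABLES = 5
DICTIONARY_SIZE = 1364
PAUSED_COMPARISONS = 1000 * DICTIONARY_SIZE  # the case with pauses between words


def candidate_spans(total_syllables, max_length):
    """Count every run of 1..max_length consecutive syllables.

    A run of n syllables can start at any of (total - n + 1) positions.
    """
    return sum(total_syllables - n + 1 for n in range(1, max_length + 1))


spans = candidate_spans(SYLLABLES, MAX_WORD_SYLLABLES)
unsegmented_comparisons = spans * DICTIONARY_SIZE
ratio = unsegmented_comparisons / PAUSED_COMPARISONS

print(spans)                    # 12490
print(unsegmented_comparisons)  # 17036360
print(round(ratio, 2))          # 12.49
```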

Taking this problem to the real world, we need to look at phonemes instead of just
syllables. Each syllable has at least one phoneme, and many have three. For example,
the word "sick" has three, namely "s," "ih," and "kh." Moreover, many of the entries in
the dictionary are phrases, and so an entry may contain far more than a few phonemes.
The word "choledochoduodenostomy" has around nineteen phonemes;
"salpingo-oophorectomy" has a similar number. The common phrase "pupils are equal,
round, and reactive to light and accommodation," which most voice recognition
software dictionaries contain, has about 44.