Wednesday, May 6, 2020
Speech Recognition: Principles and Applications
Table of contents

Abstract
Overview of the Characteristics of Automatic Speech Recognition Systems
  Number of Words
  Use of Grammar
  Continuous vs. Discrete Speech
  Speaker Dependency
Early Approaches to Automatic Speech Recognition
  Acoustic-Phonetic Approach
  Statistical Pattern Recognition Approach
Modern Approach to Automatic Speech Recognition
  Hidden Markov Models
  Training of an Automatic Speech Recognition System Based on HMMs
  Sub-Word Units
Applications of Automatic Speech Recognition Systems
  Automated Call-Type Recognition
  Data Entry
  Future Applications Using Automatic Speech Recognition Systems
Conclusion
Bibliography

Abstract

With the advances of technology, a lot of people may think that integrating the ability to understand human speech into a computer system is a piece of cake. However, scientists disagree. Since the early 1950s, scientists have tried to implement the perfect automatic speech recognition system, but they have failed. They succeeded in making the computer recognise a large number of words, but until now, a computer that understands everything without meeting any conditions does not exist. Because of the enormous range of applications, a lot of money and time is spent on improving speech recognition systems.

SPEECH RECOGNITION: PRINCIPLES AND APPLICATIONS

Nowadays, computer systems play a major role in our lives. They are used everywhere: homes, offices, restaurants, gas stations, and so on. Nonetheless, for some, computers still represent the machine they will never know how to use. Communicating with a computer is done using a keyboard or a mouse, devices many people are not comfortable using. Speech recognition solves this problem and removes the boundaries between humans and computers.
Using a computer will be as easy as talking with your friend. Unfortunately, scientists have discovered that implementing a perfect speech recognition system is no easy task. This report presents the principles of and the major approaches to speech recognition systems, along with some of their applications.

Overview of the Characteristics of Automatic Speech Recognition Systems

How can we evaluate a speech recognition system? Obviously, describing it as good or bad isn't enough, since the performance of such a system may be outstanding in one application and poor in another. In fact, speech recognition systems are designed according to the application. Some of these variable characteristics are presented below.

Number of Words

The major characteristic of a speech recognition system is the number of words it can recognise. The question that comes to mind is how many words are enough for the performance of a speech recognition system to be acceptable. The answer depends on the application (6, p.98). Some applications may require few words, like automated call-type recognition; others may require thousands, like data entry. However, increasing the number of words, or the vocabulary, of a speech recognition system increases its complexity and decreases its performance (the probability of error is higher) (6, p.98). Systems with large vocabularies are also slower, since more time is needed to search for a word in a large vocabulary. Increasing the number of words alone isn't enough, because the speech recognition system is still unable to differentiate words like 'to' and 'two' or 'right' and 'write' (6, p.98).

Use of Grammar

Using grammar, differentiating words like 'to' and 'two' or 'right' and 'write' is possible.
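To make the homophone point concrete, here is a minimal sketch of how a grammar of permitted word pairs could pick between words that sound alike. The tables `ALLOWED_NEXT` and `HOMOPHONES` and the function `disambiguate` are hypothetical names with invented entries; a real recogniser's grammar would be far larger and is not described in the sources cited here.

```python
# Hypothetical toy grammar: for each previous word, the set of words that
# may follow it. These entries are invented for illustration only.
ALLOWED_NEXT = {
    "turn": {"right", "left"},
    "please": {"write", "turn"},
    "go": {"to"},
    "number": {"two"},
}

# Words the recogniser cannot tell apart by sound alone.
HOMOPHONES = {
    "to": {"to", "two"},
    "two": {"to", "two"},
    "right": {"right", "write"},
    "write": {"right", "write"},
}


def disambiguate(previous_word, heard_word):
    """Return the candidates for heard_word that the grammar permits after previous_word."""
    candidates = HOMOPHONES.get(heard_word, {heard_word})
    permitted = ALLOWED_NEXT.get(previous_word, set())
    return sorted(candidates & permitted)


print(disambiguate("turn", "write"))   # ['right']
print(disambiguate("number", "to"))    # ['two']
```

The same lookup also shows the cost of a grammar: any word pair missing from the table is rejected outright, so the speaker is limited to the sequences the grammar anticipates.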
Grammar is also used to speed up a speech recognition system by narrowing the range of the search (6, p.98). Grammar also increases the performance of a speech recognition system by eliminating inappropriate word sequences. However, grammar doesn't allow random dictation, which is a problem for some applications (6, p.98).

Continuous vs. Discrete Speech

When speaking to each other, we don't pause between words. In other words, we use continuous speech. However, speech recognition systems have difficulty dealing with continuous speech (6, p.98). The easy way out is to use discrete speech, where we pause between words (6, p.100). With discrete speech input, the silent gap between words is used to determine the boundary of each word, whereas in continuous speech, the speech recognition system must separate words using an algorithm that is not one hundred per cent accurate. Still, for a small vocabulary and using grammar, continuous speech recognition systems are available. They are reliable and do not require great computational power (6, p.100). However, for large vocabularies, continuous speech recognition systems are very difficult to achieve, require huge computational power, and are slow. In fact, processing a speech sample can take three to ten times the time required for a person to say it (6, p.100).

Speaker Dependency

Speech recognition system designers must consider another important issue: whether their systems are speaker-dependent or speaker-independent. Each person pronounces a word differently. Although it is easy for humans to recognise the word 'car' whether an American or an Englishman says it, for speech recognition systems this is not the case.
Speaker dependency is determined by the application: some applications may require speaker-dependent systems (as in data entry), others may require speaker-independent systems (as in automated call-type recognition) (6, p.100). Speaker dependency greatly affects the training of an automatic speech recognition system (4, p.42).

Early Approaches to Automatic Speech Recognition

When scientists first dreamed about a machine capable of understanding spoken language, computers and super-fast integrated circuits were not available. However, they managed to establish the fundamental principles of speech recognition systems. Several approaches were used, each one with advantages and disadvantages. Two of these approaches are discussed below.

Acoustic-Phonetic Approach

The theory behind the acoustic-phonetic approach is acoustic phonetics. This theory assumes that spoken language is divided into phonetic units that are finite and distinctive. These phonetic units are distinguished by properties that are apparent in the speech signal (7, pp.42-43). The process by which speech is recognised is, briefly, as follows: initially, speech is divided into segments. According to the acoustic properties of each segment, an appropriate phonetic unit is attached to it. The resulting sequence of units is used to formulate a valid word (7, p.43).

Figure 1: Phonetic sequence for a speech sample (7, p.43).

As an example, consider the sequence of phonetic units matched with a sample of speech illustrated in figure 1. The symbol 'SIL' indicates a silence, whereas the vertical position of a phonetic unit indicates how well it matches the corresponding segment of speech (the higher, the better the match). After searching, we can match the phonetic sequence SIL-AO-L-AX-B-AW-T with the expression 'all about'.
Note that the chosen phonemes are not only the first choices in the phonetic sequence, but also second (B and AX) and third (L) choices. Therefore, matching a phonetic sequence with a word or a group of words is not straightforward (7, p.43). In fact, this is the main disadvantage of this approach.

Statistical Pattern Recognition Approach

In statistical pattern recognition, the speech patterns are input directly into the system and compared with the patterns supplied to the system during training (7, p.43). Unlike in the acoustic-phonetic approach, the speech is neither segmented nor checked for its properties. If enough patterns are supplied to the speech recognition system during training, it will perform better than the acoustic-phonetic approach. In general, the statistical pattern recognition approach is used more than the acoustic-phonetic approach because it is simpler to use, invariant to different speech vocabularies, and more accurate (higher performance) (7, p.44).

Modern Approach to Automatic Speech Recognition

With the availability of computers and high-speed microprocessors, more research was done using the huge computational power available to solve the speech recognition problem. Scientists still do not know the solution. Nevertheless, they were able to implement new approaches that proved to be much more efficient than earlier methods. Speech recognition systems are now able to recognise more words, and with more accuracy (3, p.115). Some of these approaches are presented below.

Hidden Markov Models (HMMs)

Speech is divided into phonemes. Unfortunately, these phonemes do not remain the same; they change according to the surrounding phonemes (4, p.44). HMMs are a tool to represent these changes mathematically. A Markov model consists of a number of states linked together, with each state corresponding to a unique output.
Each link between two states is characterised by a probability called the transition probability (4, p.44). Moving from one state to another, or remaining in the same state, is a function of the corresponding transition probability (2, p.50). A classical example illustrating Markov models is the following: consider a three-state weather system with state one being rainy, state two cloudy, and state three sunny. Such a system is shown in figure 2 (transition probabilities are added for the explanation below). From the diagram, it is clear that if the current day is sunny, the probability of tomorrow being cloudy is 0.1, of tomorrow being rainy is 0.1, and of tomorrow being sunny is 0.8 (2, p.50).

Figure 2: Three-state Markov model of the weather (2, p.51).

This example is an observable Markov model since we can check the state we are currently in (2, p.50). Speech recognition systems, however, use hidden Markov models, since the speech fragment is not observable by the speech recognition system (2, p.50). In hidden Markov models, a state can represent many outputs; hence, a probability distribution over all possible outputs is associated with each state. A diagram of a three-state HMM is shown in figure 3 (4, p.44). This figure shows that each state has five possible outputs (A, B, C, D, and E) occurring with a probability according to b1(s), b2(s), or b3(s). HMMs are doubly probabilistic since both the transition from one state to the other and the output generated at that state are probabilistic (4, p.44). Therefore, if we receive a sequence of outputs from an HMM, we are not able to retrace the sequence of states that the HMM passed through to produce that sequence (4, p.44).
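The observable weather model described above can be written down as a small table of transition probabilities. The sketch below is only illustrative: the 'sunny' row uses the values quoted in the text, while the 'rainy' and 'cloudy' rows are assumed placeholder values, since figure 2 is not reproduced here. The names `TRANSITIONS` and `sequence_probability` are hypothetical.

```python
# Transition probabilities for a three-state weather model.
# Only the "sunny" row comes from the text; the "rainy" and "cloudy" rows
# are placeholder values assumed here so that the example runs.
TRANSITIONS = {
    "rainy":  {"rainy": 0.4, "cloudy": 0.3, "sunny": 0.3},   # assumed
    "cloudy": {"rainy": 0.2, "cloudy": 0.6, "sunny": 0.2},   # assumed
    "sunny":  {"rainy": 0.1, "cloudy": 0.1, "sunny": 0.8},   # from the text
}


def sequence_probability(states):
    """Probability of seeing states[1:] given that the chain starts in states[0]."""
    probability = 1.0
    for today, tomorrow in zip(states, states[1:]):
        probability *= TRANSITIONS[today][tomorrow]
    return probability


# Each row must sum to 1, because tomorrow is always one of the three states.
for row in TRANSITIONS.values():
    assert abs(sum(row.values()) - 1.0) < 1e-9

# Sunny today, sunny tomorrow, cloudy the day after: 0.8 * 0.1
print(sequence_probability(["sunny", "sunny", "cloudy"]))
```

Because every state is directly visible, the probability of a sequence is just a product of table entries. In a hidden Markov model the states are not visible, so this direct lookup is no longer possible.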
Looking at figure 3, it is evident that an output sequence of A-B-C, for example, can be produced by any sequence of three states; however, each sequence of states has its own probability of occurring. In speech recognition, each word is represented by a sequence of states (1, p.53); hence, it is essential to find this sequence for any sequence of outputs. In fact, finding this sequence is equivalent to solving the speech recognition problem.

Figure 3: Three-state hidden Markov model (4, p.44).

The sequence of states is determined according to its probability. However, checking the probabilities of all possible sequences can be very time consuming, especially in speech recognition HMMs, which are much more complicated than the three-state example in figure 3. This problem was solved using an algorithm that exploits the fact that the probability of being in a certain state depends on the previous state (4, p.44).

Training of an Automatic Speech Recognition System Based on HMMs

As mentioned earlier, the major components of an HMM system are the transition probabilities between states and the output probability distribution of each state. To have a good speech recognition system, these probabilities must be adapted to factors like the language, the possible number of speakers, and so on (3, p.115). Determining these probabilities is part of what is known as training the speech recognition system. This training process depends on whether we are dealing with a speaker-dependent or a speaker-independent speech recognition system. In the first case, speech samples are taken from the user and the probabilities are determined accordingly. In the second case, speech samples are collected from many speakers, together with the text of what was said. In this case, the training process is much more complicated since the spectrogram (a measure of frequency vs. time) of the same word depends on the speaker.
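The search for the most likely state sequence mentioned earlier in this section can be sketched with the Viterbi algorithm. The text does not name its algorithm; Viterbi is used here as the standard dynamic-programming method that relies on the same property, namely that each state depends only on the previous one. All probabilities below are invented for illustration, since figure 3 is not reproduced here, and the names `START`, `TRANS`, `EMIT`, and `viterbi` are hypothetical.

```python
STATES = ["s1", "s2", "s3"]

# All numbers below are assumptions made for this sketch.
START = {"s1": 1.0, "s2": 0.0, "s3": 0.0}
TRANS = {
    "s1": {"s1": 0.5, "s2": 0.5, "s3": 0.0},
    "s2": {"s1": 0.0, "s2": 0.5, "s3": 0.5},
    "s3": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
}
EMIT = {
    "s1": {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.05, "E": 0.05},
    "s2": {"A": 0.1, "B": 0.6, "C": 0.2, "D": 0.05, "E": 0.05},
    "s3": {"A": 0.05, "B": 0.1, "C": 0.6, "D": 0.2, "E": 0.05},
}


def viterbi(outputs):
    """Return (best_state_path, probability) for a non-empty output sequence."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMIT[s][outputs[0]], [s]) for s in STATES}
    for symbol in outputs[1:]:
        new_best = {}
        for s in STATES:
            # Only the previous state matters, so keeping one best path per
            # state is enough; there is no need to enumerate every sequence.
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][symbol], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


path, prob = viterbi(["A", "B", "C"])
print(path)  # ['s1', 's2', 's3']
print(prob)  # approximately 0.054
```

For T outputs and N states, this keeps N candidate paths per step and does roughly T × N × N multiplications, instead of scoring all N**T possible state sequences.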
Training also involves building a dictionary holding the vocabulary, along with a grammar of permitted word sequences (4, p.42).

Sub-Word Units

In HMMs, each word is represented by a sequence of states (1, p.53). A word is recognised from the sequence of states that is most likely associated with a sequence of outputs. Therefore, the unit for such HMMs is the word. Many scientists believe that using sub-words instead of words may improve the quality of speech recognition (1, p.50). To implement sub-word HMMs, a system of sub-word units must be selected. The simplest form of sub-word unit is the phone. Using phones as units for an HMM seems to be the right choice since phones are small in number and easily trained, but the performance of such an HMM is poor since a phone is affected by the surrounding phones (1, p.53). Another choice of sub-word unit is the syllable. Like phones, syllables are affected by surrounding syllables, but their number is much greater than that of phones (around 20 000 in English), which makes them hard to train (1, p.53). A newer sub-word unit, known as the triphone, seems to be the most successful. Triphones solve the problem of influence between sub-word units and their surroundings by modelling each phone according to its right and left neighbours (1, p.53). As an example, the 't' in 'still' will be modelled by the s-t-i triphone (1, p.53). The immediate problem one can think of is the large number of triphones, since we are taking each phone and combining it with all possible left and right phone neighbours. This problem can be resolved by using the fact that some triphones can be very similar, since many neighbouring phones can affect a phone in the same way (1, pp.53-54).
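The triphone idea described above, and the size problem it creates, can be sketched in a few lines. The function `to_triphones` is a hypothetical helper; padding word boundaries with 'SIL' and the inventory size of 40 phones are assumptions made for this sketch, not figures from the cited sources.

```python
def to_triphones(phones):
    """Expand a phone sequence into left-centre-right triphone labels.

    Word boundaries are padded with 'SIL' (silence), an assumption made here.
    """
    padded = ["SIL"] + list(phones) + ["SIL"]
    return [
        f"{padded[i - 1]}-{padded[i]}-{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# 'still' written with the informal phone labels s, t, i, l used in the text.
print(to_triphones(["s", "t", "i", "l"]))
# ['SIL-s-t', 's-t-i', 't-i-l', 'i-l-SIL']

# With N distinct phones there are up to N**3 possible triphones.
# N = 40 is an assumed, illustrative inventory size.
N = 40
print(N ** 3)  # 64000
```

The cubic growth is why similar triphones are grouped and share a model: far fewer distinct models then need to be trained.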
As an example of such similar triphones, the effect on the 't' in 'still' is similar to the one in 'steal' (1, pp.53-54). Even though the performance of the recognition system is affected by such approximations, it remains within acceptable standards (1, p.54).

Applications of Automatic Speech Recognition Systems

With all the time and money spent on research into speech recognition systems, one may wonder about the applications of speech recognition. This part presents some of the currently available applications along with some future applications of automatic speech recognition systems.

Automated Call-Type Recognition

An interesting and relatively simple application of speech recognition systems is automated call-type recognition. In pay phones, operators are needed to determine the call type of the party (7, p.490). Speech recognition may be used instead of operators. Five types of calls are available: 'collect', 'calling card', 'operator' for operator-assisted calls, 'third number' for third-party billing calls, and 'person' for person-to-person calls (7, p.490). For this application, the speech recognition system must be speaker-independent and capable of recognising and spotting the five key words mentioned above in a speech sample (2, p.52). The problem in this application is the high amount of background noise, since pay phones are usually located in public places; however, this problem can be solved using appropriate speech recognition systems (low-level speakers, etc.) (2, p.52).

Data Entry

Entering data using speech recognition is very practical when performing a manual task (6, p.102). A speech recognition system for this application is highly complex and structured since it must contain a large vocabulary.
For data entry, both speaker-dependent and speaker-independent speech recognition systems are available, even though speaker-independent systems perform better than speaker-dependent systems. They are also available for discrete or continuous speech (6, p.102). Data entry applications are still limited since the performance of speech recognition systems in this field is still limited.

Future Applications Using Automatic Speech Recognition Systems

With the increasing performance of automatic speech recognition systems, companies are more interested in integrating speech recognition systems into their products. Car manufacturers are interested in replacing all the levers, knobs, and buttons with a speech recognition system capable of doing everything, from raising the temperature to locking the doors and turning on the radio (5, p.49). In this way, the electronic content of the car is increased whereas the mechanical content is reduced. This makes the car easier to design and build, and hence cheaper (5, p.49). Others think of applying speech recognition systems in kitchen appliances such as dishwashers, ovens, and refrigerators. Air conditioners might some day be voice controlled (5, p.49).

Conclusion

The gradual but inevitable development of speech recognition systems will surely lead to a system that will one day compare to the perfect speech recognition device, the human being. New methods and algorithms are researched every day to improve the performance of speech recognition systems. Will we reach a stage where keyboards, buttons, and all input devices become obsolete? Time will tell.

Bibliography

1. Holmes, W.J., & Pearce, D.J.B. (1993, Vol.11, No.1). Sub-word units for automatic speech recognition of any vocabulary. GEC Journal of Research, 49-58.
2. Juang, B.H., Perdue, R.J., Jr, & Thomson, D.L. (1995, March/April). Deployable automatic speech recognition systems: Advances and challenges. AT&T Technical Journal, 45-54.
3. Kay, R. (1998, January). Do you hear what I say? Byte, 115-116.
4. Makhoul, J.F., & Schwartz, R. (1997, December). The voice of the computer is heard in the land (and it listens too!). Spectrum, 39-47.
5. Mannes, G. (1995, July). Machines that listen. Popular Mechanics, 47-49.
6. Markowitz, J. (1995, December). Talking to machines. Byte, 97-104.
7. Rabiner, L., & Juang, B.H. (1993). Fundamentals of speech recognition. New Jersey: Prentice-Hall.