《信息工程專業英語》 (Professional English for Information Engineering) courseware, Chapter 7
Unit 7 Automatic Speech Recognition

7.1 Text
7.2 Reading Materials

7.1 Text

Basic speech recognition challenge

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. A flow diagram of speech recognition is shown in Fig. 7.1. The recognized words can be the final results, as for applications such as command and control, data entry and document preparation.

Fig. 7.1 Flow diagram of speech recognition

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology, and the infrastructure needed to support the work.

The following research areas for speech recognition were identified:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing and deploying systems for new applications. At present, systems tend to suffer significant degradation when moved to a new task. In order to reach peak performance, they must be trained on examples specific to the new task, which is time consuming and expensive.

Adaptation: How can systems continuously adapt to changing conditions and improve through use? Such adaptation can occur at many levels in systems: subword models, word pronunciations, language models, etc.

Language Modeling: Current systems use statistical language models to help reduce the search space and resolve acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more habitable systems, it will be increasingly important to get as much constraint as possible from language models, perhaps incorporating syntactic and semantic constraints that cannot be captured by purely statistical models.
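To make the idea of a statistical language model concrete, here is a minimal sketch of a bigram model with add-one smoothing. The toy corpus, the function names `train_bigram` and `log_prob`, and the choice of smoothing are illustrative assumptions, not something the text specifies.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenized sentences (toy corpus assumed)."""
    contexts, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(words, model):
    """Log probability of a word sequence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))
        total += math.log(p)
    return total


# Hypothetical training data, chosen only for illustration.
corpus = [
    ["recognize", "speech"],
    ["recognize", "speech", "now"],
    ["wreck", "a", "nice", "beach"],
]
model = train_bigram(corpus)
likely = log_prob(["recognize", "speech"], model)
unlikely = log_prob(["speech", "recognize"], model)
```

A recognizer could add such a score to the acoustic score of each candidate word sequence, so that word orders seen in training are preferred over acoustically similar but unseen ones.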

Confidence Measures: Most speech recognition systems assign scores to hypotheses for the purpose of rank-ordering them. These scores do not provide a good indication of whether a hypothesis is correct or not, just that it is better than the other hypotheses. As we move to tasks that require actions, we need better methods to evaluate the absolute correctness of hypotheses.
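The difference between a ranking and a confidence can be illustrated with a small sketch. Normalizing the scores of an n-best list is one common heuristic (an assumption here, not a method named in the text), and the log scores below are made-up values. Note that the result is still relative to the listed hypotheses, which is exactly the limitation the paragraph describes.

```python
import math


def nbest_posteriors(log_scores):
    """Normalize n-best log scores into probabilities that sum to 1 (softmax)."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]  # subtract max for numerical stability
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical n-best lists with the same best score and the same ranking.
clear = nbest_posteriors([-10.0, -15.0, -16.0])      # best hypothesis far ahead
ambiguous = nbest_posteriors([-10.0, -10.2, -10.3])  # best hypothesis barely ahead
```

In both lists the first hypothesis wins the ranking, yet its normalized weight is about 0.99 in the first case and about 0.39 in the second, so the ranking alone hides how uncertain the decision is.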

Out-of-Vocabulary Words: Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary. This leads to a certain percentage of out-of-vocabulary words in natural conditions. Systems must have some method of detecting such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto the unknown word, causing an error.
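A toy sketch of the out-of-vocabulary problem follows. It uses string similarity from Python's `difflib` as a stand-in for an acoustic match score, which is a strong simplification; the vocabulary, the threshold of 0.7 and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical closed vocabulary of a small travel-information system.
VOCAB = ["boston", "denver", "dallas", "flight", "fare"]


def best_match(token, vocab=VOCAB):
    """Return (score, word) for the vocabulary word most similar to the input."""
    return max((SequenceMatcher(None, token, w).ratio(), w) for w in vocab)


def recognize(token, threshold=0.7):
    """Map the input onto the vocabulary, rejecting weak matches as out-of-vocabulary."""
    score, word = best_match(token)
    return word if score >= threshold else "<OOV>"


forced = best_match("seattle")[1]   # without rejection, some known word is always returned
detected = recognize("seattle")     # with rejection, the input is flagged instead
```

Without the threshold, the unknown input is silently replaced by whichever vocabulary word scores highest, which is the error described above. With it, a slightly distorted known word such as "bostan" is still accepted while "seattle" is rejected.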

Spontaneous Speech: Systems that are deployed for real use must deal with a variety of spontaneous speech phenomena, such as filled pauses, false starts, hesitations, ungrammatical constructions and other common behaviors not found in read speech. Development on the ATIS task has resulted in progress in this area, but much work remains to be done.

Prosody: Prosody refers to acoustic structure that extends over several segments or words. Stress, intonation and rhythm convey important information for word recognition and the user's intentions. Current systems do not capture prosodic structure. How to integrate prosodic information into the recognition architecture is a critical question that has not yet been answered.

Modeling Dynamics: Systems assume a sequence of input frames which are treated as if they were independent. But it is known that perceptual cues for words and phonemes require the integration of features that reflect the movements of the articulators, which are dynamic in nature. How to model dynamics and incorporate this information into recognition systems is an unsolved problem.

Technical words and phrases

acoustic adj. 聲學的;音響的;聽覺的

sponsor vt. 贊助;發起 n. 贊助者;主辦者;保證人

infrastructure n. 基礎設施;公共建設

robustness n. 穩定性;穩健性;健壯性

gracefully adv. 優雅地;溫文地

catastrophically adv. 突變(catastrophe的變形),災難性地

portability n. 可移植性;輕便

ambiguity n. 含糊;不明確

incorporate vt. 包含,吸收

syntactic adj. 句法的;語法的

semantic adj. 語義的;語義學的

hypotheses n. 假定;臆測(hypothesis的復數)

spontaneous adj. 無意識的;自發的;自然的

prosody n. 韻律學

intonation n. 聲調,語調

articulators n. 發音之人或物;發音糾正器

speech recognition 語音識別

refers to 指的是

language models 語言模型

filled pauses 停頓

ungrammatical constructions 非法結構

integrate into 合并

modeling dynamics 動力學建模

perceptual cues 知覺線索

ATIS (Air Travel Information System) 航空旅行信息系統

7.1.1 Exercises

1. Put the Phrases into English

(1) 指令控制; (2) 資料輸入;

(3) 關鍵挑戰; (4) 聲學環境;

(5) 絕對正確性。

2. Put the Phrases into Chinese

(1) an acoustic signal;

(2) document preparation;

(3) subword models;

(4) prosodic structure;

(5) modeling dynamics;

(6) perceptual cues.

3. Translation

(1) In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained.

(2) In order to reach peak performance, they must be trained on examples specific to the new task, which is time consuming and expensive.

(3) These scores do not provide a good indication of whether a hypothesis is correct or not, just that it is better than the other hypotheses.

(4) Systems are designed for use with a particular set of words, but system users may not know exactly which words are in the system vocabulary.

7.1.2 Reference Translation (參考譯文)

語音識別是把從麥克風或者電話中捕捉到的聽覺信號轉變為一系列單詞的過程。語音識別流程圖如圖7.1所示。識別的單詞可以作為最終的結果,如指令控制、資料輸入和文件準備的應用。

1992年,美國國家科學基金會主辦一場研討會來鑒定人類語言領域研究的關鍵挑戰,并為這項工作提供基礎設施。

研究語音識別從以下幾個方面來鑒定:

魯棒性:在一個堅固的系統中,當環境與系統所匹配的環境不同時,系統性能緩慢地降低了(而不是變形)。信道特征和聲學環境的差異應受到特別的注意。

可移植性:可移植性指的是為新的應用迅速地設計、發展和開發系統。目前,當系統移植到一個新任務時系統性能顯著退化。為了達到最佳性能,必須致力于研究特定新任務的例子,這項工作很耗時而且花費巨大。

適應性:如何讓系統不斷適應環境的改變并提高使用性能?這樣的適應性存在于系統的很多層面中,如子字模、單詞發音、語言模型等。

語言建模:目前的系統用統計語言模型來減少空間,解決聲學的模糊問題。隨著單詞尺寸的增長,同時放寬了其他方面約束去創造更加實用的系統,從語言模型中獲得盡可能多的約束條件將會變得越來越重要;可能單純的統計模型不能獲取合語法和語義的限制。

信心對策:大部分的語音識別系統由假設分配分數來排序。這些分數并不是說明這個假設是對是錯,而是說明這個假設比其他的更合適。當我們接受需要操作的任務時,我們需要更好的辦法來評估假設的絕對正確性。

詞匯以外的單詞:系統為使用者設計了一系列詳細的單詞,但是使用者可能不會確切地知道哪些單詞在系統的詞匯表中。這會導致一定比例詞匯表以外的單詞出現。系統必須采用一些方法來檢測出這些詞匯的出現,或者直接在已知的詞匯中停止尋找,否則會導致錯誤。

無意識語音:系統在實際使用中必須解決多種多樣的無意識語音現象,比如充滿了停頓、錯誤的開始、猶豫、不合語法結構和其他容易被語音誤解的行為。ATIS的發展促進了這個領域的發展,但是還有很多問題需要解決。

韻律:韻律指的是遍布幾個片段和單詞的聲學結構。重讀、聲調和節奏傳遞著詞匯識別和用戶意圖的重要信息。現在的系統沒有捕捉到韻律學的結構。怎樣把韻律學的信息和識別體系結合起來,這個問題尚未得到解答。

動力學建模:系統假設一系列的輸入幀,這些輸入幀就像獨立的處理器。但是詞匯和音素的知覺線索要求特征的集合反映出發音器官的運轉,這個運轉在實際中是動態的。怎樣建立動態模型和將這些信息合并到認知系統中還是一個尚待解決的問題。

7.2 Reading Materials

7.2.1 Major Components in a Speech Recognition System

The Speech Communications Group at SPERRY UNIVAC Defense Systems is developing a linguistically-oriented procedure for recognizing words, phrases, and natural sentences by computer. The major components of the current speech recognition system perform acoustic and phonetic analysis, phonetic segmentation, and lexical matching and scoring.

The acoustic processing is based on a linear-predictive spectral analysis of the speech signal. Sounds are classified by manner, place, and voicing using formant frequencies and other spectral functions, as well as information about syllable boundaries and nuclei. A linear sequence of analysis segments is created, and matched against the lexicon using a scoring matrix that ranks analysis-lexical segment pairs by their expected confusions. Word sequences are progressively formed and ranked against the entire input to determine the most likely phrases spoken.

When the recognition system was tested on a 31-word vocabulary from two male speakers, single word recognition scores of 95% correct were obtained when the task syntax was used. Preliminary results for recognizing connected word sequences from three male speakers range from 54% to 74% for a task with constrained word order. Current plans for enhancing the recognition system include the incorporation of components for phonological rules, speaker normalization, and prosodic guidelines. By adding more powerful procedures for syntactic and semantic analysis, the system will be extended from the recognition of several-word noun phrases to the understanding of more natural sentences.

During the past seven years, the Speech Communications Group at SPERRY UNIVAC has been developing effective procedures for verbal communication with computers. The linguistically-oriented techniques being developed for the computer recognition of speech are designed to accommodate a variety of vocabularies without extensive adjustment and a number of similar speakers without extensive training. In addition, these procedures can be applied to both isolated words and connected word sequences, and they can gracefully evolve to understand more natural sentences with the addition of syntactic and semantic analysis capabilities.

The principal components comprising the system include:

(1) acoustic parameter extraction: a representation of the speech signal in terms of time-varying source, resonance, and energy functions;

(2) linguistic feature extraction: a derivation of the information-carrying attributes from the parameters, including prosodic and phonetic content;

(3) segmental structuring: a phonological transformation and organization of the linguistic features in a format consistent with lexical matching;

(4) lexical creation: a process for providing descriptions of the words in the vocabulary in terms of likely phonological alternatives of the linguistic features to be determined during the analysis;

(5) matching of analysis and lexical representations: alignment and scoring of feature representations and imposition of task-related constraints.

Currently, the processes of lexical creation and updating are manual, although some work is in progress to automate aspects of this operation. The few analytic phonological rules that have been implemented are part of the segmental structuring process, and not yet part of a separate component. At the feature level, syllable stress and phrase boundaries are available, but not currently used by the system as part of the recognition process. Spectral analysis is now used to calculate energy functions while the hardware energy filters are being implemented.

7.2.2 Pattern Recognition

The discipline of pattern recognition is usually divided into the statistical and the structural approaches. In statistical pattern recognition, objects or patterns are given by feature vectors. Hence, a pattern is formally represented as a vector consisting of n measurements, or feature values, and can be understood as a point in the n-dimensional real space, i.e. $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$. Representing patterns by feature vectors $x \in \mathbb{R}^n$ offers a number of useful properties, in particular, the mathematical wealth of operations available in a vector space.

For example, quantities such as the sum, the product, the mean, or the distance of two entities are well defined in a vector space and, moreover, can be efficiently computed. The convenience and low computational complexity of algorithms that use feature vectors as their input have eventually resulted in a rich repository of algorithmic tools for statistical pattern recognition. However, the use of feature vectors implies two limitations.

First, as vectors always represent a predefined set of features, all vectors in a given application have to preserve the same length regardless of the size or complexity of the corresponding objects. Second, there is no direct possibility to describe binary or higher-order relationships that might exist among different parts of a pattern. These two drawbacks are severe, particularly when the patterns under consideration are characterized by complex structural relationships rather than the statistical distribution of a fixed set of features.
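The vector-space operations and the fixed-length constraint described above can be sketched in a few lines. The function names and sample vectors are illustrative assumptions; plain Python lists stand in for feature vectors.

```python
import math


def _check_same_length(x, y):
    """Feature vectors in one application must share the same dimension n."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")


def vec_sum(x, y):
    """Element-wise sum of two feature vectors."""
    _check_same_length(x, y)
    return [a + b for a, b in zip(x, y)]


def vec_mean(vectors):
    """Element-wise mean of a non-empty list of equal-length feature vectors."""
    for v in vectors[1:]:
        _check_same_length(vectors[0], v)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean_distance(x, y):
    """Euclidean distance between two points in n-dimensional real space."""
    _check_same_length(x, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [1.0, 2.0, 3.0]
y = [4.0, 6.0, 3.0]
total = vec_sum(x, y)
mean = vec_mean([x, y])
dist = euclidean_distance(x, y)
```

Each operation takes time linear in n, which reflects the low computational cost mentioned in the text, while the length check makes the first limitation explicit: a pattern with a different number of features cannot be combined with the others.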

Structural pattern recognition, by contrast, is based on symbolic data structures, such as strings, trees, or graphs for pattern representation. Graphs, which consist of a finite set of nodes connected by edges, are the most general representation formalism, and the other data types commonly used in structural pattern recognition are special cases of graphs. In particular, strings and trees are simple instances of graphs. In the remainder of the present paper we will focus on graphs. But the reader should keep in mind that strings and trees are always included as special cases.

The above-mentioned drawbacks of feature vectors, namely the size constraint and the inability to represent structural relationships, can be overcome by graph-based representations. In fact, graphs are not only able to describe properties of an object, but also binary relationships among different parts of the underlying object, by means of edges. Note that these relationships can be of various natures, viz. spatial, temporal, or conceptual. Moreover, graphs are not constrained to a fixed size, i.e. the number of nodes and edges is not limited a priori and can be adapted to the size or the complexity of each individual object under consideration.

One drawback of graphs arises from the fact that there is little mathematical structure in the domain of graphs. For example, computing the (weighted) sum or the product of a pair of entities, which are elementary operations required in many classification and clustering algorithms, is not possible in the domain of graphs, or is at least not defined in a standardized way.
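A minimal sketch of a labeled graph shows both points: edges carry binary relationships, and the size adapts to the object. The `Graph` class, the `string_as_graph` helper and the "next" relation label are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Labeled graph: nodes hold properties, edges hold binary relationships."""
    nodes: dict = field(default_factory=dict)  # node id -> property label
    edges: dict = field(default_factory=dict)  # (source id, target id) -> relation label

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, target, relation):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges[(source, target)] = relation


def string_as_graph(symbols):
    """Represent a string as a chain graph, showing that strings are a special case."""
    graph = Graph()
    for i, symbol in enumerate(symbols):
        graph.add_node(i, symbol)
        if i > 0:
            graph.add_edge(i - 1, i, "next")  # temporal or spatial order as an edge
    return graph


short = string_as_graph("ab")
longer = string_as_graph("speech")
```

The two graphs have different numbers of nodes and edges, yet both are valid instances of the same representation, which a fixed-length feature vector cannot offer.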

7.2.3 Hidden Markov Modeling

The basic theory of Markov chains has been known to mathematicians and engineers for close to 80 years, but it is only in the past decade that it has been applied explicitly to problems in speech processing. One of the major reasons why speech models, based on Markov chains, have not been developed until recently was the lack of a method for optimizing the parameters of the Markov model to match observed signal patterns. Such a method was proposed in the late 1960s and was immediately applied to speech processing in several research institutions. Continued refinements in the theory and implementation of Markov modeling techniques have greatly enhanced the method, leading to a wide range of applications of these models.

Assume you are given the following problem. A real world process produces a sequence of observable symbols. The symbols could be discrete (outcomes of coin tossing experiments, characters from a finite alphabet, quantized vectors from a codebook, etc.) or continuous (speech coefficients, etc.). Your job is to build a signal model that explains and characterizes the occurrence of the observed symbols. If such a signal model is obtainable, it then can be used later to identify or recognize other sequences of observations.
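As a hedged illustration of using a signal model to recognize a sequence, the sketch below scores a coin-toss sequence under two discrete hidden Markov models with the forward algorithm. The two models and all their probabilities are made-up assumptions, and the sequence is assumed to be non-empty.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s][symbol]
            for s in states
        ]
    return sum(alpha)


# Model A (hypothetical): a single fair coin.
fair = dict(start=[1.0], trans=[[1.0]], emit=[{"H": 0.5, "T": 0.5}])

# Model B (hypothetical): two biased coins with a tendency to keep the same coin.
two_coin = dict(
    start=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    emit=[{"H": 0.9, "T": 0.1}, {"H": 0.1, "T": 0.9}],
)

streaky = list("HHHHTTTT")
score_fair = forward_likelihood(streaky, **fair)
score_two_coin = forward_likelihood(streaky, **two_coin)
```

The streaky sequence is more probable under the two-coin model than under the fair-coin model, so a recognizer that picks the higher-scoring model would attribute the observations to the two-coin process.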

In attacking such a problem, some fundamental decisions, guided by signal and system theory, must be made. For example, one must decide on the form of the model: linear or non-linear, time-varying or time-invariant, deterministic or stochastic. Depending on these decisions, as well as other signal processing considerations, several possible signal models can be constructed.

To fix ideas, consider modeling a pure sine wave. If we have reason to believe that the observed symbols are from a pure sine wave, then all that would need to be measured is the amplitude, frequency and perhaps phase of the sine wave, and an exact model, which explains the observed symbols, would result.

Consider next a somewhat more complicated signal, namely a sine wave embedded in noise. The noise components of the signal make the modeling problem more complicated because, in order to properly estimate the sine wave parameters (amplitude, frequency, phase), one has to take into account the characteristics of the noise component.
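The two cases can be compared with a small sketch that estimates amplitude, frequency and phase by correlating the samples with complex exponentials at candidate frequencies. This estimator, the 1000 Hz sample rate, the 50 Hz test tone and the noise level are assumptions made for illustration, and the phase is reported in the cosine convention.

```python
import cmath
import math
import random


def estimate_sine(samples, sample_rate, candidate_freqs):
    """Pick the candidate frequency with the strongest correlation, then read
    amplitude and phase (cosine convention) from that correlation."""
    n = len(samples)
    best_freq, best_corr = None, 0j
    for freq in candidate_freqs:
        step = -2j * math.pi * freq / sample_rate
        corr = sum(x * cmath.exp(step * k) for k, x in enumerate(samples))
        if best_freq is None or abs(corr) > abs(best_corr):
            best_freq, best_corr = freq, corr
    amplitude = 2.0 * abs(best_corr) / n
    phase = cmath.phase(best_corr)
    return amplitude, best_freq, phase


random.seed(0)
SAMPLE_RATE = 1000  # samples per second (assumed)
clean = [2.0 * math.cos(2 * math.pi * 50 * k / SAMPLE_RATE + 0.5) for k in range(1000)]
noisy = [x + random.gauss(0.0, 0.5) for x in clean]

amp_clean, freq_clean, phase_clean = estimate_sine(clean, SAMPLE_RATE, range(1, 101))
amp_noisy, freq_noisy, phase_noisy = estimate_sine(noisy, SAMPLE_RATE, range(1, 101))
```

For the clean signal the three parameters are recovered essentially exactly. With added noise the estimates are only approximate, and their accuracy depends on the noise level and the number of samples, which is the complication the text points to.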

Linear system models

The concepts behind the above examples have been well studied in classical communication theory. The variety and types of real world processes, however, do not stop here. Linear system models, which model the observed symbols as the output of a linear system excited by an appropriate source, form another important class of processes for signal modeling and have proven useful for a wide variety of applications.

For example, "short time" segments of speech signals can be effectively modeled as the output of an all-pole filter excited by appropriate sources with essentially a flat spectral envelope. The signal modeling technique, in this case, thus involves determination of the linear filter coefficients and, in some cases, the excitation parameters. Obviously, spectral analyses of other kinds also fall within this category.
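One standard way to determine all-pole filter coefficients is linear prediction with the autocorrelation method and the Levinson-Durbin recursion; the text does not name a specific algorithm, so this choice is an assumption. The sketch below checks the estimator on a synthetic second-order signal driven by white noise, with made-up coefficients 1.3 and -0.6.

```python
import random


def autocorrelation(signal, max_lag):
    """Autocorrelation values r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n - k],
    plus the remaining prediction error energy."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error


# Synthetic all-pole signal: white-noise excitation (flat spectrum) through a known filter.
random.seed(1)
signal = [0.0, 0.0]
for _ in range(5000):
    signal.append(1.3 * signal[-1] - 0.6 * signal[-2] + random.gauss(0.0, 1.0))

coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
```

The estimated coefficients come out close to the values used to generate the signal, which illustrates how the filter coefficients are determined from the observations alone.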

One can further incorporate temporal variations of the signal into the linear system model by allowing the filter coefficients, or the excitation parameters, to change with time. In fact, many real world processes cannot be meaningfully modeled without considering such temporal variation. Speech signals are one example of such processes. There are several ways to address the problem of modeling temporal variation of a signal.

As mentioned above, within a "short time" period, some physical signals, such as speech, can be effectively modeled by a simple linear time-invariant system with the appropriate excitation. The easiest way then to address the time-varying nature of the process is to view it as a direct concatenation of these smaller "short time" segments, each such segment being individually represented by a linear system model.

In other words, the overall model is a synchronous sequence of symbols where each of the symbols is a linear system model representing a short segment of the process. In a sense this type of approach models the observed signal using representative tokens of the signal itself (or some suitably averaged set of such signals if we have multiple observations).
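The segmentation step behind this view can be sketched as follows. The function name, the frame length and the hop size are illustrative assumptions; each returned frame would then be modeled on its own, for example by a linear system model.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut a signal into fixed-length short-time segments.

    With hop == frame_len the frames are a direct concatenation of the signal;
    with hop < frame_len consecutive frames overlap.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


signal = list(range(8))
adjacent = split_into_frames(signal, frame_len=4, hop=4)           # non-overlapping segments
overlapping = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

Trailing samples that do not fill a whole frame are dropped in this sketch; a real system would have to decide how to handle them, just as it has to choose the frame duration.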

Time-varying processes

Modeling time-varying processes with the above approach assumes that every such short-time segment of observation is a unit with a pre-chosen duration. In general, however, there doesn't exist a precise procedure to decide what the unit duration should be so that both the time-invariant assumption holds, and the short-time linear system models (as well as concatenation of the models) are meaningful. In most physical systems, the duration of a short-time segment is determined empirically.

In many pro
