Dragon Star Program (龍星計(jì)劃) Course: Information Retrieval

Course Overview & Background

ChengXiang Zhai (翟成祥)
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign

© ChengXiang Zhai, Dragon Star Lecture at Beijing University, June 21-30, 2023

Outline
- Course overview
- Essential background
  - Probability & statistics
  - Basic concepts in information theory
  - Natural language processing

Course Overview

Course Objectives
- Introduce the field of information retrieval (IR)
  - Foundation: basic concepts, principles, methods, etc.
  - Trends: frontier topics
- Prepare students to do research in IR and/or related fields
  - Research methodology (general and IR-specific)
  - Research proposal writing
  - Research project (to be finished after the lecture period)

Prerequisites
- Proficiency in programming (C++ is needed for the assignments)
- Knowledge of basic probability & statistics (necessary for understanding the algorithms deeply)
- Big plus: knowledge of related areas
  - Machine learning
  - Natural language processing
  - Data mining
  - …

Course Management
- Teaching staff
  - Instructor: ChengXiang Zhai (UIUC)
  - Teaching assistants: Hongfei Yan (Peking Univ), Bo Peng (Peking Univ)
- Course website: /~course/cs410/
- Course group discussion forum
- Questions: first post them on the group discussion forum; if they remain unanswered, bring them to the office hours (first office hour: June 23, 2:30-4:30pm)

Format & Requirements
- Lecture-based
  - Morning lectures: Foundation & Trends
  - Afternoon lectures: IR research methodology
  - Readings are usually available online
- 2 assignments (based on morning lectures)
  - Coding (C++), experimenting with data, analyzing results, open explorations (~5 hours each)
- Final exam (based on morning lectures): 1:30-4:30pm, June 30; practice questions will be available

Format & Requirements (cont.)
- Course project (Mini-TREC)
  - Work in teams
  - Phase I: create test collections (~3 hours, done within the lecture period)
  - Phase II: develop algorithms and submit results (done in the summer)
- Research project proposal (based on afternoon lectures)
  - Work in teams of 2
  - Outline done within the lecture period; full proposal (5 pages) due later

Coverage of Topics: IR vs. TIM
- Text Information Management (TIM) covers Information Retrieval (IR) plus multimedia, etc.
- IR and TIM will be used interchangeably in this course

What is Text Info. Management?
- TIM is concerned with technologies for managing and exploiting text information effectively and efficiently
- Importance of managing text information
  - The most natural way of encoding knowledge (think about scientific literature)
  - The most common type of information (how much textual information do you produce and consume every day?)
  - The most basic form of information (it can be used to describe other media of information)
  - The most useful form of information!

Text Management Applications
- Access: select information
- Mining: create knowledge
- Organization: add structure/annotations

Examples of Text Management Applications
- Search: web search engines (Google, Yahoo, …), library systems, …
- Recommendation: news filter, literature/movie recommender
- Categorization: automatically sorting emails, …
- Mining/Extraction: discovering major complaints from email in customer service, business intelligence, bioinformatics, …
- Many others …

Elements of Text Info Management Technologies
- (Diagram) Natural language content analysis underlies search, filtering, categorization, summarization, clustering, extraction, mining, and visualization; these group into retrieval applications (information access), mining applications (knowledge acquisition), and information organization
- The retrieval side is the focus of the course

Text Management and Other Areas
- (Diagram) Text management algorithms sit between the user and the text, drawing on storage, compression, probabilistic inference, machine learning, natural language processing, and human-computer interaction; TM applications include software engineering and the Web; the area spans computer science and information science

Related Areas
- (Diagram) Information retrieval connects to databases, library & information science, machine learning, pattern recognition, data mining, and natural language processing; it draws on statistics, optimization, software engineering, and computer systems, and feeds applications such as the Web and bioinformatics; together these contribute models, algorithms, applications, and systems

Publications/Societies (incomplete), spanning info retrieval, databases, learning/mining, NLP, info. science, statistics, applications, and software/systems:
- Info retrieval: ACM SIGIR, ACM CIKM, TREC
- Databases: ACM SIGMOD, VLDB, PODS, ICDE
- Learning/mining: ICML, NIPS, UAI, AAAI, ACM SIGKDD
- NLP: ACL, COLING, EMNLP, ANLP, HLT
- Info. science: ASIS, JCDL
- Applications: WWW, RECOMB, PSB, ISMB
- Software/systems: SOSP, OSDI

Schedule: available at /~course/cs410/

Schedule (morning lecture 8:30-11:30: Foundation & Trends; afternoon lecture 1:30-2:30: Research Methodology)

- 6/21 Sat
  - Morning: Course overview and background (probability, statistics, information theory, NLP). Slides: ppt. Lecture notes: Prob & Stat, Info Theory, NLP. Readings: Bush45, Rosenfeld's note on estimation, Rosenfeld's note on information theory.
  - Afternoon: Introduction to IR research. Slides: ppt.
- 6/22 Sun
  - Morning: Information Retrieval Overview (part 1) (basic concepts, history, evaluation). Lecture notes: text retrieval. Readings: Singhal's review, Book Ch. 8, TREC measures.
  - Afternoon: Prepare yourself for IR research.
  - Notes: Mini-TREC task specification ready.
- 6/23 Mon
  - Morning: Information Retrieval Overview (part 2) (basic retrieval models, system implementation, applications).
  - Afternoon: Find a good IR research topic.
  - Notes: Assign #1 out.
- 6/24 Tue
  - Morning: Statistical Language Models for IR (probabilistic retrieval models, KL-divergence model, special retrieval tasks).
  - Afternoon: Formulate IR research hypotheses.
  - Notes: Assign #2 out.
- 6/25 Wed
  - Morning: Modern Retrieval Frameworks (axiomatic, decision-theoretic).
  - Notes: Final exam practice questions available.
- 6/26 Thu
  - Morning: Personalized Search & User Modeling (implicit feedback, explicit feedback, active feedback).
  - Afternoon: Test/refine IR research hypotheses.
  - Notes: Proposal team due.
- 6/27 Fri
  - Morning: Natural Language Processing for IR (phrase indexing, dependency analysis, sense disambiguation, sentiment retrieval).
  - Afternoon: Write and publish an IR paper.
- 6/28 Sat
  - No class.
  - Notes: Mini-TREC Phase I task due.
- 6/29 Sun
  - Morning: Topic Models for Text Mining (PLSA, LDA, extensions and applications).
  - Notes: Proposal outline due.
- 6/30 Mon
  - Morning: Future of IR, course summary.
  - Afternoon: Final exam (1:30-4:30).
  - Notes: Assigns #1, #2 due.
- 7/5 Sat: Research proposal due.
- 7/?: Mini-TREC datasets ready.
- 8/?: Mini-TREC Phase II task due.

Essential Background 1: Probability & Statistics

Prob/Statistics & Text Management
- Probability & statistics provide a principled way to quantify the uncertainties associated with natural language
- They allow us to answer questions like:
  - Given that we observe "baseball" three times and "game" once in a news article, how likely is it about "sports"? (text categorization, information retrieval)
  - Given that a user is interested in sports news, how likely would the user use "baseball" in a query? (information retrieval)

Basic Concepts in Probability
- Random experiment: an experiment with an uncertain outcome (e.g., tossing a coin, picking a word from text)
- Sample space: all possible outcomes, e.g., tossing 2 fair coins, S = {HH, HT, TH, TT}
- Event: E ⊆ S; E happens iff the outcome is in E, e.g., E = {HH} (all heads), E = {HH, TT} (same face); impossible event ({}), certain event (S)
- Probability of an event: 0 ≤ P(E) ≤ 1, with P(S) = 1 (the outcome is always in S); P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (e.g., A = same face, B = different face)
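A minimal numeric check of these definitions, using the two-coin sample space from the slide (the helper function names are not from the slides):

```python
from itertools import product

# Sample space for tossing 2 fair coins; every outcome is equally likely.
S = [a + b for a, b in product("HT", repeat=2)]   # ['HH', 'HT', 'TH', 'TT']

def P(event):
    """Probability of an event, i.e. a subset of the sample space S."""
    return len(set(event)) / len(S)

all_heads = {"HH"}
same_face = {"HH", "TT"}
diff_face = {"HT", "TH"}

print(P(all_heads))                  # 0.25
print(P(S))                          # 1.0  (certain event)
print(P(same_face | diff_face))      # 1.0
print(P(same_face) + P(diff_face))   # 1.0  -> additivity for disjoint events
```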

Basic Concepts of Prob. (cont.)
- Conditional probability: P(B|A) = P(A ∩ B) / P(A); thus P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
- So P(A|B) = P(B|A)P(A) / P(B) (Bayes' Rule)
- For independent events, P(A ∩ B) = P(A)P(B), so P(A|B) = P(A)
- Total probability: if A1, …, An form a partition of S, then P(B) = P(B ∩ S) = P(B ∩ A1) + … + P(B ∩ An) (why?)
- So P(Ai|B) = P(B|Ai)P(Ai) / P(B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + … + P(B|An)P(An)]
- This allows us to compute P(Ai|B) based on P(B|Ai)
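As a concrete illustration of Bayes' rule with the total-probability expansion in the denominator, here is a minimal sketch; the topic priors and likelihoods are invented numbers, not values from the course:

```python
# Hypothetical partition of articles by topic, with the probability that
# each topic mentions the word "baseball" at least once.
prior      = {"sports": 0.20, "politics": 0.50, "science": 0.30}   # P(Ai)
likelihood = {"sports": 0.70, "politics": 0.02, "science": 0.01}   # P(B | Ai)

# Total probability: P(B) = sum_i P(B | Ai) * P(Ai)
p_b = sum(likelihood[t] * prior[t] for t in prior)

# Bayes' rule: P(Ai | B) = P(B | Ai) * P(Ai) / P(B)
posterior = {t: likelihood[t] * prior[t] / p_b for t in prior}
print(posterior)   # "sports" dominates once "baseball" is observed
```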

Interpretation of Bayes' Rule
- Hypothesis space: H = {H1, …, Hn}; evidence: E
- P(Hi|E) = P(E|Hi)P(Hi) / P(E): the posterior probability of Hi, where P(Hi) is the prior probability of Hi and P(E|Hi) is the likelihood of the data/evidence if Hi is true
- If we want to pick the most likely hypothesis H*, we can drop P(E)

Random Variable
- X: S → R (a "measure" of the outcome), e.g., number of heads, all same face?, …
- Events can be defined according to X: E(X = a) = {si | X(si) = a}; E(X ≥ a) = {si | X(si) ≥ a}
- So probabilities can be defined on X: P(X = a) = P(E(X = a)); P(X ≥ a) = P(E(X ≥ a))
- Discrete vs. continuous random variables (think of "partitioning the sample space")
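A quick illustration of defining events through a random variable, reusing the two-coin sample space above (again with made-up helper names):

```python
from itertools import product

S = [a + b for a, b in product("HT", repeat=2)]   # {HH, HT, TH, TT}, equally likely

def num_heads(outcome):
    """The random variable X: S -> R, here the number of heads."""
    return outcome.count("H")

def P(event):
    return len(event) / len(S)

# Events defined via X, e.g. E(X = a) = {s in S | X(s) = a}
E_x_eq_1 = {s for s in S if num_heads(s) == 1}
E_x_ge_1 = {s for s in S if num_heads(s) >= 1}

print(P(E_x_eq_1))   # 0.5
print(P(E_x_ge_1))   # 0.75
```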

An Example: Doc Classification
- Sample space S = {x1, …, xn}; each document is labeled with a topic and with binary occurrences of the words "the", "computer", "game", "baseball":
  - X1: [sport, 1 0 1 1]
  - X2: [sport, 1 1 1 1]
  - X3: [computer, 1 1 0 0]
  - X4: [computer, 1 1 1 0]
  - X5: [other, 0 0 1 1]
  - …
- For 3 topics and four words, n = ?
- Events: Esport = {xi | topic(xi) = "sport"}; Ebaseball = {xi | baseball(xi) = 1}; Ebaseball,computer = {xi | baseball(xi) = 1 & computer(xi) = 0}
- Conditional probabilities: P(Esport | Ebaseball), P(Ebaseball | Esport), P(Esport | Ebaseball,computer), …
- An inference problem: suppose we observe that "baseball" is mentioned; how likely is the topic to be "sport"? But P(B=1 | T="sport") = ?, P(T="sport") = ?
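To make the inference problem concrete, here is a rough sketch that estimates these probabilities by simple relative-frequency counting over the five toy documents above; it anticipates the parameter-estimation discussion that follows:

```python
# The five toy documents: (topic, {word: 0/1 occurrence})
docs = [
    ("sport",    {"the": 1, "computer": 0, "game": 1, "baseball": 1}),
    ("sport",    {"the": 1, "computer": 1, "game": 1, "baseball": 1}),
    ("computer", {"the": 1, "computer": 1, "game": 0, "baseball": 0}),
    ("computer", {"the": 1, "computer": 1, "game": 1, "baseball": 0}),
    ("other",    {"the": 0, "computer": 0, "game": 1, "baseball": 1}),
]

E_sport    = [d for d in docs if d[0] == "sport"]
E_baseball = [d for d in docs if d[1]["baseball"] == 1]

# Relative-frequency estimates of the probabilities named on the slide
p_sport         = len(E_sport) / len(docs)                                    # P(T="sport")
p_b_given_sport = sum(d[1]["baseball"] for d in E_sport) / len(E_sport)       # P(B=1|T="sport")
p_sport_given_b = sum(d[0] == "sport" for d in E_baseball) / len(E_baseball)  # P(T="sport"|B=1)

print(p_sport, p_b_given_sport, p_sport_given_b)   # 0.4 1.0 0.666...
```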

- P(T="sport" | B=1) ∝ P(B=1 | T="sport") P(T="sport")
- Thinking in terms of random variables: Topic T ∈ {"sport", "computer", "other"}, "baseball" B ∈ {0, 1}, …; we then want P(T="sport" | B=1), P(B=1 | T="sport"), …

Getting to Statistics …
- P(B=1 | T="sport") = ? (parameter estimation)
- If we see the results of a huge number of random experiments, the relative frequency gives a good estimate; but what if we only see a small sample (e.g., 2)? Is this estimate still reliable?
- In general, statistics has to do with drawing conclusions about the whole population based on observations of a sample (data)

Parameter Estimation
- General setting: given a (hypothesized & probabilistic) model that governs the random experiment; the model gives a probability of any data, p(D|θ), that depends on the parameter θ
- Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?
- Intuitively, take your best guess of θ; "best" means "best explaining/fitting the data"
- Generally an optimization problem

Maximum Likelihood vs. Bayesian
- Maximum likelihood estimation: "best" means "the data likelihood reaches its maximum", i.e., θ̂ = argmax_θ p(X|θ); problem: small samples
- Bayesian estimation: "best" means being consistent with our "prior" knowledge and explaining the data well; problem: how to define the prior?

Illustration of Bayesian Estimation
- Prior: p(θ); likelihood: p(X|θ) with X = (x1, …, xN); posterior: p(θ|X) ∝ p(X|θ) p(θ)
- (Figure) The prior mode, the ML estimate, and the posterior mode are marked on the prior, likelihood, and posterior curves

Maximum Likelihood Estimate
- Data: a document d with counts c(w1), …, c(wN) and length |d|
- Model: multinomial distribution M with parameters {p(wi)}
- Likelihood: p(d|M); maximum likelihood estimator: M = argmax_M p(d|M)
- We tune p(wi) to maximize l(d|M), using the Lagrange multiplier approach and setting the partial derivatives to zero; the ML estimate is p(wi) = c(wi) / |d| (a short worked sketch follows at the end of this subsection)

What You Should Know
- Probability concepts: sample space, event, random variable, conditional probability, multinomial distribution, etc.
- Bayes formula and its interpretation
- Statistics: know how to compute the maximum likelihood estimate
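To close out this probability and statistics background, here is a small worked sketch of the multinomial maximum likelihood estimate discussed above; the toy document is invented, and the closed form p(wi) = c(wi)/|d| is the standard result of the Lagrange-multiplier derivation:

```python
import math
from collections import Counter

doc = "the game of baseball the game".split()
counts = Counter(doc)        # c(w_i)
d_len = len(doc)             # |d|

# Multinomial MLE: p(w_i) = c(w_i) / |d|  (from maximizing log p(d|M)
# with a Lagrange multiplier enforcing sum_i p(w_i) = 1)
p_ml = {w: c / d_len for w, c in counts.items()}

def log_likelihood(p):
    return sum(c * math.log(p[w]) for w, c in counts.items())

# The MLE should score at least as high as any other distribution, e.g. uniform:
p_uniform = {w: 1 / len(counts) for w in counts}
print(log_likelihood(p_ml), ">=", log_likelihood(p_uniform))
```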

Essential Background 2: Basic Concepts in Information Theory

Information Theory
- Developed by Shannon in the 40s
- Maximizing the amount of information that can be transmitted over an imperfect communication channel
- Data compression (entropy); transmission rate (channel capacity)

Basic Concepts in Information Theory
- Entropy: measuring the uncertainty of a random variable
- Kullback-Leibler divergence: comparing two distributions
- Mutual information: measuring the correlation of two random variables

Entropy: Motivation
- Feature selection: if we use only a few words to classify docs, what kind of words should we use? P(Topic | "computer" = 1) vs. P(Topic | "the" = 1): which is more random?
- Text compression: some documents (less random) can be compressed more than others (more random); can we quantify the "compressibility"?
- In general, given a random variable X following distribution p(X): how do we measure the "randomness" of X, and how do we design optimal coding for X?

Entropy: Definition
- Entropy H(X) measures the uncertainty/randomness of random variable X: H(X) = -Σx p(x) log2 p(x)
- Example (figure): H(X) for a coin plotted against P(Head), peaking at 1.0 for a fair coin

Entropy: Properties
- Minimum value of H(X): 0; what kind of X has the minimum entropy?
- Maximum value of H(X): log M, where M is the number of possible values of X; what kind of X has the maximum entropy?
- Related to coding

Interpretations of H(X)
- Measures the "amount of information" in X: think of each value of X as a "message", and of X as a random experiment (20 questions)
- Minimum average number of bits to compress values of X: the more random X is, the harder it is to compress
  - A fair coin has the maximum information, and is hardest to compress
  - A biased coin has some information, and can be compressed to < 1 bit on average
  - A completely biased coin has no information, and needs only 0 bits

Conditional Entropy
- The conditional entropy of a random variable Y given another X expresses how much extra information one still needs to supply on average to communicate Y, given that the other party knows X
- H(Topic | "computer") vs. H(Topic | "the")?

Cross Entropy H(p, q)
- What if we encode X with a code optimized for a wrong distribution q? Expected # of bits = ?
- Intuitively H(p, q) ≥ H(p); mathematically, H(p, q) = -Σx p(x) log q(x)
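A small numeric sketch of entropy and cross entropy; the coin distributions are arbitrary illustrations, and base-2 logarithms are assumed so that the unit is bits:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x): bits used when coding p with a code built for q."""
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

fair   = {"H": 0.5, "T": 0.5}
biased = {"H": 0.9, "T": 0.1}

print(entropy(fair))                  # 1.0 bit   (hardest to compress)
print(entropy(biased))                # ~0.47 bits (< 1 bit on average)
print(entropy({"H": 1.0, "T": 0.0}))  # 0.0 bits  (completely biased coin)
print(cross_entropy(biased, fair))    # 1.0 >= H(biased): the wrong code wastes bits
```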

Kullback-Leibler Divergence D(p||q) (relative entropy)
- What if we encode X with a code optimized for a wrong distribution q: how many bits would we waste? D(p||q) = H(p, q) - H(p)
- Properties: D(p||q) ≥ 0; D(p||q) ≠ D(q||p) in general; D(p||q) = 0 iff p = q
- KL-divergence is often used to measure the distance between two distributions
- Interpretation: fixing p, D(p||q) and H(p, q) vary in the same way; if p is an empirical distribution, minimizing D(p||q) or H(p, q) is equivalent to maximizing likelihood

Cross Entropy, KL-Div, and Likelihood
- The likelihood and log-likelihood of the data under a model give a criterion for selecting a good model; perplexity(p) is a closely related measure

Mutual Information I(X;Y)
- Comparing two distributions: p(x, y) vs. p(x)p(y)
- Properties: I(X;Y) ≥ 0; I(X;Y) = I(Y;X); I(X;Y) = 0 iff X and Y are independent
- Interpretations: measures how much the uncertainty of X is reduced given information about Y; measures the correlation between X and Y; related to the "channel capacity" in information theory
- Examples: I(Topic; "computer") vs. I(Topic; "the")? I("computer"; "program") vs. I("computer"; "baseball")?

What You Should Know
- Information theory concepts: entropy, cross entropy, relative entropy, conditional entropy, KL-divergence, mutual information
- Know their definitions and how to compute them
- Know how to interpret them
- Know their relationships
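Continuing the sketch, KL-divergence and mutual information can be checked on small hand-made distributions; the joint table below is invented purely for illustration:

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) log2 (p(x)/q(x))."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

# D(p||q) >= 0, and it is not symmetric:
p = {"a": 0.7, "b": 0.3}
q = {"a": 0.5, "b": 0.5}
print(kl(p, q), kl(q, p), kl(p, p))   # 0.119..., 0.125..., 0.0

# Mutual information I(X;Y) = D( p(x,y) || p(x)p(y) ), here for a toy joint
# distribution over Topic (sport/other) and the word "baseball" (0/1).
joint = {("sport", 1): 0.3, ("sport", 0): 0.1,
         ("other", 1): 0.1, ("other", 0): 0.5}
px = {t: joint[(t, 0)] + joint[(t, 1)] for t in ("sport", "other")}
py = {b: joint[("sport", b)] + joint[("other", b)] for b in (0, 1)}

I_xy = sum(pxy * math.log2(pxy / (px[x] * py[y]))
           for (x, y), pxy in joint.items() if pxy > 0)
print(I_xy)   # > 0: Topic and "baseball" are correlated
```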

Essential Background 3: Natural Language Processing

What is NLP?
- (The slide shows a sentence in Arabic and the Spanish sentence "La listas actualizadas figuran como Aneio I.") How can a computer make sense out of such a string?
  - What are the basic units of meaning (words)?
  - What is the meaning of each word?
  - How are words related with each other?
  - What is the "combined meaning" of words?
  - What is the "meta-meaning"? (speech act)
  - Handling a large chunk of text
  - Making sense of everything
- These questions correspond to the levels of morphology, syntax, semantics, pragmatics, discourse, and inference

An Example of NLP
- "A dog is chasing a boy on the playground"
- Lexical analysis (part-of-speech tagging): Det Noun Aux Verb Det Noun Prep Det Noun
- Syntactic analysis (parsing): noun phrases, a complex verb, a prepositional phrase, verb phrases, and the sentence node
- Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1, b1, p1).
- Inference: Scared(x) if Chasing(_, x, _). => Scared(b1)
- Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back

If we can do this for all the sentences, then … BAD NEWS: unfortunately, we can't. General NLP = "AI-complete"

NLP is Difficult!
- Natural language is designed to make human communication efficient; as a result,
  - we omit a lot of "common sense" knowledge, which we assume the hearer/reader possesses, and
  - we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve
- This makes EVERY step in NLP hard: ambiguity is a "killer", and common sense reasoning is pre-required

Examples of Challenges
- Word-level ambiguity: e.g., "design" can be a noun or a verb (ambiguous POS); "root" has multiple meanings (ambiguous sense)
- Syntactic ambiguity: e.g., "natural language processing" (modification); "A man saw a boy with a telescope." (PP attachment)
- Anaphora resolution: "John persuaded Bill to buy a TV for himself." (himself = John or Bill?)
- Presupposition: "He has quit smoking." implies that he smoked before

Despite all the challenges, research in NLP has also made a lot of progress …

High-level History of NLP
- Early enthusiasm (1950's): machine translation
  - Too ambitious; the Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (dictionary + encyclopedia)
- Less ambitious applications (late 1960's & early 1970's): limited success, failed to scale up
  - Speech recognition; dialogue (Eliza); inference and domain knowledge (SHRDLU = "block world")
- Real-world evaluation (late 1970's - now)
  - Story understanding (late 1970's & early 1980's): knowledge representation, deep understanding in a limited domain
  - Large-scale evaluation of speech recognition, text retrieval, information extraction (1980 - now): statistical approaches enjoy more success (first in speech recognition & retrieval, later others); statistical language models, robust component techniques, shallow understanding, applications
- Current trend: the boundary between statistical and symbolic approaches is disappearing; we need to use all the available knowledge; application-driven NLP research (bioinformatics, Web, question answering, …)

The State of the Art
- "A dog is chasing a boy on the playground" (Det Noun Aux Verb Det Noun Prep Det Noun)
- POS tagging: 97%
- Parsing: partial, > 90% (?)
- Semantics: only some aspects, e.g., entity/relation extraction, word sense disambiguation, anaphora resolution
- Speech act analysis: ???
- Inference: ???

Technique Showcase: POS Tagging
- Training data (annotated text), e.g., "This sentence serves as an example of annotated text …" tagged as Det N V1 P Det N P V2 N
- Given a new sentence, "This is a new sentence", consider all possible tag assignments (Det Det Det Det Det, …, Det Aux Det Adj N, …, V2 V2 V2 V2 V2) and pick the one with the highest probability
- Method 1: independent assignment (most common tag per word)
- Method 2: partial dependency between tags
- (A minimal sketch of Method 1 appears at the end of this section.)

Technique Showcase: Parsing
- A probabilistic grammar and lexicon (S → NP VP; NP → Det BNP; NP → BNP; NP → NP PP; BNP → N; VP → V; VP → Aux V NP; VP → VP PP; PP → P NP; V → chasing; Aux → is; N → dog; N → boy; N → playground; Det → the; Det → a; P → on), each rule carrying a probability (1.0, …, 0.01, 0.003, …)
- Generate candidate parse trees for "A dog is chasing a boy on the playground"; the probability of a tree is the product of the probabilities of the rules used (e.g., 0.000015)
- Choose the tree with the highest probability; parsing can also be treated as a classification/decision problem

Semantic Analysis Techniques
- Only successful for VERY limited domains or for SOME aspects of semantics, e.g.:
  - Entity extraction (e.g., recognizing a person's name): use rules and/or machine learning
  - Word sense disambiguation: addressed as a classification problem with supervised learning
  - Anaphora resolution
- In general, exploiting machine learning and statistical language models

What We Can't Do
- 100% POS tagging: "He turned off the highway." vs. "He turned off the fan."
- General complete parsing: "A man saw a boy with a telescope."
- Deep semantic analysis: will we ever be able to precisely define the meaning of "own" in "John owns a restaurant."?
- Robust & general NLP tends to be "shallow", while "deep" understanding doesn't scale up

Major NLP Applications
- Speech recognition: e.g., auto telephone call routing
- Text management (our focus): text retrieval/filtering, text classification, text summarization, text mining, query answering
- Language tutoring: spelling/grammar correction
- Machine translation: cross-language retrieval
- Restricted natural language: natural language user interface
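A minimal sketch of Method 1 from the POS tagging showcase (independent assignment: tag each word with its most frequent tag in the annotated training data); the training set is just the slide's example sentence, and the fallback tag for unseen words is an assumption of this sketch:

```python
from collections import Counter, defaultdict

# Toy annotated training data (word, tag) from the slide's example sentence.
training = [("this", "Det"), ("sentence", "N"), ("serves", "V1"), ("as", "P"),
            ("an", "Det"), ("example", "N"), ("of", "P"),
            ("annotated", "V2"), ("text", "N")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1
default_tag = Counter(tag for _, tag in training).most_common(1)[0][0]

def tag_most_common(sentence):
    """Method 1: independently assign each word its most common training tag."""
    return [tag_counts[w].most_common(1)[0][0] if w in tag_counts else default_tag
            for w in sentence.lower().split()]

print(tag_most_common("This is a new sentence"))
# ['Det', 'N', 'N', 'N', 'N'] -- unseen words get the fallback tag,
# which is exactly why Method 2 adds dependencies between adjacent tags.
```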

NLP & Text Management
- Better NLP => better text management
- Bad NLP => bad text management?
- Robust, shallow NLP tends to be more useful than deep but fragile NLP
- Errors in NLP can hurt text management performance …

How Much NLP is Really Needed?
- (Figure) Tasks vary in how much they depend on NLP: classification/retrieval, summarization, …
