Dragon Star Program Course: Information Retrieval
Course Overview & Background
ChengXiang Zhai (翟成祥)
Department of Computer Science
Graduate School of Library & Information Science
Institute for Genomic Biology, Statistics
University of Illinois, Urbana-Champaign
© 2023 ChengXiang Zhai, Dragon Star Lecture at Beijing University, June 21-30, 2023

Outline
- Course overview
- Essential background
  - Probability & statistics
  - Basic concepts in information theory
  - Natural language processing

Course Overview

Course Objectives
- Introduce the field of information retrieval (IR)
  - Foundation: basic concepts, principles, methods, etc.
  - Trends: frontier topics
- Prepare students to do research in IR and/or related fields
  - Research methodology (general and IR-specific)
  - Research proposal writing
  - Research project (to be finished after the lecture period)

Prerequisites
- Proficiency in programming (C++ is needed for assignments)
- Knowledge of basic probability & statistics (necessary for understanding the algorithms deeply)
- Big plus: knowledge of related areas
  - Machine learning
  - Natural language processing
  - Data mining
  - …

Course Management
- Teaching staff
  - Instructor: ChengXiang Zhai (UIUC)
  - Teaching assistants: Hongfei Yan (Peking Univ), Bo Peng (Peking Univ)
- Course website: /~course/cs410/
- Course group discussion:
- Questions: first post your questions on the group discussion forum; if they remain unanswered, bring them to the office hours (first office hour: June 23, 2:30-4:30pm)
Format & Requirements
- Lecture-based:
  - Morning lectures: Foundation & Trends
  - Afternoon lectures: IR research methodology
  - Readings are usually available online
- 2 assignments (based on morning lectures)
  - Coding (C++), experimenting with data, analyzing results, open explorations (~5 hours each)
- Final exam (based on morning lectures): 1:30-4:30pm, June 30
  - Practice questions will be available

Format & Requirements (cont.)
- Course project (Mini-TREC)
  - Work in teams
  - Phase I: create test collections (~3 hours, done within lecture period)
  - Phase II: develop algorithms and submit results (done in the summer)
- Research project proposal (based on afternoon lectures)
  - Work in teams
  - 2-page outline done within lecture period
  - Full proposal (5 pages) due later

Coverage of Topics: IR vs. TIM
[Diagram labels: Text Information Management (TIM); Information Retrieval (IR); Multimedia, etc.]
IR and TIM will be used interchangeably.

What is Text Info. Management?
- TIM is concerned with technologies for managing and exploiting text information effectively and efficiently
- Importance of managing text information
  - The most natural way of encoding knowledge: think about scientific literature
  - The most common type of information: how much textual information do you produce and consume every day?
  - The most basic form of information: it can be used to describe other media of information
  - The most useful form of information!

Text Management Applications
- Access: select information
- Mining: create knowledge
- Organization: add structure/annotations

Examples of Text Management Applications
- Search
  - Web search engines (Google, Yahoo, …)
  - Library systems
  - …
- Recommendation
  - News filter
  - Literature/movie recommender
- Categorization
  - Automatically sorting emails
  - …
- Mining/Extraction
  - Discovering major complaints from email in customer service
  - Business intelligence
  - Bioinformatics
  - …
- Many others…

Elements of Text Info Management Technologies
[Diagram labels: Search; Text Filtering; Categorization; Summarization; Clustering; Natural Language Content Analysis; Extraction; Mining; Visualization; Retrieval Applications; Mining Applications; Information Access; Knowledge Acquisition; Information Organization; "Focus of the course"]

Text Management and Other Areas
[Diagram labels: TM Algorithms; User; Text; Storage; Compression; Probabilistic inference; Machine learning; Natural language processing; Human-computer interaction; TM Applications; Software engineering; Web; Computer science; Information Science]

Related Areas
[Diagram labels: Information Retrieval; Databases; Library & Info Science; Machine Learning; Pattern Recognition; Data Mining; Natural Language Processing; Applications (Web, Bioinformatics…); Statistics; Optimization; Software engineering; Computer systems; Models; Algorithms; Applications; Systems]

Publications/Societies (Incomplete)
[Diagram labels. Areas: Info. Science; Info Retrieval; Databases; Learning/Mining; NLP; Applications; Statistics; Software/systems. Venues: ACM SIGIR, ACM CIKM, TREC, ASIS, JCDL, ACM SIGMOD, VLDB, PODS, ICDE, ICML, NIPS, UAI, AAAI, ACM SIGKDD, ACL, COLING, EMNLP, ANLP, HLT, WWW, RECOMB, PSB, ISMB, SOSP, OSDI]

Schedule: available at /~course/cs410/
Morning lectures (8:30-11:30) cover Foundation & Trends; afternoon lectures (1:30-2:30) cover Research Methodology.

6/21 Sat
- Morning: Course overview and background (probability, statistics, information theory, NLP). Slides: ppt. Lecture notes: Prob & Stat, Info Theory, NLP. Readings: Bush 45, Rosenfeld's note on estimation, Rosenfeld's note on information theory
- Afternoon: Introduction to IR research. Slides: ppt
6/22 Sun
- Morning: Information Retrieval Overview (part 1) (basic concepts, history, evaluation). Lecture notes: text retrieval. Readings: Singhal's review (Error), Book-Ch8, TREC measures
- Afternoon: Prepare yourself for IR research
- Notes: Mini-TREC task specification ready
6/23 Mon
- Morning: Information Retrieval Overview (part 2) (basic retrieval models, system implementation, applications)
- Afternoon: Find a good IR research topic
- Notes: Assign #1 out
6/24 Tue
- Morning: Statistical Language Models for IR (probabilistic retrieval models, KL-divergence model, special retrieval tasks)
- Afternoon: Formulate IR research hypotheses
- Notes: Assign #2 out
6/25 Wed
- Morning: Modern Retrieval Frameworks (axiomatic, decision-theoretic)
- Notes: Final exam practice questions available
6/26 Thu
- Morning: Personalized Search & User Modeling (implicit feedback, explicit feedback, active feedback)
- Afternoon: Test/refine IR research hypotheses
- Notes: Proposal team due
6/27 Fri
- Morning: Natural Language Processing for IR (phrase indexing, dependency analysis, sense disambiguation, sentiment retrieval)
- Afternoon: Write and publish an IR paper
6/28 Sat
- No class
- Notes: Mini-TREC Phase I task due
6/29 Sun
- Morning: Topic Models for Text Mining (PLSA, LDA, extensions and applications)
- Notes: Proposal outline due
6/30 Mon
- Morning: Future of IR, course summary
- Afternoon: Final exam (1:30-4:30)
- Notes: Assigns #1, #2 due
7/5 Sat
- Notes: Research proposal due
7/?
- Notes: Mini-TREC datasets ready
8/?
- Notes: Mini-TREC Phase II task due

Essential Background 1: Probability & Statistics
Prob/Statistics & Text Management
Probability & statistics provide a principled way to quantify the uncertainties associated with natural language. They allow us to answer questions like:
- Given that we observe "baseball" three times and "game" once in a news article, how likely is it about "sports"? (text categorization, information retrieval)
- Given that a user is interested in sports news, how likely would the user use "baseball" in a query? (information retrieval)

Basic Concepts in Probability
- Random experiment: an experiment with uncertain outcome (e.g., tossing a coin, picking a word from text)
- Sample space: all possible outcomes, e.g., tossing 2 fair coins, S = {HH, HT, TH, TT}
- Event: E ⊆ S; E happens iff the outcome is in E, e.g., E = {HH} (all heads), E = {HH, TT} (same face)
  - Impossible event ({}), certain event (S)
- Probability of event: 1 ≥ P(E) ≥ 0, s.t.
  - P(S) = 1 (outcome always in S)
  - P(A ∪ B) = P(A) + P(B) if (A ∩ B) = ∅ (e.g., A = same face, B = different face)

Basic Concepts of Prob. (cont.)
- Conditional probability: P(B|A) = P(A ∩ B)/P(A)
  - P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
  - So, P(A|B) = P(B|A)P(A)/P(B) (Bayes' Rule)
  - For independent events, P(A ∩ B) = P(A)P(B), so P(A|B) = P(A)
- Total probability: if A1, …, An form a partition of S, then
  - P(B) = P(B ∩ S) = P(B ∩ A1) + … + P(B ∩ An) (why?)
  - So, P(Ai|B) = P(B|Ai)P(Ai)/P(B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + … + P(B|An)P(An)]
  - This allows us to compute P(Ai|B) based on P(B|Ai)
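The total-probability form of Bayes' rule above can be checked numerically. The following is a minimal Python sketch; the three topics and all probabilities are made-up numbers chosen only to form a valid partition, and `posterior` is a hypothetical helper name. `Fraction` keeps the arithmetic exact so the results can be verified by hand.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule over a partition A_1..A_n of the sample space.

    prior[a]      -- P(A_a); the values must sum to 1
    likelihood[a] -- P(B | A_a)
    Returns (P(B), {a: P(A_a | B)}).
    """
    # Total probability: P(B) = P(B|A_1)P(A_1) + ... + P(B|A_n)P(A_n)
    p_b = sum(likelihood[a] * prior[a] for a in prior)
    # Bayes' rule: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
    return p_b, {a: likelihood[a] * prior[a] / p_b for a in prior}

# Hypothetical numbers: A_i = topic of a document, B = "baseball" appears in it.
prior = {"sport": Fraction(1, 5), "computer": Fraction(2, 5), "other": Fraction(2, 5)}
likelihood = {"sport": Fraction(1, 2), "computer": Fraction(1, 20), "other": Fraction(1, 10)}

p_b, post = posterior(prior, likelihood)
print(p_b)                       # 4/25
print(post)                      # {'sport': Fraction(5, 8), 'computer': Fraction(1, 8), 'other': Fraction(1, 4)}
print(max(post, key=post.get))   # sport
```

Because P(B) is the same for every A_i, ranking the topics by the numerator P(B|A_i)P(A_i) alone picks the same most likely topic.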
Interpretation of Bayes' Rule
Hypothesis space: H = {H1, …, Hn}; Evidence: E
$$P(H_i|E) = \frac{P(E|H_i)\,P(H_i)}{P(E)}$$
Here P(H_i|E) is the posterior probability of H_i, P(H_i) is the prior probability of H_i, and P(E|H_i) is the likelihood of the data/evidence if H_i is true.
If we want to pick the most likely hypothesis H*, we can drop P(E):
$$H^* = \arg\max_{H_i} P(E|H_i)\,P(H_i)$$

Random Variable
X: S → ℝ
("measure" of outcome), e.g., number of heads, all same face?, …
- Events can be defined according to X: E(X=a) = {s_i | X(s_i) = a}, E(X≥a) = {s_i | X(s_i) ≥ a}
- So, probabilities can be defined on X: P(X=a) = P(E(X=a)), P(a≤X) = P(E(a≤X))
- Discrete vs. continuous random variable (think of "partitioning the sample space")

An Example: Doc Classification
Sample space S = {x1, …, xn}:
          Topic      the  computer  game  baseball
  X1:  [  sport       1      0        1      1    ]
  X2:  [  sport       1      1        1      1    ]
  X3:  [  computer    1      1        0      0    ]
  X4:  [  computer    1      1        1      0    ]
  X5:  [  other       0      0        1      1    ]
  ……
For 3 topics and four words, n = ?
Events:
- E_sport = {x_i | topic(x_i) = "sport"}
- E_baseball = {x_i | baseball(x_i) = 1}
- E_baseball,¬computer = {x_i | baseball(x_i) = 1 & computer(x_i) = 0}
Conditional probabilities: P(E_sport | E_baseball), P(E_baseball | E_sport), P(E_sport | E_baseball,¬computer), ...
An inference problem: suppose we observe that "baseball" is mentioned; how likely is it that the topic is "sport"?
$$P(T=\text{sport} \mid B=1) \propto P(B=1 \mid T=\text{sport})\,P(T=\text{sport})$$
But P(B=1 | T="sport") = ?, P(T="sport") = ?
Thinking in terms of random variables: Topic: T ∈ {"sport", "computer", "other"}, "Baseball": B ∈ {0, 1}, … P(T="sport" | B=1), P(B=1 | T="sport"), ...

Getting to Statistics...
P(B=1 | T="sport") = ? (parameter estimation)
- If we see the results of a huge number of random experiments, then
$$P(B=1 \mid T=\text{sport}) \approx \frac{\text{count}(B=1,\ T=\text{sport})}{\text{count}(T=\text{sport})}$$
- But what if we only see a small sample (e.g., 2)? Is this estimate still reliable?
In general, statistics has to do with drawing conclusions on the whole population based on observations of a sample (data).

Parameter Estimation
- General setting: given a (hypothesized & probabilistic) model that governs the random experiment, the model gives a probability of any data p(D|θ) that depends on the parameter θ
- Now, given actual sample data X = {x1, …, xn}, what can we say about the value of θ?
- Intuitively, take your best guess of θ, where "best" means "best explaining/fitting the data"
- Generally an optimization problem

Maximum Likelihood vs. Bayesian
- Maximum likelihood estimation: "best" means "data likelihood reaches maximum". Problem: small sample
- Bayesian estimation: "best" means being consistent with our "prior" knowledge and explaining the data well. Problem: how to define the prior?

Illustration of Bayesian Estimation
Prior: p(θ); Likelihood: p(X|θ), X = (x1, …, xN); Posterior: $p(\theta|X) \propto p(X|\theta)\,p(\theta)$
[Figure: prior, likelihood and posterior curves, marking the prior mode, the ML estimate and the posterior mode]

Maximum Likelihood Estimate
- Data: a document d with counts c(w1), …, c(wN), and length |d|
- Model: multinomial distribution M with parameters {p(wi)}
- Likelihood: $p(d|M) \propto \prod_{i=1}^{N} p(w_i)^{c(w_i)}$
- Maximum likelihood estimator: $\hat{M} = \arg\max_M p(d|M)$
We'll tune p(wi) to maximize the log-likelihood $l(d|M) = \sum_{i=1}^{N} c(w_i)\log p(w_i) + \text{const}$, subject to $\sum_i p(w_i) = 1$. Use the Lagrange multiplier approach:
$$\mathcal{L} = \sum_{i=1}^{N} c(w_i)\log p(w_i) + \lambda\Big(\sum_{i=1}^{N} p(w_i) - 1\Big)$$
Set the partial derivatives to zero: $\frac{\partial \mathcal{L}}{\partial p(w_i)} = \frac{c(w_i)}{p(w_i)} + \lambda = 0$; the constraint then gives $\lambda = -|d|$. ML estimate:
$$\hat{p}(w_i) = \frac{c(w_i)}{|d|}$$

What You Should Know
- Probability concepts: sample space, event, random variable, conditional probability, multinomial distribution, etc.
- Bayes formula and its interpretation
- Statistics: know how to compute the maximum likelihood estimate

Essential Background 2: Basic Concepts in Information Theory
Information Theory
- Developed by Shannon in the 1940s
- Maximizing the amount of information that can be transmitted over an imperfect communication channel
- Data compression (entropy)
- Transmission rate (channel capacity)

Basic Concepts in Information Theory
- Entropy: measuring the uncertainty of a random variable
- Kullback-Leibler divergence: comparing two distributions
- Mutual information: measuring the correlation of two random variables

Entropy: Motivation
- Feature selection: if we use only a few words to classify docs, what kind of words should we use? P(Topic | "computer"=1) vs. P(Topic | "the"=1): which is more random?
- Text compression: some documents (less random) can be compressed more than others (more random). Can we quantify the "compressibility"?
- In general, given a random variable X following distribution p(X):
  - How do we measure the "randomness" of X?
  - How do we design optimal coding for X?

Entropy: Definition
Entropy H(X) measures the uncertainty/randomness of random variable X:
$$H(X) = -\sum_x p(x)\log_2 p(x)$$
Example: [Plot: H(X) of a coin toss as a function of P(Head); the H(X) axis reaches 1.0]

Entropy: Properties
- Minimum value of H(X): 0. What kind of X has the minimum entropy?
- Maximum value of H(X): log M, where M is the number of possible values for X. What kind of X has the maximum entropy?
- Related to coding

Interpretations of H(X)
- Measures the "amount of information" in X
  - Think of each value of X as a "message"
  - Think of X as a random experiment (20 questions)
- Minimum average number of bits to compress values of X
  - The more random X is, the harder it is to compress
  - A fair coin has the maximum information, and is hardest to compress
  - A biased coin has some information, and can be compressed to <1 bit on average
  - A completely biased coin has no information, and needs only 0 bits

Conditional Entropy
The conditional entropy of a random variable Y given another variable X expresses how much extra information one still needs to supply on average to communicate Y, given that the other party knows X:
$$H(Y|X) = \sum_x p(x)\,H(Y|X=x) = -\sum_{x,y} p(x,y)\log_2 p(y|x)$$
H(Topic | "computer") vs. H(Topic | "the")?

Cross Entropy H(p,q)
What if we encode X with a code optimized for a wrong distribution q? Expected # of bits = ?
$$H(p,q) = -\sum_x p(x)\log_2 q(x)$$
Intuitively, H(p,q) ≥ H(p), and mathematically, $H(p,q) - H(p) = \sum_x p(x)\log_2\frac{p(x)}{q(x)} \ge 0$.

Kullback-Leibler Divergence D(p||q)
What if we encode X with a code optimized for a wrong distribution q? How many bits would we waste?
$$D(p\|q) = H(p,q) - H(p) = \sum_x p(x)\log_2\frac{p(x)}{q(x)}$$
Properties:
- D(p||q) ≥ 0
- D(p||q) ≠ D(q||p) in general (it is not symmetric)
- D(p||q) = 0 iff p = q
KL-divergence is often used to measure the distance between two distributions.
Interpretation: fix p; then D(p||q) and H(p,q) vary in the same way. If p is an empirical distribution, minimizing D(p||q) or H(p,q) is equivalent to maximizing likelihood.
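The definitions above can be tried out on small distributions. The Python sketch below computes entropy, cross entropy and KL-divergence in bits, and checks on a toy document (hypothetical data) that when p is the empirical distribution, which is also the ML estimate p(w) = c(w)/|d|, the log-likelihood of the document under any model q equals -|d|·H(p, q). The function names are illustrative, distributions are plain dicts, and q is assumed to be nonzero wherever p is.

```python
import math
from collections import Counter

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x), in bits; terms with p(x) = 0 contribute 0.
    return sum(-px * math.log2(px) for px in p.values() if px > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) log2 q(x): expected bits per value when X ~ p
    # is coded with a code optimized for q (q(x) must be > 0 wherever p(x) > 0).
    return sum(-px * math.log2(q[x]) for x, px in p.items() if px > 0)

def kl_divergence(p, q):
    # D(p||q) = sum_x p(x) log2(p(x) / q(x)) = H(p, q) - H(p): the bits wasted.
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def mle(doc):
    # ML estimate of a multinomial word distribution: p(w) = c(w) / |d|.
    return {w: c / len(doc) for w, c in Counter(doc).items()}

def log_likelihood(doc, q):
    # log2 p(d | q) = sum_w c(w) log2 q(w) (multinomial coefficient omitted).
    return sum(c * math.log2(q[w]) for w, c in Counter(doc).items())

fair, biased = {"H": 0.5, "T": 0.5}, {"H": 0.9, "T": 0.1}
print(entropy(fair))                                                # 1.0
print(kl_divergence(fair, biased) != kl_divergence(biased, fair))  # True: not symmetric

doc = "a b a c a b".split()          # toy document (hypothetical data)
p_hat = mle(doc)                     # a: 1/2, b: 1/3, c: 1/6
uniform = {w: 1 / 3 for w in p_hat}
print(round(entropy(p_hat), 3))                 # 1.459
print(round(cross_entropy(p_hat, uniform), 3))  # 1.585 (= log2 3, >= H(p_hat))
print(round(kl_divergence(p_hat, uniform), 3))  # 0.126 (= 1.585 - 1.459)
# With p fixed to the empirical distribution, log-likelihood = -|d| * H(p_hat, q),
# so the q with the smallest cross entropy (q = p_hat) has the largest likelihood.
print(math.isclose(log_likelihood(doc, uniform), -len(doc) * cross_entropy(p_hat, uniform)))  # True
print(log_likelihood(doc, p_hat) > log_likelihood(doc, uniform))                              # True
```

Base-2 logarithms are used so every quantity is in bits, matching the coding interpretation of entropy.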
KL-divergence D(p||q) is also known as relative entropy.

Cross Entropy, KL-Div, and Likelihood
For data X = (x1, …, xN) with counts c(x) and empirical distribution $\tilde{p}(x) = c(x)/N$, and a model q:
- Likelihood: $L(X) = \prod_{i=1}^{N} q(x_i) = \prod_x q(x)^{c(x)}$
- log Likelihood: $\log L(X) = \sum_x c(x)\log q(x) = -N\,H(\tilde{p}, q)$
- Criterion for selecting a good model: higher likelihood, i.e., lower cross entropy H(p̃, q)
- $\text{Perplexity}(p) = 2^{H(p)}$

Mutual Information I(X;Y)
Comparing two distributions: p(x,y) vs. p(x)p(y):
$$I(X;Y) = \sum_{x,y} p(x,y)\log_2\frac{p(x,y)}{p(x)\,p(y)} = H(X) - H(X|Y)$$
Properties: I(X;Y) ≥ 0; I(X;Y) = I(Y;X); I(X;Y) = 0 iff X and Y are independent
Interpretations:
- Measures how much the uncertainty of X is reduced given information about Y
- Measures the correlation between X and Y
- Related to the "channel capacity" in information theory
Examples: I(Topic; "computer") vs. I(Topic; "the")? I("computer"; "program") vs. I("computer"; "baseball")?

What You Should Know
- Information theory concepts: entropy, cross entropy, relative entropy, conditional entropy, KL-div., mutual information
  - Know their definitions and how to compute them
  - Know how to interpret them
  - Know their relationships

Essential Background 3: Natural Language Processing
What is NLP?
Arabic text: … [not recoverable from the extraction] …
Spanish text: "La listas actualizadas figuran como Aneio I."
How can a computer make sense out of such a string?
- What are the basic units of meaning (words)?
- What is the meaning of each word?
- How are words related with each other?
- What is the "combined meaning" of words?
- What is the "meta-meaning"? (speech act)
- Handling a large chunk of text
- Making sense of everything
(Levels of analysis: morphology, syntax, semantics, pragmatics, discourse, inference)

An Example of NLP
Sentence: "A dog is chasing a boy on the playground"
- Lexical analysis (part-of-speech tagging): Det Noun Aux Verb Det Noun Prep Det Noun
- Syntactic analysis (parsing): Noun Phrase, Complex Verb, Noun Phrase, Noun Phrase, Prep Phrase, Verb Phrase, Verb Phrase, Sentence
- Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
- Inference: Scared(x) if Chasing(_,x,_). + the facts above ⇒ Scared(b1)
- Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…

If we can do this for all the sentences, then …
BAD NEWS: Unfortunately, we can't. General NLP = "AI-Complete"

NLP is Difficult!
Natural language is designed to make human communication efficient. As a result,
- we omit a lot of "common sense" knowledge, which we assume the hearer/reader possesses
- we keep a lot of ambiguities, which we assume the hearer/reader knows how to resolve
This makes EVERY step in NLP hard:
- Ambiguity is a "killer"!
- Common sense reasoning is a prerequisite

Examples of Challenges
- Word-level ambiguity: e.g., "design" can be a noun or a verb (ambiguous POS); "root" has multiple meanings (ambiguous sense)
- Syntactic ambiguity: e.g., "natural language processing" (modification); "A man saw a boy with a telescope." (PP attachment)
- Anaphora resolution: "John persuaded Bill to buy a TV for himself." (himself = John or Bill?)
- Presupposition: "He has quit smoking." implies that he smoked before.

Despite all the challenges, research in NLP has also made a lot of progress…

High-level History of NLP
- Early enthusiasm (1950s): machine translation
  - Too ambitious
  - Bar-Hillel report (1960) concluded that fully-automatic high-quality translation could not be accomplished without knowledge (dictionary + encyclopedia)
- Less ambitious applications (late 1960s & early 1970s): limited success, failed to scale up
  - Speech recognition
  - Dialogue (Eliza)
  - Inference and domain knowledge (SHRDLU = "block world")
- Real world evaluation (late 1970s to now)
  - Story understanding (late 1970s & early 1980s)
  - Large scale evaluation of speech recognition, text retrieval, information extraction (1980 to now)
  - Statistical approaches enjoy more success (first in speech recognition & retrieval, later others)
- Current trend:
  - The boundary between statistical and symbolic approaches is disappearing
  - We need to use all the available knowledge
  - Application-driven NLP research (bioinformatics, Web, question answering…)
[Diagram labels: Stat. language models; Robust component techniques; Applications; Knowledge representation; Deep understanding in limited domain; Shallow understanding]

The State of the Art
"A dog is chasing a boy on the playground" (POS tags and parse as above)
- POS tagging: 97%
- Parsing: partial >90% (?)
- Semantics: some aspects (entity/relation extraction, word sense disambiguation, anaphora resolution)
- Speech act analysis: ???
- Inference: ???

Technique Showcase: POS Tagging
Training data (annotated text): "This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N…"
POS tagger: "This is a new sentence" → This/Det is/Aux a/Det new/Adj sentence/N
Consider all possibilities (Det Det Det Det Det, …, Det Aux Det Adj N, …, V2 V2 V2 V2 V2) and pick the one with the highest probability.
- Method 1: independent assignment (most common tag)
- Method 2: partial dependency

Technique Showcase: Parsing
Grammar:
- S → NP VP
- NP → Det BNP
- NP → BNP
- NP → NP PP
- BNP → N
- VP → V
- VP → Aux V NP
- VP → VP PP
- PP → P NP
Lexicon:
- V → chasing
- Aux → is
- N → dog | boy | playground
- Det → the | a
- P → on
Generate: [Figure: alternative parse trees for "A dog is chasing a boy on the playground", with each rule annotated by a probability (e.g., 1.0, 0.01, 0.003)]
Probability of this tree = 0.000015
Choose a tree with the highest prob…
Can also be treated as a classification/decision problem…

Semantic Analysis Techniques
Only successful for a VERY limited domain or for SOME aspects of semantics, e.g.,
- Entity extraction (e.g., recognizing a person's name): use rules and/or machine learning
- Word sense disambiguation: addressed as a classification problem with supervised learning
- Anaphora resolution …
In general, exploiting machine learning and statistical language models…

What We Can't Do
- 100% POS tagging: "He turned off the highway." vs. "He turned off the fan."
- General complete parsing: "A man saw a boy with a telescope."
- Deep semantic analysis: will we ever be able to precisely define the meaning of "own" in "John owns a restaurant."?
Robust & general NLP tends to be "shallow", while "deep" understanding doesn't scale up…

Major NLP Applications
- Speech recognition: e.g., auto telephone call routing
- Text management (our focus): text retrieval/filtering, text classification, text summarization, text mining, query answering
- Language tutoring
- Spelling/grammar correction
- Machine translation
- Cross-language retrieval
- Restricted natural language
- Natural language user interface
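The two tagging methods from the POS tagging showcase can be contrasted in a short Python sketch. Method 1 gives each word its most common training tag. For Method 2 the sketch assumes one concrete form of "partial dependency", a bigram hidden Markov model decoded with the Viterbi algorithm; the slide does not say which model it means. Apart from the slide's annotated sentence, the training sentences (and the Pron and V tags) are hypothetical, and there is no smoothing, so unseen words or tag transitions get probability 0.

```python
from collections import Counter, defaultdict

def train(tagged_sents):
    """Collect counts from sentences given as lists of (word, tag) pairs."""
    emit = defaultdict(Counter)       # emit[tag][word]: how often tag emits word
    trans = defaultdict(Counter)      # trans[prev][tag]: tag bigrams; "<s>" = sentence start
    word_tags = defaultdict(Counter)  # word_tags[word][tag]: used by Method 1
    for sent in tagged_sents:
        prev = "<s>"
        for word, tag in sent:
            emit[tag][word] += 1
            trans[prev][tag] += 1
            word_tags[word][tag] += 1
            prev = tag
    return emit, trans, word_tags

def prob(counter, key):
    """Relative-frequency (ML) estimate; 0.0 if the counter is empty."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0

def tag_independent(words, word_tags):
    """Method 1: each word gets its most common training tag ("UNK" if unseen)."""
    return [word_tags[w].most_common(1)[0][0] if w in word_tags else "UNK" for w in words]

def tag_viterbi(words, emit, trans):
    """Method 2 (assumed here to be a bigram HMM): most probable tag sequence."""
    tags = list(emit)
    # best[t] = (probability, path) of the best tag sequence that ends in tag t
    best = {t: (prob(trans["<s>"], t) * prob(emit[t], words[0]), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                ((p * prob(trans[prev], t) * prob(emit[t], w), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda cand: cand[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda cand: cand[0])[1]

train_data = [
    # The annotated sentence from the slide (lower-cased).
    [("this", "Det"), ("sentence", "N"), ("serves", "V1"), ("as", "P"), ("an", "Det"),
     ("example", "N"), ("of", "P"), ("annotated", "V2"), ("text", "N")],
    # Hypothetical extra training sentences.
    [("the", "Det"), ("design", "N"), ("is", "Aux"), ("new", "Adj")],
    [("a", "Det"), ("new", "Adj"), ("design", "N")],
    [("this", "Det"), ("is", "Aux"), ("a", "Det"), ("book", "N")],
    [("we", "Pron"), ("design", "V"), ("chips", "N")],
]
emit, trans, word_tags = train(train_data)

s1 = "this is a new sentence".split()
print(tag_independent(s1, word_tags))  # ['Det', 'Aux', 'Det', 'Adj', 'N']
print(tag_viterbi(s1, emit, trans))    # ['Det', 'Aux', 'Det', 'Adj', 'N']
s2 = "we design chips".split()
print(tag_independent(s2, word_tags))  # ['Pron', 'N', 'N']
print(tag_viterbi(s2, emit, trans))    # ['Pron', 'V', 'N']
```

On "we design chips", Method 1 tags "design" as N because it is most often a noun in the training data, while the bigram model uses the preceding Pron tag to choose V. This is the word-level POS ambiguity of "design" listed under Examples of Challenges.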
NLP & Text Management
- Better NLP => Better Text Management
- Bad NLP => Bad Text Management?
- Robust, shallow NLP tends to be more useful than deep but fragile NLP.
- Errors in NLP can hurt text management performance…

How Much NLP is Really Needed?
Tasks vs. dependency on NLP:
- Classification/Retrieval
- Summarizat…