Chapter 7: Classification and Prediction

- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Classification by neural networks
- Classification by support vector machines (SVM)
- Classification based on concepts from association rule mining
- Other classification methods
- Prediction
- Classification accuracy
- Summary
Classification vs. Prediction

Classification:
- predicts categorical class labels (discrete or nominal)
- classifies data (constructs a model) based on the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data

Prediction:
- models continuous-valued functions, i.e., predicts unknown or missing values

Typical applications:
- credit approval
- target marketing
- medical diagnosis
- treatment effectiveness analysis
Classification: A Two-Step Process

Model construction: describing a set of predetermined classes
- Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
- The set of tuples used for model construction is the training set
- The model is represented as classification rules, decision trees, or mathematical formulae

Model usage: classifying future or unknown objects
- Estimate the accuracy of the model: the known label of each test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model
- The test set is independent of the training set, otherwise overfitting will occur
- If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

Classification Process (1): Model Construction
- Training data is fed to a classification algorithm, which outputs a classifier (model), e.g., IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
- The classifier is checked against testing data and then applied to unseen data, e.g., (Jeff, Professor, 4) -> Tenured?

Supervised vs. Unsupervised Learning

Supervised learning (classification)
- Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
- New data is classified based on the training set
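As a concrete illustration of the two-step process, here is a minimal sketch: construct a classifier on a training set, estimate its accuracy on an independent test set, then classify an unseen tuple. The slides name no library or data set, so the use of scikit-learn and the toy tenure data below are assumptions for illustration only.

```python
# Minimal sketch of the two-step classification process; the library
# choice (scikit-learn) and the toy tenure data are illustrative
# assumptions, not part of the original slides.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Toy training data: (rank_is_professor, years) -> tenured
X = [[1, 7], [1, 3], [0, 7], [0, 2], [1, 8], [0, 6], [1, 2], [0, 9]]
y = ['yes', 'no', 'yes', 'no', 'yes', 'no', 'no', 'yes']

# Step 1: model construction on the training set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage: estimate accuracy on the independent test set,
# then classify an unseen tuple such as (Jeff, Professor, 4 years)
print('accuracy:', accuracy_score(y_test, model.predict(X_test)))
print('Tenured?', model.predict([[1, 4]])[0])
```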
Unsupervised learning (clustering)
- The class labels of the training data are unknown
- Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

Issues Regarding Classification and Prediction (1): Data Preparation
- Data cleaning: preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection): remove irrelevant or redundant attributes
- Data transformation: generalize and/or normalize data

Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
- Predictive accuracy
- Speed and scalability: time to construct the model; time to use the model
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability: understanding and insight provided by the model
- Goodness of rules: decision tree size; compactness of classification rules

Training Dataset
- This follows an example from Quinlan's ID3: the buys_computer data set (14 tuples with attributes age, income, student, credit_rating and class label buys_computer)

Output: A Decision Tree for "buys_computer"
- The root tests age: for age <= 30, a student test follows (no -> buys_computer = no, yes -> yes); for age 31..40 the leaf is yes; for age > 40, a credit_rating test follows (excellent -> no, fair -> yes)

Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm): the tree is constructed in a top-down recursive divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning: all samples for a given node belong to the same class; there are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf); there are no samples left

Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- If S contains s_i tuples of class C_i for i = 1, ..., m, the information required to classify any arbitrary tuple is

  I(s_1, ..., s_m) = - \sum_{i=1}^{m} \frac{s_i}{s} \log_2 \frac{s_i}{s}

- The entropy of attribute A with values {a_1, a_2, ..., a_v} is

  E(A) = \sum_{j=1}^{v} \frac{s_{1j} + ... + s_{mj}}{s} I(s_{1j}, ..., s_{mj})

- The information gained by branching on attribute A is Gain(A) = I(s_1, ..., s_m) - E(A)

Attribute Selection by Information Gain Computation
- Class P: buys_computer = "yes" (9 tuples); class N: buys_computer = "no" (5 tuples), so I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age: "age <= 30" covers 5 out of 14 samples, with 2 yes's and 3 no's, contributing the term (5/14) I(2, 3); adding the terms for "31..40" (4 yes, 0 no) and "> 40" (3 yes, 2 no) gives E(age) = 0.694, hence Gain(age) = 0.940 - 0.694 = 0.246
- Similarly, Gain(income) = 0.029, Gain(student) = 0.151, and Gain(credit_rating) = 0.048, so age is selected as the first test attribute

Other Attribute Selection Measures
- Gini index (CART, IBM IntelligentMiner)
- All attributes are assumed continuous-valued
- Assume there exist several possible split values for each attribute
- May need other tools, such as clustering, to get the possible split values
- Can be modified for categorical attributes

Gini Index (IBM IntelligentMiner)
- If a data set T contains examples from n classes, the gini index gini(T) is defined as

  gini(T) = 1 - \sum_{j=1}^{n} p_j^2

  where p_j is the relative frequency of class j in T
- If T is split into two subsets T_1 and T_2 with sizes N_1 and N_2 respectively, the gini index of the split data is

  gini_split(T) = \frac{N_1}{N} gini(T_1) + \frac{N_2}{N} gini(T_2)

- The attribute providing the smallest gini_split(T) is chosen to split the node (all possible splitting points must be enumerated for each attribute)
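The information-gain numbers above can be checked with a few lines of plain Python; this sketch also includes the gini measure from the Gini Index slide. Only the buys_computer counts stated on the slides are used.

```python
# Worked check of the information-gain numbers on the buys_computer
# example (14 tuples, 9 yes / 5 no); pure Python, no library assumptions.
from math import log2

def info(*counts):
    """I(s1,...,sm) = -sum (si/s) log2(si/s), skipping empty classes."""
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c)

def gini(*counts):
    """gini(T) = 1 - sum pj^2, the alternative impurity measure."""
    s = sum(counts)
    return 1 - sum((c / s) ** 2 for c in counts)

# I(9, 5) for the whole training set
print(round(info(9, 5), 3))            # 0.940

# E(age): partitions <=30 (2 yes, 3 no), 31..40 (4, 0), >40 (3, 2)
e_age = 5/14 * info(2, 3) + 4/14 * info(4, 0) + 5/14 * info(3, 2)
print(round(e_age, 3))                 # 0.694
print(round(info(9, 5) - e_age, 3))    # Gain(age) = 0.246
print(round(gini(9, 5), 3))            # gini of the full set
```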
Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf; each attribute-value pair along a path forms a conjunction, and the leaf node holds the class prediction
- Rules are easier for humans to understand
- Example (one rule per path of the tree above):
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
  IF age = "31..40" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"

Avoid Overfitting in Classification
- Overfitting: an induced tree may overfit the training data: too many branches, some of which reflect anomalies due to noise or outliers, giving poor accuracy for unseen samples
- Two approaches to avoid overfitting:
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold (it is difficult to choose an appropriate threshold)
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees; use a set of data different from the training data to decide which is the "best pruned tree"

Approaches to Determine the Final Tree Size
- Separate training (2/3) and testing (1/3) sets
- Use cross-validation, e.g., 10-fold cross-validation
- Use all the data for training, but apply a statistical test (e.g., chi-square) to estimate whether expanding or pruning a node is likely to improve performance over the entire distribution
- Use the minimum description length (MDL) principle: halt growth of the tree when the encoding is minimized

Enhancements to Basic Decision Tree Induction
- Allow for continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation, repetition, and replication

Classification in Large Databases
- Classification is a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed
- Why decision tree induction in data mining?
  - relatively fast learning speed (compared with other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - classification accuracy comparable with other methods

Scalable Decision Tree Induction Methods in Data Mining Studies
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)

Data Cube-Based Decision-Tree Induction
- Integration of generalization with decision-tree induction (Kamber et al. '97)
- Classification at primitive concept levels (e.g., precise temperature, humidity, outlook) yields low-level concepts, scattered classes, bushy classification trees, and semantic interpretation problems
- Cube-based multi-level classification: relevance analysis at multiple levels; information-gain analysis with dimension + level

Presentation of Classification Results (figure slide)

Visualization of a Decision Tree in SGI/MineSet 3.0 (figure slide)

Interactive Visual Mining by Perception-Based Classification (PBC) (figure slide)
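Before moving on to Bayesian methods, a quick sketch of the rule-extraction idea from the Extracting Classification Rules slide: fit a tree and print one rule per root-to-leaf path. scikit-learn's export_text is used here as a stand-in for hand-written path enumeration, and the integer encoding of the toy data is an assumption.

```python
# Sketch of extracting IF-THEN structure from a trained tree; the
# encoded buys_computer-style data and the use of scikit-learn's
# export_text helper are illustrative assumptions.
from sklearn.tree import DecisionTreeClassifier, export_text

# age encoded 0: <=30, 1: 31..40, 2: >40; student encoded 0/1
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = ['no', 'yes', 'yes', 'yes', 'no', 'yes']

tree = DecisionTreeClassifier().fit(X, y)
# Prints the tree as nested conditions; each root-to-leaf path
# corresponds to one IF-THEN rule.
print(export_text(tree, feature_names=['age', 'student']))
```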
Bayesian Classification: Why?
- Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
- Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
- Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured

Bayesian Theorem: Basics
- Let X be a data sample whose class label is unknown
- Let H be the hypothesis that X belongs to class C
- For classification problems, determine P(H|X): the probability that the hypothesis holds given the observed data sample X
- P(H): prior probability of hypothesis H (the initial probability before we observe any data; reflects background knowledge)
- P(X): probability that the sample data is observed
- P(X|H): probability of observing the sample X, given that the hypothesis holds

Bayesian Theorem
- Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:

  P(H|X) = \frac{P(X|H) P(H)}{P(X)}

- Informally, this can be written as posterior = likelihood x prior / evidence
- MAP (maximum a posteriori) hypothesis: the hypothesis maximizing P(H|X), equivalently maximizing P(X|H) P(H)
- Practical difficulty: requires initial knowledge of many probabilities, and has a significant computational cost

Na?ve Bayes Classifier
- A simplifying assumption: attributes are conditionally independent given the class:

  P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)

- e.g., the probability of observing two attribute values y1 and y2 together, given the current class C, is the product of the probabilities of each value taken separately, given the same class: P([y1, y2] | C) = P(y1 | C) * P(y2 | C)
- No dependence relation between attributes; this greatly reduces the computation cost, since only the class distribution needs to be counted
- Once P(X|C_i) is known, assign X to the class with maximum P(X|C_i) P(C_i)

Training Dataset
- Class C1: buys_computer = 'yes'; class C2: buys_computer = 'no'
- Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Na?ve Bayesian Classifier: Example
- Compute P(X|C_i) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
- Multiplying by the priors P(yes) = 9/14 and P(no) = 5/14:
  P(X | buys_computer = "yes") P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") P(buys_computer = "no") = 0.007
- X belongs to class buys_computer = "yes"

Na?ve Bayesian Classifier: Comments
- Advantages: easy to implement; good results obtained in most of the cases
- Disadvantages: the class conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables
  - e.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  - dependencies among these cannot be modeled by a na?ve Bayesian classifier
- How to deal with these dependencies? Bayesian belief networks

Bayesian Networks
- A Bayesian belief network allows a subset of the variables to be conditionally independent
- A graphical model of causal relationships: represents dependency among the variables and gives a specification of the joint probability distribution
- Nodes are random variables and links are dependencies; e.g., if X and Y are the parents of Z, and Y is the parent of P, then there is no direct dependency between Z and P
- The graph has no loops or cycles
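The na?ve Bayesian example above can be reproduced directly; this sketch hard-codes the conditional probabilities and priors read off the buys_computer training set.

```python
# Check of the na?ve Bayes example; plain Python, all numbers taken
# from the buys_computer training set (9 yes, 5 no).
cond = {  # P(attribute value | class)
    'yes': {'age<=30': 2/9, 'income=medium': 4/9,
            'student=yes': 6/9, 'credit=fair': 6/9},
    'no':  {'age<=30': 3/5, 'income=medium': 2/5,
            'student=yes': 1/5, 'credit=fair': 2/5},
}
prior = {'yes': 9/14, 'no': 5/14}

x = ['age<=30', 'income=medium', 'student=yes', 'credit=fair']
for c in ('yes', 'no'):
    likelihood = 1.0
    for value in x:                   # conditional independence:
        likelihood *= cond[c][value]  # multiply per-attribute terms
    print(c, round(likelihood, 3), round(likelihood * prior[c], 3))
# yes 0.044 0.028  /  no 0.019 0.007  -> classify X as "yes"
```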
Bayesian Belief Network: An Example
- Variables: FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea; FamilyHistory and Smoker are parents of LungCancer, Smoker is a parent of Emphysema, and LungCancer is a parent of PositiveXRay and Dyspnea (figure slide)
- The conditional probability table (CPT) for the variable LungCancer shows the conditional probability for each possible combination of its parents, (FH, S), (FH, ~S), (~FH, S), (~FH, ~S); e.g., P(LC | ~FH, ~S) = 0.1 and P(~LC | ~FH, ~S) = 0.9

Learning Bayesian Networks
- Several cases:
  - Given both the network structure and all variables observable: learn only the CPTs
  - Network structure known, some hidden variables: method of gradient descent, analogous to neural network learning
  - Network structure unknown, all variables observable: search through the model space to reconstruct the graph topology
  - Unknown structure, all hidden variables: no good algorithms known for this purpose
- Reference: D. Heckerman, Bayesian networks for data mining

Classification Mathematically
- Classification predicts categorical class labels
- Typical applications: {credit history, salary} -> credit approval (yes/no); {Temp, Humidity} -> Rain (yes/no)
- Mathematically, classification learns a function f: X -> Y that maps a feature vector x in X to a class label y in Y

Linear Classification
- Binary classification problem: the data above the separating line belong to class 'x' and the data below it belong to class 'o' (figure slide)
- Examples: SVM, perceptron, probabilistic classifiers

Discriminative Classifiers
- Advantages:
  - prediction accuracy is generally high (compared with Bayesian methods in general)
  - robust: works when training examples contain errors
  - fast evaluation of the learned target function (Bayesian networks are normally slow)
- Criticism:
  - long training time
  - difficult to understand the learned function (weights), whereas Bayesian networks can be used easily for pattern discovery
  - not easy to incorporate domain knowledge (easy in the form of priors on the data or distributions)

Neural Networks
- Analogy to biological systems (indeed a great example of a good learning system)
- Massive parallelism allows for computational efficiency
- The first learning algorithm came in 1959 (Rosenblatt), who suggested that if a target output value is provided for a single neuron with fixed inputs, one can incrementally change weights to learn to produce these outputs using the perceptron learning rule

A Neuron
- The n-dimensional input vector x is mapped into variable y by means of the scalar product with the weight vector w and a nonlinear function mapping (the activation function f), with bias \mu_k:

  y = f\left( \sum_{i=0}^{n} w_i x_i - \mu_k \right)

Multi-Layer Perceptron
- An input vector x_i is propagated through input nodes, hidden nodes, and output nodes to produce an output vector; w_ij denotes the weight on the link from node i to node j (figure slide)

Network Training
- The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classified correctly
- Steps:
  - Initialize weights with random values
  - Feed the input tuples into the network one by one
  - For each unit: compute the net input to the unit as a linear combination of all the inputs to the unit; compute the output value using the activation function; compute the error; update the weights and the bias

Network Pruning and Rule Extraction
- Network pruning:
  - A fully connected network is hard to articulate: N input nodes, h hidden nodes and m output nodes lead to h(m + N) weights
  - Pruning: remove some of the links without affecting the classification accuracy of the network
- Extracting rules from a trained network:
  - Discretize activation values; replace individual activation values by the cluster average, maintaining the network accuracy
  - Enumerate the outputs from the discretized activation values to find rules between activation values and outputs
  - Find the relationship between the inputs and activation values
  - Combine the above two to obtain rules relating the output to the input
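A minimal sketch of the training loop just described, for a single sigmoid neuron: random initialization, weighted sum, activation, error, then weight and bias update. The OR data, learning rate, and epoch count are illustrative assumptions.

```python
# Single-neuron sketch of the network-training steps above
# (gradient descent on a sigmoid unit); data and hyperparameters
# are illustrative assumptions.
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Tiny training set: learn logical OR
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

random.seed(0)
w = [random.uniform(-0.5, 0.5) for _ in range(2)]  # weights
b = random.uniform(-0.5, 0.5)                      # bias
lr = 0.5

for epoch in range(2000):
    for x, target in data:
        net = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum
        out = sigmoid(net)                              # activation
        err = target - out                              # error
        delta = err * out * (1 - out)                   # sigmoid gradient
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
        b += lr * delta

print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), 2)
       for x, _ in data])   # outputs close to [0, 1, 1, 1]
```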
SVM: Support Vector Machines
- The SVM chooses the separating hyperplane with the largest margin; a small margin generalizes worse than a large one, and the training points closest to the hyperplane are the support vectors (figure slide)

Linear Support Vector Machine
- Given a set of points x_i with labels y_i in {-1, +1}, the SVM finds a hyperplane defined by the pair (w, b), where w is the normal to the plane and b is the distance from the origin, such that

  y_i (w \cdot x_i + b) \ge 1 for all i

  where x is the feature vector, b the bias, y the class label, and the margin is proportional to 1/||w||

SVM (cont.)
- What if the data is not linearly separable?
- Project the data into a high-dimensional space where it is linearly separable, and then use a linear SVM (using kernels)
- e.g., one-dimensional data that is not separable on the line can become linearly separable after a quadratic mapping into two dimensions (figure slide)

Non-Linear SVM
- Classification using an SVM (w, b): decide by the sign of w \cdot x + b; in the nonlinear case this becomes the sign of \sum_i \alpha_i y_i K(x_i, x) + b
- Kernel: can be thought of as doing a dot product in some high-dimensional feature space

Example of Non-linear SVM (figure slide)

Results (figure slide)

SVM vs. Neural Network
- SVM: a relatively new concept; nice generalization properties; hard to learn (learned in batch mode using quadratic programming techniques); using kernels, can learn very complex functions
- Neural network: quite old; generalizes well but does not have a strong mathematical foundation; can easily be learned in incremental fashion; to learn complex functions, use a multilayer perceptron (not that trivial)

SVM Related Links (links slide)

Association-Based Classification
- Several methods for association-based classification:
  - ARCS: quantitative association mining and clustering of association rules (Lent et al. '97); it beats C4.5 in (mainly) scalability and also accuracy
  - Associative classification (Liu et al. '98): mines high-support, high-confidence rules of the form "cond_set => y", where y is a class label
  - CAEP, classification by aggregating emerging patterns (Dong et al. '99): emerging patterns (EPs) are itemsets whose support increases significantly from one class to another; EPs are mined based on minimum support and growth rate

Other Classification Methods
- k-nearest neighbor classifier
- case-based reasoning
- genetic algorithms
- rough set approach
- fuzzy set approaches

Instance-Based Methods
- Instance-based learning: store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
- Typical approaches:
  - k-nearest neighbor approach: instances represented as points in a Euclidean space
  - locally weighted regression: constructs a local approximation
  - case-based reasoning: uses symbolic representations and knowledge-based inference

The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance
- The target function can be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples (figure slide)
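A small sketch of the k-NN classifier just described: Euclidean distance plus a majority vote among the k nearest training points. The toy points are illustrative assumptions.

```python
# k-NN sketch (Euclidean distance, majority vote); toy data is an
# illustrative assumption.
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_classify(query, examples, k=3):
    """examples: list of (point, label); returns the majority label
    among the k training points nearest to query."""
    nearest = sorted(examples, key=lambda e: dist(e[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), '-'), ((0, 1), '-'), ((1, 0), '-'),
         ((5, 5), '+'), ((5, 6), '+'), ((6, 5), '+')]
print(knn_classify((4, 4), train))   # '+'
```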
Discussion on the k-NN Algorithm
- The k-NN algorithm for continuous-valued target functions: calculate the mean value of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm: weight the contribution of each of the k neighbors according to its distance to the query point x_q, e.g., w = 1 / d(x_q, x_i)^2, giving greater weight to closer neighbors; the same weighting applies to real-valued target functions
- Robust to noisy data through averaging over the k nearest neighbors
- Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes; to overcome it, stretch the axes or eliminate the least relevant attributes

Case-Based Reasoning
- Also uses lazy evaluation and analysis of similar instances
- Difference: instances are not "points in a Euclidean space"
- Example: the water faucet problem in CADET (Sycara et al. '92)
- Methodology: instances represented by rich symbolic descriptions (e.g., function graphs); multiple retrieved cases may be combined; tight coupling between case retrieval, knowledge-based reasoning, and problem solving
- Research issues: indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases

Remarks on Lazy vs. Eager Learning
- Instance-based learning is lazy evaluation; decision-tree and Bayesian classification are eager evaluation
- Key difference: a lazy method may consider the query instance x_q when deciding how to generalize beyond the training data D; an eager method cannot, since it has already chosen its global approximation before seeing the query
- Efficiency: lazy methods need less time for training but more time for predicting
- Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function; an eager method must commit to a single hypothesis that covers the entire instance space

Genetic Algorithms
- GA: based on an analogy to biological evolution
- Each rule is represented by a string of bits; an initial population is created consisting of randomly generated rules, e.g., IF A1 AND NOT A2 THEN C2 can be encoded as 100
- Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
- The fitness of a rule is represented by its classification accuracy on a set of training examples
- Offspring are generated by crossover and mutation

Rough Set Approach
- Rough sets are used to approximately or "roughly" define equivalence classes
- A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
- Finding the minimal subsets (reducts) of attributes (for feature reduction) is NP-hard, but a discernibility matrix can be used to reduce the computational intensity

Fuzzy Set Approaches
- Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as in a fuzzy membership graph)
- Attribute values are converted to fuzzy values; e.g., income is mapped into the discrete categories {low, medium, high} with calculated fuzzy values
- For a given new sample, more than one fuzzy value may apply
- Each applicable rule contributes a vote for membership in the categories; typically, the truth values for each predicted category are summed (a sketch follows below)
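To make the fuzzy-set approach concrete, a toy sketch: convert a crisp attribute value into fuzzy memberships, let each applicable rule vote with its truth value, and sum the votes per category. The membership functions, the income scale, and the rules are invented for illustration only.

```python
# Toy sketch of the fuzzy-set voting idea; membership functions,
# income scale, and rules are illustrative assumptions.
def membership(income):
    """Simple triangular memberships in {low, medium, high}."""
    low = max(0.0, min(1.0, (40_000 - income) / 20_000))
    high = max(0.0, min(1.0, (income - 60_000) / 20_000))
    medium = max(0.0, 1.0 - low - high)
    return {'low': low, 'medium': medium, 'high': high}

# Each rule votes for a category with the truth value of its antecedent
rules = [('low', 'reject'), ('medium', 'approve'), ('high', 'approve')]

m = membership(35_000)     # {'low': 0.25, 'medium': 0.75, 'high': 0.0}
votes = {}
for fuzzy_value, category in rules:
    votes[category] = votes.get(category, 0.0) + m[fuzzy_value]
print(m, votes, max(votes, key=votes.get))   # -> 'approve'
```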
What Is Prediction?
- Prediction is similar to classification: first, construct a model; second, use the model to predict unknown values
- The major method for prediction is regression: linear and multiple regression; non-linear regression
- Prediction is different from classification: classification predicts categorical class labels, while prediction models continuous-valued functions

Predictive Modeling in Databases
- Predictive modeling: predict data values or construct generalized linear models based on the database data; one can only predict value ranges or category distributions
- Method outline: minimal generalization; attribute relevance analysis; generalized linear model construction; prediction
- Determine the major factors which influence the prediction: data relevance analysis via uncertainty measurement, entropy analysis, expert judgement, etc.
- Multi-level prediction: drill-down and roll-up analysis

Regression Analysis and Log-Linear Models in Prediction
- Linear regression: Y = \alpha + \beta X; the two parameters \alpha and \beta specify the line and are estimated from the data at hand, using the least squares criterion on the known values Y_1, Y_2, ..., X_1, X_2, ...
- Multiple regression: Y = b_0 + b_1 X_1 + b_2 X_2; many nonlinear functions can be transformed into this form
- Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g.

  p(a, b, c, d) = \alpha_{ab} \beta_{ac} \chi_{ad} \delta_{bcd}

Locally Weighted Regression
- Construct an explicit approximation to f over a local region surrounding the query instance x_q
- Locally weighted linear regression: the target function f is approximated near x_q using the linear function

  \hat{f}(x) = w_0 + w_1 a_1(x) + ... + w_n a_n(x)

- Minimize the squared error over the k nearest neighbors of x_q, with a distance-decreasing weight K:

  E(x_q) = \frac{1}{2} \sum_{x \in kNN(x_q)} (f(x) - \hat{f}(x))^2 K(d(x_q, x))

- The gradient descent training rule is then

  \Delta w_j = \eta \sum_{x \in kNN(x_q)} K(d(x_q, x)) (f(x) - \hat{f}(x)) a_j(x)

- In most cases, the target function is approximated by a constant, linear, or quadratic function
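A least-squares sketch for the simple linear model Y = \alpha + \beta X above, estimating the two parameters from data and then predicting an unknown value; the toy data is an illustrative assumption.

```python
# Least-squares estimation of alpha and beta for Y = alpha + beta * X,
# as described above; plain Python, toy data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs))
alpha = my - beta * mx
print(round(alpha, 2), round(beta, 2))   # roughly Y = 0.15 + 1.95 X

# Predict an unknown value at X = 6
print(round(alpha + beta * 6.0, 2))
```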