Lecture 8: Classification and Prediction
School of Software, Xiamen University

Outline
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Bayesian classification
- Rule-based classification (self-study)
- Classification by back propagation
- Support Vector Machines (SVM) (self-study)
- Associative classification (self-study)
- Lazy learners (or learning from your neighbors) (self-study)
- Other classification methods (self-study)
- Prediction
- Accuracy and error measures (self-study)
- Ensemble methods (self-study)
- Model selection (self-study)
- Summary

Classification vs. Prediction
- Classification: predicts categorical class labels (discrete or nominal). It builds a model from the training set and the values of the class label attribute, then uses the model to classify new data.
- Prediction: models continuous-valued functions, e.g., to predict unknown or missing values.
- Typical applications: credit approval, target marketing, medical diagnosis.

Classification: a two-step process
- Model construction: describe a set of predetermined classes or concepts. Each tuple/sample is assumed to belong to a predefined class, as determined by an attribute called the class label attribute. The set of tuples used for model construction is the training set. The model is represented as classification rules, decision trees, or mathematical formulas.
- Model usage: classify future or unknown objects.
  - Estimate the accuracy of the model: the known label of each test sample is compared with the classification produced by the model; accuracy is the percentage of test samples correctly classified by the model.
  - The test set must be independent of the training set; otherwise the learned model tends to over-fit the data.
  - If the accuracy is acceptable, use the model to classify data whose class labels are unknown.

Classification Process (1): Model Construction
- The training data is fed to a classification algorithm, which produces a classifier (model), e.g.
  IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
- The classifier is applied to testing data and then to unseen data, e.g., (Jeff, Professor, 4) -> Tenured?
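A minimal sketch of the two-step process in scikit-learn; the dataset and the 70/30 split are illustrative assumptions, not taken from the slides:

```python
# Step 1: build a model on a training set; Step 2: estimate its accuracy
# on an independent test set, then use it on unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()          # model construction on the training set
model.fit(X_train, y_train)

pred = model.predict(X_test)              # model usage on held-out data
print("accuracy:", accuracy_score(y_test, pred))
```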
Decision tree induction: basic algorithm
- Basic algorithm (greedy): the tree is constructed in a top-down, recursive, divide-and-conquer manner.
- The tree starts as a single node representing all the training samples.
- If the samples are all in the same class, the node becomes a leaf labeled with that class.
- Otherwise, select the attribute that best separates the samples (the test attribute, which must be discrete-valued).
- Create one branch for each known value of the test attribute and partition the samples accordingly.
- Recurse on the samples of each partition to build the subtrees.

Recursive partitioning stops when any of the following holds
- All samples at the given node belong to the same class.
- No attributes remain for further partitioning (majority voting is then employed to label the leaf).
- No samples remain.
- [Example decision tree for buys_computer: the root tests age; age <= 30 leads to a test on student (no -> no, yes -> yes); age 31..40 -> yes; age > 40 leads to a test on credit_rating (fair / excellent).]
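A compact sketch of the greedy, top-down induction loop and its stopping conditions; best_attribute is an assumed helper where a splitting criterion such as information gain would be plugged in:

```python
from collections import Counter

def majority_class(samples):
    # Most frequent class label among (attr_dict, label) pairs.
    return Counter(label for _, label in samples).most_common(1)[0][0]

def build_tree(samples, attributes, best_attribute):
    # samples: list of (attr_dict, class_label); attributes: list of attribute names.
    if not samples:                          # no samples remain
        return {"leaf": None}
    classes = {label for _, label in samples}
    if len(classes) == 1:                    # all samples in the same class
        return {"leaf": next(iter(classes))}
    if not attributes:                       # no attributes left: majority voting
        return {"leaf": majority_class(samples)}
    a = best_attribute(samples, attributes)  # e.g., attribute with highest info gain
    node = {"test": a, "branches": {}}
    for v in {attrs[a] for attrs, _ in samples}:      # one branch per known value
        subset = [(attrs, c) for attrs, c in samples if attrs[a] == v]
        rest = [x for x in attributes if x != a]
        node["branches"][v] = build_tree(subset, rest, best_attribute)
    return node
```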
Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain.
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|.
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)

Attribute Selection: Information Gain (example)
- Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples).
- Info(D) = I(9,5) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.940
- For example, "age <= 30" covers 5 of the 14 tuples, 2 of class "yes" and 3 of class "no"; weighting I(2,3) and the other age partitions by their sizes gives Info_age(D), and Gain(age) = Info(D) - Info_age(D).
- For a continuous-valued attribute, candidate split-points (typically the midpoints between adjacent sorted values) are evaluated, and the one with the minimum expected information requirement is chosen; D is then split into A <= split-point and A > split-point.

Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards tests with many outcomes.
- C4.5 (a successor of ID3) uses the gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\left(\frac{|D_j|}{|D|}\right)
  GainRatio(A) = Gain(A) / SplitInfo_A(D)
- Ex.: gain_ratio(income) = 0.029/0.926 = 0.031
- The attribute with the maximum gain ratio is selected as the splitting attribute.
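A small sketch that reproduces the quantities above for the class counts given on the slide (9 "yes", 5 "no"); the per-partition counts for age are assumptions drawn from the standard AllElectronics example the slides refer to:

```python
# Entropy, information gain and gain ratio for the buys_computer example.
from math import log2

def info(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

info_D = info([9, 5])                        # I(9,5) ~= 0.940

# age splits D into <=30 (2 yes, 3 no), 31..40 (4 yes, 0 no), >40 (3 yes, 2 no)
partitions = [[2, 3], [4, 0], [3, 2]]        # assumed counts for illustration
info_age = sum(sum(p) / 14 * info(p) for p in partitions)
gain_age = info_D - info_age                 # ~= 0.246

split_info_age = info([5, 4, 5])             # SplitInfo over the partition sizes
gain_ratio_age = gain_age / split_info_age
print(round(info_D, 3), round(gain_age, 3), round(gain_ratio_age, 3))
```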
Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index gini(D) is defined as
  gini(D) = 1 - \sum_{j=1}^{n} p_j^2
  where p_j is the relative frequency of class j in D.
- If D is split on A into two subsets D1 and D2, the gini index of the split is
  gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)
- Reduction in impurity:
  \Delta gini(A) = gini(D) - gini_A(D)
- The attribute providing the smallest gini_split(D) (equivalently, the largest reduction in impurity) is chosen to split the node; all possible splitting points for each attribute need to be enumerated.

Gini Index: example
- D has 9 tuples with buys_computer = "yes" and 5 with "no":
  gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
- Suppose the attribute income partitions D into D1 = {low, medium} with 10 tuples and D2 with 4 tuples:
  gini_{income \in \{low,medium\}}(D) = (10/14)\,gini(D_1) + (4/14)\,gini(D_2)
- But gini_{\{medium,high\}} is 0.30 and is therefore chosen, since it is the lowest.
- All attributes are assumed continuous-valued; other tools, e.g., clustering, may be needed to get the possible split values; the measure can be modified for categorical attributes.
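A short check of the gini values quoted above; the class counts inside each income partition are assumptions based on the AllElectronics data behind the example:

```python
# Gini index for the buys_computer example.
def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

gini_D = gini([9, 5])                  # 1 - (9/14)^2 - (5/14)^2 ~= 0.459

# income in {low, medium} -> D1 (assumed 7 yes / 3 no); income = high -> D2 (2 / 2)
d1, d2 = [7, 3], [2, 2]
gini_split = (sum(d1) / 14) * gini(d1) + (sum(d2) / 14) * gini(d2)
print(round(gini_D, 3), round(gini_split, 3))
```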
Comparing Attribute Selection Measures
- The three measures, in general, return good results, but:
  - Information gain: biased towards multivalued attributes.
  - Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others.
  - Gini index: biased towards multivalued attributes; has difficulty when the number of classes is large; tends to favor tests that result in equal-sized partitions with purity in both partitions.

Overfitting and Tree Pruning
- Overfitting: an induced tree may overfit the training data.
  - Too many branches, some of which may reflect anomalies due to noise or outliers.
  - Poor accuracy on unseen samples.
- Two approaches to avoid overfitting:
  - Prepruning: halt tree construction early; stop partitioning a node if the split would fall below a predefined threshold. It is difficult to choose an appropriate threshold.
  - Postpruning: remove subtrees from a "fully grown" tree (replace each pruned subtree with a leaf). Use a data set different from the training data to decide which is the "best pruned tree".
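As a sketch of these two ideas, scikit-learn's decision tree exposes prepruning-style thresholds (max_depth, min_samples_split) and cost-complexity pruning via ccp_alpha, a postpruning technique (not necessarily the exact procedure the slide describes); a held-out split picks the "best pruned tree":

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Prepruning: stop splitting early via depth/size thresholds.
pre = DecisionTreeClassifier(max_depth=4, min_samples_split=20).fit(X_tr, y_tr)

# Postpruning: grow fully, then choose the pruned tree that scores best on
# data different from the training data.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
post = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
print(pre.score(X_val, y_val), post.score(X_val, y_val))
```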
Enhancements to basic decision tree induction
- Allow continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous values into a discrete set of intervals.
- Handle missing attribute values: assign the most common value of the attribute, or assign the most probable value.
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this is a form of data transformation and helps address the fragmentation, repetition, and replication problems decision tree induction may face.

Classification in large databases
- Classification is an extension of a classical problem extensively studied by statisticians and machine learning researchers.
- Scalability: classifying data sets with millions of samples and hundreds of attributes at acceptable speed.
- Why decision tree induction in data mining?
  - Relatively fast learning speed (compared with other classification methods).
  - Classification rules are simple and easy to understand.
  - Database access can use SQL queries.
  - Comparable classification accuracy.

Scalable decision tree induction methods
- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory.
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure.
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning: stop growing the tree earlier.
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label).
Bayesian classification
- Bayesian classification is a statistical classification method, based on Bayes' theorem.
- Naive Bayesian classification assumes that the effect of an attribute value on a given class is independent of the values of the other attributes (class conditional independence).
- Bayesian belief networks can represent dependencies among subsets of attributes and can also be used for classification.

Bayes' theorem: basics
- X is a data sample (tuple) whose class label is unknown, e.g., a 35-year-old customer with an income of $4000.
- H is the hypothesis that sample X belongs to a particular class C, e.g., that the customer will buy a computer.
- P(H|X): the posterior probability of H conditioned on X, e.g., the probability that customer X will buy a computer given that we know the customer's age and income.
- P(H): the prior probability of H, i.e., the initial probability before we observe any sample; it reflects background knowledge, e.g., the probability that any given customer will buy a computer.
- P(X): the probability that the sample data is observed, e.g., the probability that a customer is 35 years old with an income of $4000.
- P(X|H): the posterior probability of X conditioned on H, e.g., given that a customer buys a computer, the probability that the customer is 35 years old with an income of $4000.
- Bayes' theorem:
  P(H|X) = \frac{P(X|H)\,P(H)}{P(X)}
Towards the Naive Bayesian classifier
- Let D be a training set of tuples and their associated class labels; each tuple is represented by an n-dimensional attribute vector X = (x1, x2, ..., xn).
- Suppose there are m classes C1, C2, ..., Cm.
- Classification derives the maximum a posteriori class, i.e., the class with maximal P(Ci|X).
- This can be derived from Bayes' theorem:
  P(C_i|X) = \frac{P(X|C_i)\,P(C_i)}{P(X)}
- Since P(X) is constant for all classes, only P(X|C_i)\,P(C_i) needs to be maximized.

Derivation of the Naive Bayes classifier
- A simplifying assumption: attributes are conditionally independent (i.e., there is no dependence relation between attributes):
  P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)
- This greatly reduces the computation cost: only the class distribution needs to be counted.
- If Ak is categorical, P(xk|Ci) is the number of tuples in Ci having value xk for Ak, divided by |C_{i,D}| (the number of tuples of Ci in D).
- If Ak is continuous-valued, P(xk|Ci) is usually computed from a Gaussian distribution with mean \mu and standard deviation \sigma:
  g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})
Naive Bayesian classifier: training dataset
- Classes: C1: buys_computer = "yes"; C2: buys_computer = "no".
- Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair).

Naive Bayesian classifier: an example
- P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
- Compute P(X|Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
- P(X|Ci) * P(Ci):
  P(X | buys_computer = "yes") * P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") * P(buys_computer = "no") = 0.007
- Therefore, X belongs to the class buys_computer = "yes".
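A direct transcription of this example into code; the conditional probabilities are the ones listed on the slide, not re-estimated from data:

```python
# Naive Bayes scoring of X = (age<=30, income=medium, student=yes, credit=fair).
prior = {"yes": 9 / 14, "no": 5 / 14}
cond = {
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9, "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5, "student=yes": 1 / 5, "credit=fair": 2 / 5},
}
x = ["age<=30", "income=medium", "student=yes", "credit=fair"]

score = {}
for c in prior:
    p = prior[c]
    for feature in x:              # class conditional independence assumption
        p *= cond[c][feature]
    score[c] = p

print(score)                       # {'yes': ~0.028, 'no': ~0.007}
print(max(score, key=score.get))   # -> 'yes'
```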
Avoiding the zero-probability problem
- Naive Bayesian prediction requires each conditional probability to be non-zero; otherwise the predicted probability for the whole tuple will be zero.
- Ex.: suppose a dataset with 1000 tuples in which income = low occurs 0 times, income = medium 990 times, and income = high 10 times.
- Use the Laplacian correction (Laplacian estimator): add 1 to each case.
  Prob(income = low) = 1/1003
  Prob(income = medium) = 991/1003
  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to their "uncorrected" counterparts.
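The same correction expressed over raw counts (the value counts follow the slide's example):

```python
# Laplacian correction: add 1 to each count and add the number of distinct
# values (3 here) to the denominator.
counts = {"low": 0, "medium": 990, "high": 10}
n, k = sum(counts.values()), len(counts)
smoothed = {v: (c + 1) / (n + k) for v, c in counts.items()}
print(smoothed)   # {'low': 1/1003, 'medium': 991/1003, 'high': 11/1003}
```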
Naive Bayesian classifier: comments
- Advantages: easy to implement; good results obtained in most cases.
- Disadvantages: the class conditional independence assumption causes a loss of accuracy, because in practice dependencies exist among variables.
- How to deal with these dependencies? Bayesian belief networks.

Bayesian belief networks
- A Bayesian belief network allows class conditional independencies to be defined between subsets of variables.
- It is a graphical model of causal relationships: it represents dependencies among variables and gives a specification of the joint conditional probability distribution.
- Nodes: random variables (discrete or continuous); arcs: probabilistic dependence.
- [Figure: four nodes X, Y, Z, P; X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P.]
- The network is a directed acyclic graph.

Bayesian belief networks: an example
- [Figure: a network over FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea.]
- For each variable, the belief network has a conditional probability table (CPT), P(Y | Parents(Y)); e.g., the CPT for LungCancer:

            (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC          0.8       0.5        0.7        0.1
  ~LC         0.2       0.5        0.3        0.9
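A minimal sketch of how such CPTs are used: the probability of a full assignment factorizes as the product of P(node | parents) over all nodes. Only the LungCancer CPT below comes from the slide; the two priors are made-up placeholders for illustration:

```python
# Factorized probability in a belief network: P(x1..xn) = prod_i P(xi | parents(xi)).
p_fh = {True: 0.2, False: 0.8}     # assumed prior P(FamilyHistory)
p_s  = {True: 0.3, False: 0.7}     # assumed prior P(Smoker)
p_lc = {(True, True): 0.8, (True, False): 0.5,
        (False, True): 0.7, (False, False): 0.1}   # P(LC = yes | FH, S) from the CPT

def joint(fh, s, lc):
    p = p_fh[fh] * p_s[s]
    p_lc_yes = p_lc[(fh, s)]
    return p * (p_lc_yes if lc else 1 - p_lc_yes)

print(joint(fh=True, s=True, lc=True))   # 0.2 * 0.3 * 0.8 = 0.048
```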
Training Bayesian belief networks
- Several scenarios are possible:
  - Network structure known and all variables observable: only the CPTs (conditional probability tables) need to be computed.
  - Network structure known, but some variables hidden in the samples: train the belief network with a gradient descent method, analogous to neural network learning.
  - Network structure unknown, all variables observable: search the model space to reconstruct the graph topology.
  - Network structure unknown, all variables hidden: no good algorithms are known.
35、)學(xué)院共九十五頁(yè)Using IF-THEN Rules for Classification以IF-THEN規(guī)則的形式來(lái)表示知識(shí)R: IF age = youth AND student = yes THEN buys_computer = yes規(guī)則前件或前提 vs.規(guī)則結(jié)論規(guī)則的覆蓋率與準(zhǔn)確率ncovers = # of tuples covered by R(規(guī)則前件被滿足)ncorrect = # of tuples correctly classified by Rcoverage(R) = ncovers /|D| /* D: training data set */accurac
36、y(R) = ncorrect / ncovers如果元組被多條規(guī)則所“觸發(fā)(chf)”,則需要解決沖突規(guī)模序(Size ordering): assign the highest priority to the triggering rules that has the “toughest” requirement (i.e., with the most attribute test)基于類(lèi)的序(Class-based ordering): 類(lèi)按重要性遞減排序規(guī)則序(Rule-based ordering (decision list)): rules are organized into
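A small illustration of coverage and accuracy for a rule like R above; the tuples are invented for the example, not the slide's table:

```python
# coverage(R) = n_covers / |D|;  accuracy(R) = n_correct / n_covers
D = [  # (age, student, buys_computer) -- toy tuples
    ("youth", "yes", "yes"), ("youth", "yes", "no"),
    ("youth", "no", "no"), ("senior", "yes", "yes"),
]
antecedent = lambda t: t[0] == "youth" and t[1] == "yes"  # IF age=youth AND student=yes
consequent = "yes"                                        # THEN buys_computer=yes

covered = [t for t in D if antecedent(t)]
correct = [t for t in covered if t[2] == consequent]
print("coverage:", len(covered) / len(D), "accuracy:", len(correct) / len(covered))
```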
Extracting rules from a decision tree
- Knowledge is represented as IF-THEN rules: one rule is created for each path from the root to a leaf.
- Each attribute-value pair along a path forms a conjunct of the rule antecedent (the "IF" part); the leaf node holds the class prediction, which forms the rule consequent (the "THEN" part).
- IF-THEN rules are easy to understand.
- Example (from the buys_computer tree):
  IF age = "<=30" AND student = "no" THEN buys_computer = "no"
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ...

Classification as a mathematical mapping
- Classification maps a tuple of attribute values to a class label, e.g., applicant data -> credit approval (Yes/No), or (Temp, Humidity) -> Rain (Yes/No).
- Mathematically: given x \in X = \mathbb{R}^n and y \in Y = \{+1, -1\}, learn a function f: X \to Y.

Linear classification
- A binary classification problem: the data above the separating (red) line belong to class "x"; the data below it belong to class "o".
- Examples of such classifiers: SVM, perceptron, probabilistic classifiers.

Discriminative classifiers
- Advantages:
  - Prediction accuracy is generally higher (compared with Bayesian methods).
  - Robust: they still work when training samples contain errors.
  - Fast evaluation of the learned target function (Bayesian networks are normally slow).
- Limitations:
  - Long training time.
  - Many parameters to set during network construction and training, which are hard to determine.
  - Poor interpretability.
Classification by backpropagation
- Backpropagation: a neural network learning algorithm.
- The field was started by psychologists and neurobiologists seeking to develop and test computational analogues of neurons.
- A neural network: a set of connected input/output units in which each connection is associated with a weight.
- During the learning phase, the network adjusts the weights so as to predict the correct class label of the input tuples.
- Also referred to as "connectionist learning".

Neural network as a classifier
- Weaknesses:
  - Long training time.
  - Requires a number of parameters typically best determined empirically, e.g., the network topology or structure.
  - Poor interpretability: it is difficult to interpret the symbolic meaning behind the learned weights and the hidden units in the network.
- Strengths:
  - High tolerance to noisy data.
  - Ability to classify untrained patterns.
  - Well suited for continuous-valued inputs and outputs.
  - Successful on a wide array of real-world data.
  - Algorithms are inherently parallel.
  - Techniques have been developed for extracting rules from trained neural networks.
A neuron (perceptron)
- The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping.
- [Figure: input vector x = (x0, x1, ..., xn) and weight vector w = (w0, w1, ..., wn) feed a weighted sum with bias \mu_k, followed by an activation function f, producing output y:]
  y = f\left(\sum_{i=0}^{n} w_i x_i + \mu_k\right)

A multi-layer feed-forward neural network
- [Figure: the input vector X enters the input layer; weights w_ij connect it to a hidden layer, whose weighted outputs feed the output layer, which produces the output vector.]

How does a multi-layer neural network work?
- The inputs to the network correspond to the attributes measured for each training tuple.
- Inputs are fed simultaneously into the units making up the input layer.
- The weighted outputs of the input layer are fed simultaneously to the hidden layer; the number of hidden layers is arbitrary, although in practice there is usually only one.
- The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction for the given tuple.
- Feed-forward network: none of the weights cycles back to an input unit or to an output unit of a previous layer.
- From a statistical point of view, the network performs nonlinear regression: given enough hidden units and enough training samples, a multi-layer feed-forward network can closely approximate any function.
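A numpy sketch of the single neuron and of one feed-forward pass through a one-hidden-layer network; the weights are random placeholders and the sigmoid is just one common choice for the activation f:

```python
import numpy as np

def f(z):                        # activation function (sigmoid, one common choice)
    return 1.0 / (1.0 + np.exp(-z))

# Single neuron: y = f(sum_i w_i * x_i + bias)
x = np.array([0.5, 1.0, -0.3])
w = np.array([0.4, -0.2, 0.1])
bias = 0.05
y = f(w @ x + bias)

# One feed-forward pass: input layer -> hidden layer -> output layer
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))   # 4 hidden units, 3 inputs
W_out = rng.normal(size=(2, 4))      # 2 output units
hidden = f(W_hidden @ x)
output = f(W_out @ hidden)
print(y, output)
```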
Defining a network topology
- First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer.
- Normalize the input values of each attribute measured in the training tuples to the range 0.0-1.0.
- Use one input unit per domain value, each initialized to 0.
- Output: for classification with more than two classes, one output unit per class is used.
- Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights.

Backpropagation
- Iteratively process the training tuples, comparing the network's prediction for each tuple with the actual known target value.
- For each training sample, modify the weights so as to minimize the mean squared error between the network's prediction and the actual target value.
- The modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer.
- Steps:
  - Initialize the weights (to small random numbers) and the biases in the network.
  - Propagate the inputs forward (by applying the activation function).
  - Backpropagate the error (by updating the weights and biases).
  - Terminating condition (e.g., when the error is very small).
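A minimal numpy sketch of one backpropagation step for a single-hidden-layer network with sigmoid units and squared error; the shapes, initialization and learning rate are illustrative choices:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(1)

x = np.array([0.2, 0.9])                        # one training tuple (2 attributes)
t = np.array([1.0])                             # its target value
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer (3 units)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer (1 unit)
lr = 0.5

# 1) propagate the inputs forward
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)

# 2) backpropagate the error (gradient of squared error through the sigmoids)
delta_o = (o - t) * o * (1 - o)
delta_h = (W2.T @ delta_o) * h * (1 - h)

# 3) update weights and biases (repeat 1-3 until the error is very small)
W2 -= lr * np.outer(delta_o, h); b2 -= lr * delta_o
W1 -= lr * np.outer(delta_h, x); b1 -= lr * delta_h
```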
SVM: Support Vector Machines
- A classification method for both linear and nonlinear data.
- It uses a nonlinear mapping to transform the original training data into a higher dimension.
- In the new dimension, it searches for the linear optimal separating hyperplane (decision boundary).
- With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane.
- SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors).

SVM: history and applications
- Vapnik and colleagues (1992); groundwork from Vapnik & Chervonenkis' statistical learning theory in the 1960s.
- Features: training can be slow, but accuracy is high owing to the ability to model complex nonlinear decision boundaries (margin maximization).
- Used both for classification and prediction.
- Applications: handwritten digit recognition, object recognition, speaker identification, benchmark time-series prediction tests.

SVM: general philosophy
- [Figure: two candidate separating hyperplanes with their support vectors; one has a small margin, the other a large margin.]

SVM: margins and support vectors
- [Figure: the maximum-margin hyperplane, with its margin boundaries passing through the support vectors.]
SVM: when data is linearly separable
- Let the data D be (X1, y1), ..., (X|D|, y|D|), where the Xi are the training tuples and the yi their associated class labels.
- There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).
- SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH).

SVM: linearly separable
- A separating hyperplane can be written as
  W \cdot X + b = 0
  where W = (w1, w2, ..., wn) is a weight vector and b a scalar (bias).
- For 2-D data it can be written as
  w_0 + w_1 x_1 + w_2 x_2 = 0
- The hyperplanes defining the sides of the margin:
  H1: w_0 + w_1 x_1 + w_2 x_2 \ge 1 for y_i = +1, and
  H2: w_0 + w_1 x_1 + w_2 x_2 \le -1 for y_i = -1
- Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors.

SVM: linearly inseparable
- Transform the original input data into a higher dimensional space, and search for a linear separating hyperplane in the new space.
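A small sketch of fitting a maximum-margin hyperplane with scikit-learn and reading off W, b and the support vectors; the toy points are invented, and a large C approximates the hard-margin case:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],      # class -1 (toy points)
              [4, 4], [5, 5], [4, 6]])     # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
W, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (W[0], W[1], b))
print("support vectors:", clf.support_vectors_)   # tuples lying on H1 / H2
```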
SVM: kernel functions
- Instead of computing the dot product on the transformed data tuples, it is mathematically equivalent to apply a kernel function K(Xi, Xj) to the original data, i.e.,
  K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j)
- Typical kernel functions include the polynomial, Gaussian radial basis, and sigmoid kernels.
- SVM can also be used for classifying multiple (> 2) classes and for regression analysis (with additional user parameters).
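A short numeric check of the identity K(Xi, Xj) = Φ(Xi)·Φ(Xj) for the homogeneous degree-2 polynomial kernel, whose feature map Φ is small enough to write out explicitly; the vectors are arbitrary examples:

```python
# Verify K(x, z) = phi(x) . phi(z) for K(x, z) = (x . z)^2 in two dimensions,
# where phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2).
import math

def phi(v):
    x1, x2 = v
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(x, z) ** 2)          # kernel on the original data: (1*3 + 2*(-1))^2 = 1
print(dot(phi(x), phi(z)))     # dot product in the transformed space: also 1
```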
Associative classification
- Association rules are generated and analyzed for use in classification: search for strong associations between frequent patterns and class labels.
- Classification: based on evaluating a set of rules of the form
  p_1 \wedge p_2 \wedge \ldots \wedge p_l \Rightarrow "A_{class} = C" (conf, sup)
- Why effective? Association rules explore high-confidence associations among multiple attributes, which can overcome the limitation of decision tree induction, which considers only one attribute at a time; associative classification has been found more accurate than traditional classification methods.

Typical associative classification methods
- CBA (Classification By Association: Liu, Hsu & Ma, KDD'98)
  - Mine possible association rules of the form: cond-set (a set of attribute-value pairs) -> class label.
  - Build the classifier: organize the rules according to decreasing precedence based on confidence and then support.