Data Mining: Homework 2

Question 1:
1. a) Compute the Information Gain for Gender, Car Type and Shirt Size.
b) Construct a decision tree with Information Gain.

Answer:
a) The class attribute takes two values, C0 and C1, each covering 10 of the 20 tuples, so the expected information (entropy) needed to classify a tuple in D is

\[ Info(D) = -\frac{10}{20}\log_2\frac{10}{20} - \frac{10}{20}\log_2\frac{10}{20} = 1 \]

1. Splitting on Gender:

\[ Info_{Gender}(D) = \frac{10}{20}\left(-\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10}\right) + \frac{10}{20}\left(-\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10}\right) = 0.971 \]

\[ Gain(Gender) = 1 - 0.971 = 0.029 \]

2. Splitting on Car Type (taking \(0 \log_2 0 = 0\)):

\[ Info_{CarType}(D) = \frac{4}{20}\left(-\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4}\right) + \frac{8}{20}\left(-\frac{7}{8}\log_2\frac{7}{8} - \frac{1}{8}\log_2\frac{1}{8}\right) + \frac{8}{20}\left(-\frac{8}{8}\log_2\frac{8}{8} - \frac{0}{8}\log_2\frac{0}{8}\right) = 0.380 \]

\[ Gain(Car\ Type) = 1 - 0.380 = 0.620 \]

3. Splitting on Shirt Size:

\[ Info_{ShirtSize}(D) = \frac{5}{20}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{7}{20}\left(-\frac{4}{7}\log_2\frac{4}{7} - \frac{3}{7}\log_2\frac{3}{7}\right) + \frac{4}{20}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) + \frac{4}{20}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) = 0.988 \]

\[ Gain(Shirt\ Size) = 1 - 0.988 = 0.012 \]

b) The results in (a) show that splitting on Car Type yields the largest information gain, so Car Type is the root of the decision tree:

[Figure: decision tree. The root tests Car Type (branches: luxury, sport, family); a second node tests Shirt Size (branches: small; medium, large, extra large); leaf labels: C1, C0, C0, C1.]

Question 2:
2. (a) Design a multilayer feed-forward neural network (one hidden layer) for the data set in Q1. Label the nodes in the input and output layers.
(b) Using the neural network obtained above, show the weight values after one iteration of the back propagation algorithm, given the training instance "(M, Family, Small)". Indicate your initial weight values and biases and the learning rate used.

a) [Figure: network with input units 1 to 9, hidden units 10 and 11, and output unit 12.]

b) Following (a), each input unit represents one attribute value. For the training instance the inputs are:

Input   Attribute value   Value
X11     F                 0
X12     M                 1
X21     Family            1
X22     Sports            0
X23     Luxury            0
X31     Small             1
X32     Medium            0
X33     Large             0
X34     Extra Large       0

Since the initial weights and biases are generated randomly, the following initial values are assumed here:

Weight   Initial value      Weight   Initial value
W1,10     0.2               W1,11     0.2
W2,10    -0.2               W2,11    -0.1
W3,10     0.4               W3,11     0.3
W4,10    -0.2               W4,11    -0.1
W5,10     0.1               W5,11    -0.1
W6,10     0.1               W6,11    -0.2
W7,10    -0.4               W7,11     0.2
W8,10     0.2               W8,11     0.2
W9,10    -0.1               W9,11     0.3
W10,12   -0.3               W11,12   -0.1

Unit   Bias
10     -0.2
11      0.2
12      0.3

Net inputs and outputs:

Unit j   Net input Ij   Output Oj
10       0.1            0.52
11       0.2            0.55
12       0.089          0.48

Note that the logistic function gives \(1/(1 + e^{-0.089}) \approx 0.52\) for unit 12; the tabulated 0.48 and the quantities derived from it are reproduced here as given.

Error at each node:

Unit j   Errj
10        0.0089
11        0.0030
12       -0.12

Updated weights and biases:

Weight   Updated value      Weight   Updated value
W1,10     0.201             W1,11     0.198
W2,10    -0.211             W2,11    -0.099
W3,10     0.4               W3,11     0.308
W4,10    -0.202             W4,11    -0.098
W5,10     0.101             W5,11    -0.100
W6,10     0.092             W6,11    -0.211
W7,10    -0.400             W7,11     0.198
W8,10     0.201             W8,11     0.190
W9,10    -0.110             W9,11     0.300
W10,12   -0.304             W11,12   -0.099

Unit   Updated bias
10     -0.287
11      0.179
12      0.344

Question 3:
3. a) Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student?
b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student?
c) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke.

Answer:
a) Let A = {A1, A2}, where A1 denotes an undergraduate student and A2 a graduate student, and let B denote the event that the student smokes. From the problem statement, P(B|A1) = 15%, P(B|A2) = 23%, P(A1) = 4/5 and P(A2) = 1/5. The quantity required is P(A2|B). By Bayes' theorem,

\[ P(A_2 \mid B) = \frac{P(B \mid A_2)\,P(A_2)}{P(B \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(A_2)} = \frac{0.23 \times 0.2}{0.15 \times 0.8 + 0.23 \times 0.2} = \frac{0.046}{0.166} \approx 0.277 \]

b) From (a), a randomly chosen college student who smokes is a graduate student with probability 0.277 and an undergraduate with probability 0.723, so the student is much more likely to be an undergraduate. The priors alone, P(A1) = 4/5 against P(A2) = 1/5, lead to the same conclusion for a randomly chosen student.

c) Let C denote the event that the student lives in a dorm, so P(C|A2) = 30% and P(C|A1) = 10%. Assuming B and C are independent given the student type,

\[ P(A_2 \mid B, C) = \frac{P(B \mid A_2)\,P(C \mid A_2)\,P(A_2)}{P(B \mid A_1)\,P(C \mid A_1)\,P(A_1) + P(B \mid A_2)\,P(C \mid A_2)\,P(A_2)} = \frac{0.0138}{0.0120 + 0.0138} \approx 0.535 \]

This exceeds 0.5, so a student who smokes and lives in a dorm is more likely to be a graduate student.

Question 4:
4. Suppose that the data mining task is to cluster the following ten points (with (x, y, z) representing location) into three clusters:
A1(4,2,5), A2(10,5,2), A3(5,8,7), B1(1,1,1), B2(2,3,2), B3(3,6,9), C1(11,9,2), C2(1,4,6), C3(9,1,7), C4(5,6,7)
The distance function is Euclidean distance. Suppose initially we assign A1, B1, C1 as the center of each cluster, respectively. Use the K-Means algorithm to show only
(a) The three cluster centers after the first round of execution
(b) The final three clusters

Answer:
a) Distance from each point to each center. The tables list squared Euclidean distances, which select the same nearest center as the distances themselves.

Round 1:

Point   A1    B1    C1
A2      54    98    17
A3      41    101   62
B2      14    6     117
B3      33    93    122
C2      14    34    141
C3      30    100   93
C4      21    77    70

The three resulting clusters are {A1, A3, B3, C2, C3, C4}, {B1, B2} and {C1, A2}, so the new cluster centers are (4.5, 4.5, 6.83), (1.5, 2, 1.5) and (10.5, 7, 2).

b) Round 2: the cluster means are now (4.5, 4.5, 6.83), (1.5, 2, 1.5) and (10.5, 7, 2).

Point   (4.5, 4.5, 6.83)   (1.5, 2, 1.5)   (10.5, 7, 2)
A1      9.86               18.5            76.25
A2      53.86              81.5            4.25
A3      12.53              78.5            56.25
B1      58.53              1.5             127.25
B2      31.86              1.5             88.25
B3      9.19               74.5            106.25
C1      85.86              139.5           4.25
C2      13.19              24.5            115.25
C3      32.53              87.5            63.25
C4      2.53               58.5            56.25

The resulting clusters are again {A1, A3, B3, C2, C3, C4}, {B1, B2} and {C1, A2}. They are identical to the clusters at the end of round 1, so the assignment no longer changes and these are the final clusters.

Part II: Lab

Question 1
Assume this supermarket would like to promote milk. Use the data in "transactions" as training data to build a decision tree (C5.0 algorithm) model to predict whether the customer would buy milk or not.
1. Build a decision tree using data set "transactions" that predicts milk as a function of the other fields. Set the "type" of each field to "Flag", set the "direction" of "milk" as "out", set the "type" of COD as "Typeless", select "Expert" and set the "pruning severity" to 65, and set the "minimum records per child branch" to be 95. Hand-in: A figure showing your tree.
2. Use the model (the full tree generated by Clementine in step 1 above) to make a prediction for each of the 20 customers in the "rollout" data to determine whether the customer would buy milk. Hand-in: your prediction for each of the 20 customers.
3. Hand-in: rules for positive (yes) prediction of milk purchase identified from the decision tree (up to the fifth level. The root is considered as level 1). Compare with the rules generated by Apriori in Homework 1, and submit your brief comments on the rules (e.g., pruning effect).

Answer:
1. The generated decision tree:

[Figure: decision tree.]

The generated decision tree model is:

juices = 1  Mode: 1
  water = 1  Mode: 1  => 1
  water = 0  Mode: 0
    pasta = 1  Mode: 1  => 1
    pasta = 0  Mode: 0
      tomato souce = 1  Mode: 1  => 1
      tomato souce = 0  Mode: 0
        biscuits = 1  Mode: 1  => 1
        biscuits = 0  Mode: 0  => 0
juices = 0  Mode: 0
  yoghurt = 1  Mode: 1
    water = 1  Mode: 1  => 1
    water = 0  Mode: 0
      biscuits = 1  Mode: 1  => 1
      biscuits = 0  Mode: 0
        brioches = 1  Mode: 1  => 1
        brioches = 0  Mode: 0
          beer = 1  Mode: 1  => 1
          beer = 0  Mode: 0  => 0
  yoghurt = 0  Mode: 0
    beer = 1  Mode: 0
      biscuits = 1  Mode: 1  => 1
      biscuits = 0  Mode: 0
        rice = 1  Mode: 1  => 1
        rice = 0  Mode: 0
          coffee = 1  Mode: 1
            water = 1  Mode: 1  => 1
            water = 0  Mode: 0  => 0
          coffee = 0  Mode: 0  => 0
    beer = 0  Mode: 0
      frozen vegetables = 1  Mode: 0
        biscuits = 1  Mode: 1
          pasta = 1  Mode: 1  => 1
          pasta = 0  Mode: 0  => 0
        biscuits = 0  Mode: 0
          oil = 1  Mode: 1  => 1
          oil = 0  Mode: 0
            brioches = 1  Mode: 0
              water = 1  Mode: 1  => 1
              water = 0  Mode: 0  => 0
            brioches = 0  Mode: 0  => 0
      frozen vegetables = 0  Mode: 0
        pasta = 1  Mode: 0
          mozzarella = 1  Mode: 1  => 1
          mozzarella = 0  Mode: 0
            water = 1  Mode: 1
              biscuits = 1  Mode: 1  => 1
              biscuits = 0  Mode: 0
                brioches = 1  Mode: 1  => 1
                brioches = 0  Mode: 0
                  coffee = 1  Mode: 1  => 1
                  coffee = 0  Mode: 0  => 0
            water = 0  Mode: 0
              coke = 1  Mode: 0
                coffee = 1  Mode: 1  => 1
                coffee = 0  Mode: 0  => 0
              coke = 0  Mode: 0  => 0
        pasta = 0  Mode: 0
          water = 1  Mode: 0
            coffee = 1  Mode: 1  => 1
            coffee = 0  Mode: 0  => 0
          water = 0  Mode: 1
            rice = 1  Mode: 0  => 0
            rice = 0  Mode: 1
              tunny = 1  Mode: 0
                biscuits = 1  Mode: 1  => 1
                biscuits = 0  Mode: 0  => 0
              tunny = 0  Mode: 1
                brioches = 1  Mode: 0  => 0
                brioches = 0  Mode: 1
                  coke = 1  Mode: 0  => 0
                  coke = 0  Mode: 1
                    coffee = 1  Mode: 0  => 0
                    coffee = 0  Mode: 1
                      biscuits = 1  Mode: 0  => 0
                      biscuits = 0  Mode: 1
                        oil = 1  Mode: 0  => 0
                        oil = 0  Mode: 1
                          tomato souce = 1  Mode: 0  => 0
                          tomato souce = 0  Mode: 1
                            mozzarella = 1  Mode: 0  => 0
                            mozzarella = 0  Mode: 1
                              crackers = 1  Mode: 0  => 0
                              crackers = 0  Mode: 1
                                frozen fish = 1  Mode: 0  => 0
                                frozen fish = 0  Mode: 1  => 1

2. Predictions made with the decision tree generated in step 1:

3. The generated association rules:

Question 2: Churn Management
The
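The information-gain arithmetic in Part I, Question 1 can be checked with a short script. This is a minimal sketch: the helper names `entropy` and `information_gain` are illustrative, and the (C0, C1) count pairs are read off the fractions in the worked answer, with the assignment of counts to classes assumed so that each class totals 10.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class-count vector; empty classes contribute 0."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(partitions):
    """Gain of a split; `partitions` holds one (C0, C1) count pair per attribute value."""
    total = sum(sum(p) for p in partitions)
    parent = [sum(col) for col in zip(*partitions)]  # class counts before the split
    remainder = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(parent) - remainder

# Assumed (C0, C1) counts per attribute value, consistent with the fractions in the answer.
splits = {
    "Gender": [(4, 6), (6, 4)],
    "Car Type": [(1, 3), (1, 7), (8, 0)],
    "Shirt Size": [(3, 2), (3, 4), (2, 2), (2, 2)],
}

gains = {name: information_gain(parts) for name, parts in splits.items()}
best = max(gains, key=gains.get)  # attribute chosen for the root of the tree
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
```

Choosing the attribute with the largest gain for the root, as `best` does, is the selection rule used in part (b) of that question.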
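For Part I, Question 2, one back-propagation iteration can be sketched as follows. The inputs, initial weights and biases are the ones listed in the answer. The learning rate of 0.9 and the target value 0 (class C0 encoded as 0) are assumptions, since the answer does not state them, and the update rule is the standard textbook one for sigmoid units. The sketch is not an attempt to reproduce the updated-weight table in the answer.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

# Inputs for (M, Family, Small): units 1..9 correspond to X11..X34.
x = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0, 6: 1, 7: 0, 8: 0, 9: 0}
w = {
    (1, 10): 0.2, (1, 11): 0.2, (2, 10): -0.2, (2, 11): -0.1,
    (3, 10): 0.4, (3, 11): 0.3, (4, 10): -0.2, (4, 11): -0.1,
    (5, 10): 0.1, (5, 11): -0.1, (6, 10): 0.1, (6, 11): -0.2,
    (7, 10): -0.4, (7, 11): 0.2, (8, 10): 0.2, (8, 11): 0.2,
    (9, 10): -0.1, (9, 11): 0.3, (10, 12): -0.3, (11, 12): -0.1,
}
theta = {10: -0.2, 11: 0.2, 12: 0.3}
target = 0   # assumption: class C0 encoded as 0
rate = 0.9   # assumption: learning rate not given in the answer

# Forward pass: hidden units 10 and 11, then output unit 12.
net = {j: sum(w[i, j] * x[i] for i in x) + theta[j] for j in (10, 11)}
out = {j: sigmoid(net[j]) for j in (10, 11)}
net[12] = sum(w[j, 12] * out[j] for j in (10, 11)) + theta[12]
out[12] = sigmoid(net[12])

# Backward pass: output error first, then the hidden-unit errors.
err = {12: out[12] * (1 - out[12]) * (target - out[12])}
for j in (10, 11):
    err[j] = out[j] * (1 - out[j]) * err[12] * w[j, 12]

# Update: each weight moves by rate * (error of its destination) * (signal it carried).
new_w = {}
for (i, j), value in w.items():
    signal = out[i] if j == 12 else x[i]
    new_w[i, j] = value + rate * err[j] * signal
new_theta = {j: theta[j] + rate * err[j] for j in theta}
```

Under this rule a weight leaving an input unit whose value is 0 does not change, which is a quick sanity check to apply to any hand-computed update table.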
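The Bayes computations in Part I, Question 3 can be reproduced with a small helper. This is a sketch under the independence assumption stated in the question; the function name `posterior` and the dictionary keys are illustrative.

```python
def posterior(priors, *likelihoods):
    """Posterior over classes from priors and conditionally independent likelihoods."""
    joint = dict(priors)
    for likelihood in likelihoods:
        joint = {c: joint[c] * likelihood[c] for c in joint}
    total = sum(joint.values())
    return {c: value / total for c, value in joint.items()}

priors = {"undergraduate": 0.8, "graduate": 0.2}
smokes = {"undergraduate": 0.15, "graduate": 0.23}   # P(B | class)
dorm = {"undergraduate": 0.10, "graduate": 0.30}     # P(C | class)

part_a = posterior(priors, smokes)          # P(class | smokes)
part_c = posterior(priors, smokes, dorm)    # P(class | smokes, lives in dorm)

print(f"P(graduate | smokes) = {part_a['graduate']:.3f}")
print(f"P(graduate | smokes, dorm) = {part_c['graduate']:.3f}")
```

Multiplying the likelihoods before normalising is valid only because smoking and dorm residence are assumed independent given the student type, as the question allows.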
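The K-Means rounds in Part I, Question 4 can be replayed with the sketch below. It uses the ten points and the initial centers A1, B1, C1 from the question; the helper names are illustrative, and squared Euclidean distance is used because it selects the same nearest center as the distance itself. The sketch assumes no cluster ever becomes empty, which holds for this data.

```python
points = {
    "A1": (4, 2, 5), "A2": (10, 5, 2), "A3": (5, 8, 7),
    "B1": (1, 1, 1), "B2": (2, 3, 2), "B3": (3, 6, 9),
    "C1": (11, 9, 2), "C2": (1, 4, 6), "C3": (9, 1, 7), "C4": (5, 6, 7),
}

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))

def kmeans(points, centers):
    """Repeat assign/update until assignments stop changing; return per-round results."""
    history, previous = [], None
    while True:
        assignment = {
            name: min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for name, p in points.items()
        }
        if assignment == previous:
            return history
        clusters = [sorted(n for n, k in assignment.items() if k == c)
                    for c in range(len(centers))]
        centers = [mean([points[n] for n in members]) for members in clusters]
        history.append((clusters, centers))
        previous = assignment

history = kmeans(points, [points["A1"], points["B1"], points["C1"]])
first_clusters, first_centers = history[0]
final_clusters, final_centers = history[-1]
print(first_centers)
print(final_clusters)
```

Each entry of `history` records a round in which the assignment changed, so the loop stops as soon as a round reproduces the previous assignment, matching the stopping argument in the answer.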