An Introduction to WEKA
Contributed by Yizhou Sun, 2008
23.06.2020

Content
- What is WEKA?
- The Explorer:
  - Preprocess data
  - Classification
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
- References and Resources

What is WEKA?
- Waikato Environment for Knowledge Analysis
- It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
- The weka is also a bird found only on the islands of New Zealand.

Download and Install WEKA
- Website: http://www.cs.waikato.ac.nz/ml/weka/index.html
- Supports multiple platforms (written in Java): Windows, Mac OS X and Linux

Main Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 3 algorithms for finding association rules
- 15 attribute/subset evaluators + 10 search algorithms for feature selection

Main GUI
Three graphical user interfaces:
- “The Explorer” (exploratory data analysis)
- “The Experimenter” (experimental environment)
- “The KnowledgeFlow” (new process model inspired interface)

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, ...

Flat file in ARFF format
WEKA only deals with “flat” files. In the example below, age and cholesterol are numeric attributes; the attributes declared with a list of values in braces are nominal attributes.

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male }
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina }
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes }
    @attribute class { present, not_present }

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
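To make the flat-file layout concrete, the following is a minimal sketch of reading the heart-disease example into Python structures. It is not WEKA's own loader: the function name `parse_arff` is hypothetical, and the sketch assumes only the simple subset of ARFF shown above (numeric and nominal attributes, comma-separated rows, `?` for a missing value, no quoting).

```python
ARFF_TEXT = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male }
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina }
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes }
@attribute class { present, not_present }
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""


def parse_arff(text):
    """Parse a simple ARFF string into (relation, attributes, rows).

    attributes is a list of (name, kind) pairs where kind is either the
    string "numeric" or a list of nominal labels. Missing values ("?")
    become None. Hypothetical helper, not part of WEKA.
    """
    relation = ""
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            cells = [c.strip() for c in line.split(",")]
            row = []
            for (_, kind), cell in zip(attributes, cells):
                if cell == "?":
                    row.append(None)          # missing value
                elif kind == "numeric":
                    row.append(float(cell))
                else:
                    row.append(cell)          # nominal label kept as text
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                labels = [s.strip() for s in spec.strip("{} ").split(",")]
                attributes.append((name, labels))
            else:
                # Assumption: anything not in braces is numeric here.
                attributes.append((name, "numeric"))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(ARFF_TEXT)
print(relation, len(attributes), len(rows))
```

The missing cholesterol value in the last row comes back as `None`, so it is not confused with a real measurement of zero.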
Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ...

Decision Tree Induction: Training Dataset
This follows an example of Quinlan's ID3 (Playing Tennis).
(training table not preserved)

Output: A Decision Tree for “buys_computer”
(figure not preserved)

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm):
- Tree is constructed in a top-down recursive divide-and-conquer manner
- At start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
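The greedy procedure above can be sketched in a few lines. The slide's own training table did not survive, so this example assumes the widely reproduced 14-row play-tennis table as a stand-in. The names `entropy`, `information_gain` and `build_tree` are illustrative and not taken from WEKA.

```python
import math
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "wind"]

# Stand-in data (assumption): the classic play-tennis table.
# Columns: outlook, temperature, humidity, wind, class label.
ROWS = [
    ("sunny", "hot", "high", "weak", "no"),
    ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),
    ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),
    ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),
    ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),
    ("rain", "mild", "high", "strong", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, index):
    """Entropy reduction obtained by splitting rows on attribute `index`."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[index], []).append(row[-1])
    remainder = sum(len(p) / len(rows) * entropy(p)
                    for p in partitions.values())
    return entropy([row[-1] for row in rows]) - remainder


def build_tree(rows, attribute_indices):
    """Top-down recursive induction; returns a label or (attribute, branches)."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:          # pure node: stop
        return labels[0]
    if not attribute_indices:          # nothing left to test: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attribute_indices, key=lambda i: information_gain(rows, i))
    remaining = [i for i in attribute_indices if i != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (ATTRIBUTES[best], branches)


gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
tree = build_tree(ROWS, list(range(len(ATTRIBUTES))))
print(gains)
print(tree)
```

On this stand-in data the root test is `outlook`, which has the largest information gain (about 0.247 bits). The sketch does no pruning and does not handle numeric attributes, which would need discretizing first, as the slide notes.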
Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution

Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
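As a rough illustration of the two numbers attached to the rule above, the sketch below computes support (as an absolute count, matching the slide's "support 2000") and confidence for milk, butter ⇒ bread, eggs. The five transactions are toy data invented for this example, and the helper names are hypothetical. This is not the Apriori search itself, only the measures it thresholds on.

```python
# Toy transactions (assumption, for illustration only).
TRANSACTIONS = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "bread"},
    {"butter", "eggs"},
]


def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= t)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also hold the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


def rule_passes(transactions, antecedent, consequent,
                min_support, min_confidence):
    """True if the rule meets the minimum support and exceeds the confidence."""
    both = set(antecedent) | set(consequent)
    return (support_count(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) > min_confidence)


antecedent = {"milk", "butter"}
consequent = {"bread", "eggs"}
rule_support = support_count(TRANSACTIONS, antecedent | consequent)
rule_confidence = confidence(TRANSACTIONS, antecedent, consequent)
print(rule_support, rule_confidence)
```

Here three transactions contain milk and butter and two of them also contain bread and eggs, so the rule has support 2 and confidence 2/3.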
Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
- Very flexible: WEKA allows (almost) arbitrary combinations of these two

Explorer: data visualization
- Visualization very useful in practice: e.g. helps to determine difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function
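The idea behind the “Jitter” option can be sketched as follows: instances that share the same pair of nominal values are plotted on top of each other, and adding a little random noise to the coordinates pulls them apart. How WEKA implements its jitter control is not stated in the slides, so the function below, its name and its `amount` and `seed` parameters are assumptions made for illustration.

```python
import random


def jitter(points, amount, seed=0):
    """Return a copy of points with uniform noise in [-amount, amount]
    added to each coordinate. Hypothetical helper, not WEKA's code."""
    rng = random.Random(seed)  # fixed seed keeps the plot reproducible
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount))
            for x, y in points]


# Two nominal attributes encoded as integer indices. Three instances
# share the position (0, 1), so a plain scatter plot shows one dot.
points = [(0, 1), (0, 1), (0, 1), (1, 0)]
spread = jitter(points, amount=0.2)

print(len(set(points)), "distinct positions before jitter")
print(len(set(spread)), "distinct positions after jitter")
```

Keeping `amount` well below the spacing between category positions (1.0 here) keeps each jittered point visually attached to its original category.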