An Introduction to WEKA
Contributed by Yizhou Sun, 2008

Content
- What is WEKA?
- The Explorer: preprocess data, classification, clustering, association rules, attribute selection, data visualization
- References and Resources

What is WEKA?
- Waikato Environment for Knowledge Analysis
- It's a data mining / machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand.
- Weka is also a bird found only on the islands of New Zealand.

Download and Install WEKA
- Website: http://www.cs.waikato.ac.nz/ml/weka/index.html
- Supports multiple platforms (written in Java): Windows, Mac OS X and Linux

Main Features
- 49 data preprocessing tools
- 76 classification/regression algorithms
- 8 clustering algorithms
- 3 algorithms for finding association rules
- 15 attribute/subset evaluators + 10 search algorithms for feature selection

Main GUI
Three graphical user interfaces:
- "The Explorer" (exploratory data analysis)
- "The Experimenter" (experimental environment)
- "The KnowledgeFlow" (new process-model-inspired interface)

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called "filters"
- WEKA contains filters for: discretization, normalization, resampling, attribute selection, transforming and combining attributes, ... (a loading-and-filtering sketch follows after the ARFF example below)

WEKA only deals with "flat" files. A flat file in ARFF format (age and cholesterol are numeric attributes; the remaining attributes are nominal):

@relation heart-disease-simplified
@attribute age numeric
@attribute sex {female, male}
@attribute chest_pain_type {typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina {no, yes}
@attribute class {present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
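As a minimal sketch (not part of the original slides) of loading this ARFF file and applying one of WEKA's filters from Java code; the file name heart-disease-simplified.arff and the choice of the Normalize filter are illustrative assumptions:

```java
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class PreprocessSketch {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file shown above (path is illustrative).
        Instances data = DataSource.read("heart-disease-simplified.arff");

        // Apply one of WEKA's "filters": scale all numeric attributes to [0,1].
        Normalize normalize = new Normalize();
        normalize.setInputFormat(data);
        Instances normalized = Filter.useFilter(data, normalize);

        System.out.println("Instances: " + normalized.numInstances()
                + ", attributes: " + normalized.numAttributes());
    }
}
```

The other filters mentioned above (discretization, resampling, attribute selection, ...) live under the weka.filters packages and are applied in the same setInputFormat/useFilter pattern.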

Explorer: building "classifiers"
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ... (a training-and-evaluation sketch follows below)
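A minimal sketch, not from the original slides, of training and evaluating one of the implemented schemes (the C4.5-style decision tree, J48) from Java; the file name and the assumption that the class is the last attribute are illustrative:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifySketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("heart-disease-simplified.arff");
        // Tell WEKA which attribute is the class (here assumed to be the last one).
        data.setClassIndex(data.numAttributes() - 1);

        // J48 is WEKA's implementation of the C4.5 decision tree learner.
        J48 tree = new J48();

        // 10-fold cross-validation, as available in the Explorer's Classify panel.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());

        // Train on the full data set to inspect the induced tree.
        tree.buildClassifier(data);
        System.out.println(tree);
    }
}
```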

Decision Tree Induction: Training Dataset
This follows an example of Quinlan's ID3 ("Playing Tennis"). (Figure on the original slide: the training-data table.)

Output: A Decision Tree for "buys_computer"
(Figure on the original slide: the induced decision tree.)

Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (if continuous-valued, they are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure, e.g. information gain (see the sketch after this list)
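The information-gain measure mentioned in the last bullet can be illustrated with a small standalone calculation. This is not WEKA API code; the class counts in main are meant to mirror the well-known Playing-Tennis split on Outlook and are illustrative:

```java
// Standalone illustration of the information-gain heuristic (not WEKA API).
public class InfoGainSketch {

    // Entropy of a class distribution, given the count of each class value.
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double h = 0.0;
        for (int c : classCounts) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);   // log base 2
        }
        return h;
    }

    // Gain(A) = Info(D) - sum_j (|D_j| / |D|) * Info(D_j),
    // where each row of partitionCounts holds the class counts of one partition D_j.
    static double informationGain(int[] classCounts, int[][] partitionCounts) {
        int total = 0;
        for (int c : classCounts) total += c;
        double remainder = 0.0;
        for (int[] part : partitionCounts) {
            int size = 0;
            for (int c : part) size += c;
            remainder += ((double) size / total) * entropy(part);
        }
        return entropy(classCounts) - remainder;
    }

    public static void main(String[] args) {
        // Playing-Tennis style example: 9 yes / 5 no overall,
        // split by an attribute with three values (counts are illustrative).
        int[] overall = {9, 5};
        int[][] byOutlook = {{2, 3}, {4, 0}, {3, 2}};
        System.out.println("Gain = " + informationGain(overall, byOutlook));
    }
}
```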

Explorer: clustering data
- WEKA contains "clusterers" for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst (a k-Means sketch follows below)
- Clusters can be visualized and compared to "true" clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
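A minimal sketch of running the listed k-Means scheme (SimpleKMeans) programmatically; the file name, k = 3, and the assumption that no class attribute is set are illustrative:

```java
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClusterSketch {
    public static void main(String[] args) throws Exception {
        // The clusterer expects the class index to be unset; a class attribute,
        // if present, is typically removed first (e.g. with the Remove filter).
        Instances data = DataSource.read("some-data.arff");

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(3);      // number of clusters k (illustrative)
        kmeans.buildClusterer(data);

        // Print cluster centroids and per-cluster statistics.
        System.out.println(kmeans);

        // Summarize cluster assignments over the same data.
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(kmeans);
        eval.evaluateClusterer(data);
        System.out.println(eval.clusterResultsToString());
    }
}
```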

Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules (a sketch follows below)
- Works only with discrete data
- Can identify statistical dependencies between groups of attributes, e.g. milk, butter => bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
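A minimal sketch of running Apriori from code; the file name and the thresholds used here (confidence 0.9, support as a fraction of 0.1, 10 rules) are illustrative assumptions, not the values from the slide's example:

```java
import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AssociationSketch {
    public static void main(String[] args) throws Exception {
        // Apriori works only with discrete (nominal) data, as noted above.
        Instances data = DataSource.read("some-nominal-data.arff");

        Apriori apriori = new Apriori();
        apriori.setNumRules(10);                 // report the 10 best rules
        apriori.setMinMetric(0.9);               // minimum confidence
        apriori.setLowerBoundMinSupport(0.1);    // minimum support (as a fraction)
        apriori.buildAssociations(data);

        // Prints the discovered rules with their support and confidence.
        System.out.println(apriori);
    }
}
```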

Explorer: attribute selection
- A panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods consist of two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, ...
- Very flexible: WEKA allows (almost) arbitrary combinations of these two (a sketch combining the two parts follows below)
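A minimal sketch of combining an evaluation method with a search method outside the GUI, using the correlation-based subset evaluator (CfsSubsetEval) and best-first search mentioned above; the file name and the last-attribute-as-class assumption are illustrative:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AttributeSelectionSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("some-data.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Combine an evaluation method (correlation-based subset evaluator)
        // with a search method (best-first), as described above.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);

        // Indices (0-based) of the attributes chosen by the search.
        int[] selected = selector.selectedAttributes();
        System.out.println(selector.toResultsString());
        System.out.println("Selected attribute indices: "
                + java.util.Arrays.toString(selected));
    }
}
```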

Explorer: data visualization
- Visualization is very useful in practice: e.g. it helps to determine the difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
- To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- "Jitter" option to deal with nominal attributes (and to detect "hidden" data points)
- "Zoom-in" function
