Data Mining: Introduction

Lecture Notes for Chapter 1, Introduction to Data Mining, by Tan, Steinbach, Kumar

Why Mine Data? Commercial Viewpoint
  - Lots of data is being collected and warehoused:
      - Web data, e-commerce
      - Purchases at department/grocery stores
      - Bank/credit card transactions
  - Computers have become cheaper and more powerful.
  - Competitive pressure is strong: provide better, customized services for an edge (e.g. in Customer Relationship Management).

Why Mine Data? Scientific Viewpoint
  - Data is collected and stored at enormous speeds (GB/hour):
      - Remote sensors on a satellite
      - Telescopes scanning the skies
      - Microarrays generating gene expression data
      - Scientific simulations generating terabytes of data
  - Traditional techniques are infeasible for raw data.
  - Data mining may help scientists:
      - in classifying and segmenting data
      - in hypothesis formation

Mining Large Data Sets - Motivation
  - There is often information "hidden" in the data that is not readily evident.
  - Human analysts may take weeks to discover useful information.
  - Much of the data is never analyzed at all.
  [Figure: The Data Gap. Total new disk (TB) since 1995; number of analysts.]

What is Data Mining?
  Many definitions:
  - Non-trivial extraction of implicit, previously unknown and potentially useful information from data
  - Exploration ...

Association Rule Discovery
  - Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.
  - Rules discovered:
      {Milk} -> {Coke}
      {Diaper, Milk} -> {Beer}

Association Rule Discovery: Application 1
  Marketing and sales promotion. Let the rule discovered be
      {Bagels, ...} -> {Potato Chips}
  - Potato Chips as consequent: can be used to determine what should be done to boost its sales.
  - Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
  - Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with Bagels to promote the sale of Potato Chips.

Association Rule Discovery: Application 2
  Supermarket shelf management.
  - Goal: to identify items that are bought together by sufficiently many customers.
  - Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
  - A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer. So don't be surprised if you find six-packs stacked next to diapers!

Association Rule Discovery: Application 3
  Inventory management.
  - Goal: a consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
  - Approach: process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.

Sequential Pattern Discovery: Definition
  - Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events.
  - Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

Sequential Pattern Discovery: Examples
  - In telecommunications alarm logs:
      (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) -> (Fire_Alarm)
  - In point-of-sale transaction sequences:
      Computer Bookstore: (Intro_To_Visual_C) (C++_Primer) -> (Perl_for_dummies, Tcl_Tk)
      Athletic Apparel Store: (Shoes) (Racket, Racketball) -> (Sports_Jacket)

Regression
  - Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
  - Extensively studied in statistics and neural network fields.
  - Examples:
      - Predicting sales amounts of a new product based on advertising expenditure.
      - Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
      - Time series prediction of stock market indices.

Deviation/Anomaly Detection
  - Detect significant deviations from normal behavior.
  - Applications:
      - Credit card fraud detection
      - Network intrusion detection
  Typical network traffic at University level
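
One simple way to read "significant deviations from normal behavior" is a z-score test: estimate the mean and standard deviation from a sample assumed to be normal, then flag observations that lie many standard deviations away. The sketch below is an illustration under that assumption, not the method the lecture notes prescribe. The function name `find_anomalies`, the threshold of 3.0, and the traffic counts are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Flag observations whose z-score against the baseline exceeds threshold.

    baseline: values assumed to represent normal behavior (hypothetical data).
    observed: new values to check.
    Returns a list of (index, value, z_score) tuples for flagged values.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has no variation; z-scores are undefined")

    flagged = []
    for i, x in enumerate(observed):
        z = (x - mu) / sigma
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged


# Made-up connection counts per minute, for illustration only.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
observed = [101, 99, 250, 100]

anomalies = find_anomalies(baseline, observed)
for index, value, z in anomalies:
    print(f"observation {index}: value={value}, z-score={z:.1f}")
```

A fixed z-score threshold assumes roughly bell-shaped, stationary behavior. Real fraud or intrusion data often needs a richer model of what "normal" means, so treat this only as a starting point.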

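Returning to the association rule discovery slides above: a rule such as {Diaper, Milk} -> {Beer} is usually scored by its support (the fraction of transactions containing every item in the rule) and its confidence (among transactions containing the antecedent, the fraction that also contain the consequent). These two measures are standard in association analysis but are not defined in this excerpt, so the sketch below is an assumption about how the rules might be evaluated. The baskets and the helper names `support` and `confidence` are made up for illustration.

```python
def count_containing(transactions, itemset):
    """Count the transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in transactions if itemset <= basket)


def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent."""
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions with the antecedent, fraction with the consequent."""
    with_antecedent = count_containing(transactions, antecedent)
    if with_antecedent == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    both = set(antecedent) | set(consequent)
    return count_containing(transactions, both) / with_antecedent


# Hypothetical market baskets, for illustration only.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

rules = [
    ({"Milk"}, {"Coke"}),
    ({"Diaper", "Milk"}, {"Beer"}),
]

for antecedent, consequent in rules:
    s = support(baskets, antecedent, consequent)
    c = confidence(baskets, antecedent, consequent)
    print(f"{sorted(antecedent)} -> {sorted(consequent)}: "
          f"support={s:.2f}, confidence={c:.2f}")
```

This only scores rules that are already given. Discovering rules means searching over candidate itemsets, which needs a dedicated algorithm once the number of distinct items grows.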