Data Mining: Introduction
Lecture Notes for Chapter 1, Introduction to Data Mining, by Tan, Steinbach, Kumar

Why Mine Data? Commercial Viewpoint
- Lots of data is being collected and warehoused:
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
- Computers have become cheaper and more powerful.
- Competitive pressure is strong: provide better, customized services for an edge (e.g. in Customer Relationship Management).

Why Mine Data? Scientific Viewpoint
- Data is collected and stored at enormous speeds (GB/hour):
  - Remote sensors on a satellite
  - Telescopes scanning the skies
  - Microarrays generating gene expression data
  - Scientific simulations generating terabytes of data
- Traditional techniques are infeasible for raw data.
- Data mining may help scientists:
  - in classifying and segmenting data
  - in hypothesis formation

Mining Large Data Sets - Motivation
- There is often information "hidden" in the data that is not readily evident.
- Human analysts may take weeks to discover useful information.
- Much of the data is never analyzed at all.

The Data Gap
[Figure: total new disk (TB) since 1995 compared with the number of analysts]

What is Data Mining?
Many definitions:
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data
- Exploration [...]

[...] Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.
Rules discovered:
- {Milk} -> {Coke}
- {Diaper, Milk} -> {Beer}

Association Rule Discovery: Application 1
Marketing and sales promotion. Let the rule discovered be {Bagels, ...} -> {Potato Chips}.
- Potato Chips as consequent: can be used to determine what should be done to boost its sales.
- Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
- Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with Bagels to promote the sale of Potato Chips.

Association Rule Discovery: Application 2
Supermarket shelf management.
- Goal: to identify items that are bought together by sufficiently many customers.
- Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
- A classic rule: if a customer buys diapers and milk, then he is very likely to buy beer. So don't be surprised if you find six-packs stacked next to diapers!

Association Rule Discovery: Application 3
Inventory management.
- Goal: a consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with the right parts, to reduce the number of visits to consumer households.
- Approach: process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.

Sequential Pattern Discovery: Definition
Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events. Rules are formed by first discovering patterns. Event occurrences in the patterns are governed by timing constraints.

Sequential Pattern Discovery: Examples
- In telecommunications alarm logs:
  - (Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) -> (Fire_Alarm)
- In point-of-sale transaction sequences:
  - Computer bookstore: (Intro_To_Visual_C) (C++_Primer) -> (Perl_for_dummies, Tcl_Tk)
  - Athletic apparel store: (Shoes) (Racket, Racketball) -> (Sports_Jacket)

Regression
Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency. Extensively studied in the statistics and neural network fields.
Examples:
- Predicting sales amounts of a new product based on advertising expenditure.
- Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
- Time series prediction of stock market indices.

Deviation/Anomaly Detection
Detect significant deviations from normal behavior.
Applications:
- Credit card fraud detection
- Network intrusion detection
[Figure: typical network traffic at university level]
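For the association rule slides, a rule such as {Diaper, Milk} -> {Beer} is usually scored with two standard measures that the notes above do not spell out: support (how often all the items appear together) and confidence (how often the consequent appears when the antecedent does). The following is a minimal brute-force sketch, not an efficient miner such as Apriori. The transactions, the function names and the thresholds are assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical market-basket data, invented for this illustration.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support(both, transactions) / support(a, transactions)


def find_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate rules A -> {c} with a single-item consequent by brute force.

    Only practical for tiny item sets; the thresholds are illustrative.
    """
    items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            s = support(itemset, transactions)
            if s < min_support:
                continue
            for c in itemset:
                a = tuple(i for i in itemset if i != c)
                conf = confidence(a, (c,), transactions)
                if conf >= min_confidence:
                    found.append((a, c, s, conf))
    return found


if __name__ == "__main__":
    for a, c, s, conf in find_rules(transactions):
        print(f"{set(a)} -> {{{c}}}  support={s:.2f} confidence={conf:.2f}")
```

On this toy data the rule {Diaper, Milk} -> {Beer} passes both thresholds, which mirrors the "classic rule" quoted in the shelf-management slide.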
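The regression slide's first example (sales as a function of advertising expenditure) can be sketched with ordinary least squares for a single predictor. This assumes a linear model of dependency, which is only one of the options the slide mentions. The function names and the numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y for a new x under the fitted line."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: advertising spend and resulting sales, same units.
    advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
    sales = [3.1, 4.9, 7.2, 8.8, 11.0]
    slope, intercept = fit_line(advertising, sales)
    print(f"sales = {slope:.2f} * advertising + {intercept:.2f}")
    print(f"predicted sales at spend 6.0: {predict(6.0, slope, intercept):.2f}")
```

A nonlinear dependency would need a different model family; the closed-form fit above only covers the straight-line case.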
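One simple way to "detect significant deviations from normal behavior" is to flag values that lie many standard deviations from the mean. The sketch below uses that z-score rule on made-up transaction amounts. It is an illustrative assumption, not the method the lecture notes prescribe, and real fraud or intrusion detectors are considerably more involved.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold.

    Uses the sample mean and sample standard deviation of the same data,
    so a single extreme value also inflates the spread it is judged against.
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    if std == 0:
        return []  # all values identical, nothing deviates
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mean) / std > threshold
    ]


if __name__ == "__main__":
    # Hypothetical card transaction amounts; the last one is unusually large.
    amounts = [42, 38, 45, 40, 39, 41, 44, 43, 37, 950]
    for index, amount in find_anomalies(amounts):
        print(f"transaction {index} looks anomalous: {amount}")
```

Because the outlier itself enlarges the standard deviation, the threshold has to be modest on small samples; median-based rules are a common, more robust alternative.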