Data Warehousing and Data Mining: An Overview
Concepts, Architecture, Trends, and Applications
Presenter: Zhu Jianqiu (朱建秋), June 7, 2019

Outline
  • Data warehouse concepts
  • Data warehouse architecture and components
  • Data warehouse design
  • Data warehouse technology (and how it differs from database technology)
  • Data warehouse performance
  • Data warehouse applications
  • Overview of data mining applications
  • Data mining techniques and trends
  • Data mining application platform (project proposed to the Science and Technology Commission)

Data Warehouse Concepts
  • Basic concepts
  • Some misconceptions about data warehouses

Basic Concepts: Data Warehouse
  • "A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management's decisions" [Inmon, 2019].
  • "A data warehouse is a set of methods, techniques, and tools that may be leveraged together to produce a vehicle that delivers data to end-users on an integrated platform" [Ladley, 2019].
  • "A data warehouse is a process of creating, maintaining, and using a decision-support infrastructure" [Appleton, 2019; Haley, 2019; Gardner, 1998].

Basic Concepts: Characteristics of a Data Warehouse [Inmon, 2019]
  • Subject-oriented
      - The tables of one subject area draw on several operational applications (for example, the customer subject draws on order processing, accounts receivable, and accounts payable)
      - Typical subject areas: customer, product, transaction, account
      - A subject area is implemented as a set of related tables
      - Related tables are linked by a common key (for example, Customer ID)
      - Every key carries a time element (from-date to to-date, monthly cumulative, or a single date)
      - Data within a subject can be stored on different media (summary level, detail level, multiple granularities)
  • Integrated
      - Data extraction, cleansing, transformation, and loading
  • Non-volatile
      - Data is added in batches; data already in the warehouse does not change
  • Time-variant (time dimension)
  • Supports management decisions

Basic Concepts: Data Mart, ODS
  • Data mart: a small data warehouse serving a department or workgroup.
  • Operational Data Store (ODS): a collection of data that supports the enterprise's day-to-day, enterprise-wide applications. It is a new data environment distinct from the operational database (DB), a hybrid form obtained by extending the data warehouse (DW). Its four basic characteristics: subject-oriented, integrated, volatile, and current or near-current.

Basic Concepts: ETL, Metadata, Granularity, Partitioning
  • ETL (Extract/Transformation/Load): tools for extracting, transforming, and loading data, such as Microsoft DTS and IBM Visual Warehouse.
  • Metadata: data about data, used to build, maintain, manage, and use the data warehouse. It is especially important in a data warehouse.
  • Granularity: the level of detail or summarization of the data held in the units of data in the warehouse. The more detailed the data, the finer (smaller) the granularity.
  • Partitioning: data is spread across separate physical units that can be processed independently.

Some Misconceptions about Data Warehouses
  • Data warehouse and OLAP
      - Star schema data model
      - Multidimensional analysis
  • A data warehouse is not a virtual concept
  • Data warehouse and normalization theory
      - Denormalization is required
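As a rough illustration of the granularity and partitioning definitions above, the following Python sketch rolls detail-level rows up to a coarser monthly summary and splits the same rows into per-year partitions. The record layout, the sample values, and the function names are assumptions made for this example only; they do not come from the slides.

```python
from collections import defaultdict

# Hypothetical detail-level rows: (customer_id, date as "YYYY-MM-DD", amount).
detail_rows = [
    ("C001", "2000-01-03", 120.0),
    ("C001", "2000-01-17", 80.0),
    ("C001", "2000-02-02", 50.0),
    ("C002", "2000-01-09", 200.0),
    ("C002", "2001-03-05", 75.0),
]

def summarize_monthly(rows):
    """Coarser granularity: one total per (customer, month)."""
    totals = defaultdict(float)
    for customer_id, date, amount in rows:
        month = date[:7]  # keep only "YYYY-MM"
        totals[(customer_id, month)] += amount
    return dict(totals)

def partition_by_year(rows):
    """Partitioning: split rows into independent units, one per year."""
    partitions = defaultdict(list)
    for row in rows:
        year = row[1][:4]
        partitions[year].append(row)
    return dict(partitions)

monthly = summarize_monthly(detail_rows)
by_year = partition_by_year(detail_rows)
```

The detail rows are the finest granularity here; `monthly` trades detail for volume, and each entry of `by_year` could be loaded, indexed, or archived on its own.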

Data Warehouse Architecture and Components
  • Architecture
  • ETL tools
  • Metadata repository and metadata management
  • Data access and analysis tools

Architecture [Pieter, 2019]
  [Figure: source databases (relational, application packages, legacy, external) feed data extraction, transformation, and load, supported by a data cleansing tool, a data modeling tool, warehouse administration tools, and central metadata. The central data warehouse (RDBMS) feeds architected data marts (mid-tier RDBMS and MDB data marts, each with local metadata, linked by metadata exchange), which serve data access and analysis through end-user DW tools.]

Architecture with an ODS
  [Figure: the same layers as above, with a hub for data extraction, transformation, and load, a central data warehouse and ODS layer, and OLTP tools working against the ODS alongside the end-user DW tools.]

Real-World Heterogeneity [Douglas Hackney, 2019]
  [Figure: sources (Oracle Financials, i2 Supply Chain, Siebel CRM, 3rd party, e-commerce) feed a custom marketing data warehouse, a packaged Oracle financial data warehouse, a packaged i2 supply chain warehouse, a non-architected data mart, and subset data marts.]

Federated Data Warehouse / Data Mart Architecture
  [Figure: the same sources feed a common staging area and a real-time ODS. Behind them sit a federated financial data warehouse, a federated marketing data warehouse, federated packaged i2 supply chain data marts, and subset data marts, which serve analytical applications, real-time data mining and analytics, and real-time segmentation, classification, qualification, offerings, etc.]

Closed-Loop Federated BI Architecture
  [Figure: front- and back-office OLTP, e-business systems, and external information providers feed demand-driven data acquisition and analysis (ETL tools and DW templates, data profiling and reengineering tools, metadata interchange). Federated data warehouse and data mart systems support OLAP and data mining tools, analysis templates, decision engine models, rules and metrics, and analytic application development tools and components. The analytic applications (CRM, supply chain, EPM, financial, and HR analytics and reporting) deliver business information and recommendations through the EKP (Enterprise Knowledge Management Portal), leading to informed decisions and actions.]

Key Data Warehouse Issues: Acquiring, Storing, and Using Data
  [Figure: relational, package, legacy, and external sources pass through a data cleansing tool and data staging into the enterprise data warehouse, then into data marts (RDBMS, ROLAP, MDB) used by end-user tools.]
  • The load capacity of the warehouse and the data marts is critical
  • The query output capacity of the warehouse and the data marts is critical

ETL Tools
  • Remove unneeded data from the operational databases
  • Unify the names and definitions of data during conversion
  • Compute summary data and derived data
  • Estimate default values for missing data
  • Reconcile changes in source data definitions

ETL Tool Architecture
  [Figure]

Metadata Repository and Metadata Management
  • Metadata categories: technical metadata, business metadata, and data warehouse operational information [Alex Berson et al., 2019]
  • Technical metadata: information about warehouse data used by warehouse designers and administrators to carry out development and management tasks. It includes:
      - Data source information
      - Transformation descriptions (how operational databases map to the warehouse, and the algorithms used to transform the data)
      - Warehouse objects and data structure definitions for the target data
      - Rules for data cleansing and data enhancement
      - Data mapping operations
      - Access rights, backup history, archive history, information delivery history, data acquisition history, data access, and so on
  • Business metadata: information that users can easily understand. It includes:
      - Subject areas and information object types, including queries, reports, images, audio, video, etc.
      - Internet home pages
      - Other information supporting the data warehouse; for an information delivery system, for example, subscription information, scheduling information, detailed descriptions of delivery destinations, business query objects, etc.
  • Data warehouse operational information: for example, data history (snapshots, versions), ownership, the audit trail of extractions, and data usage

Metadata Repository and Tools [Martin Staudt, 2000]
  [Figure]

Data Access and Analysis Tools
  • Reports
  • OLAP
  • Data mining
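The ETL tasks listed in this section (dropping unneeded operational fields, unifying names, filling defaults for missing values, and computing derived data) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the column names, the mapping table, and the default values are hypothetical and are not taken from any of the tools named in the slides.

```python
# Hypothetical mapping from operational column names to warehouse names.
# Source columns that are not listed here are treated as unneeded and dropped.
COLUMN_MAP = {
    "cust_no": "customer_id",
    "CustName": "customer_name",
    "amt": "amount",
    "qty": "quantity",
}

# Hypothetical defaults used when a value is missing in the source.
DEFAULTS = {"quantity": 1, "amount": 0.0}

def transform(record):
    """Transform one operational record into a warehouse row."""
    row = {}
    # Unify names and drop unneeded fields in one pass.
    for source_name, warehouse_name in COLUMN_MAP.items():
        value = record.get(source_name)
        if value is not None:
            row[warehouse_name] = value
    # Fill defaults for missing data.
    for name, default in DEFAULTS.items():
        row.setdefault(name, default)
    # Compute a derived value; skip it when the quantity is zero.
    quantity = row["quantity"]
    row["unit_price"] = row["amount"] / quantity if quantity else None
    return row

source_record = {
    "cust_no": "C001",
    "CustName": "Ann",
    "amt": 30.0,
    "qty": None,            # missing in the source system
    "internal_flag": "x",   # operational detail not needed in the warehouse
}
clean = transform(source_record)
```

In a real ETL tool the mapping and the default rules would themselves be stored as technical metadata, which is one reason the slides treat the metadata repository as a core component.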

Data Warehouse Design
  • Top-down
  • Bottom-up
  • Hybrid approach
  • Data warehouse modeling

Top-Down Approach
  • Build the enterprise data warehouse (EDW)
      - Common central data model
      - Data re-engineering performed once
      - Minimizes redundancy and inconsistency
      - Detailed and historical data; global data discovery
  • Build data marts from the EDW
      - Each is the subset of the EDW relevant to a department
      - Mostly summarized data
      - Direct dependency on EDW data availability
  [Figure: operational data and external data feed the enterprise warehouse, which feeds the local data marts.]

Bottom-Up Design Approach
  • Create departmental data marts
      - Scope limited to one subject area
      - Fast ROI: local business needs are met
      - Departmental autonomy: flexibility in design
      - A good guide for other departments' data marts
      - Easy to replicate in other departments
      - Data re-engineering is needed for every department
      - Some level of redundancy and inconsistency
      - A practical approach
  • Scale up to an enterprise data warehouse
      - Create the EDB as a long-term goal
  [Figure: local operational data and external data feed the local data marts; all operational data feeds the enterprise data warehouse (EDB).]

Data Warehouse Modeling: Star Schema
  Example of a star schema:
  • Sales fact table
      - Keys: Date, Product, Store, Customer
      - Measurements: unit_sales, dollar_sales, Yen_sales
  • Date dimension: Date, Month, Year
  • Cust dimension: CustId, CustName, CustCity, CustCountry
  • Product dimension: ProductNo, ProdName, ProdDesc, Category, QOH
  • Store dimension: StoreID, City, State, Country, Region

Data Warehouse Modeling: Snowflake Schema
  Example of a snowflake schema:
  • Sales fact table
      - Keys: Date, Product, Store, Customer
      - Measurements: unit_sales, dollar_sales, Yen_sales
  • Date: Date, Month
      - Month: Month, Year
      - Year: Year
  • Cust: CustId, CustName, CustCity, CustCountry
  • Product: ProductNo, ProdName, ProdDesc, Category, QOH
  • Store: StoreID, City
      - City: City, State
      - State: State, Country
      - Country: Country, Region

Operational (OLTP) Data Source: Sales Database
  [Figure]

Star Schema
  [Figure: time dimension and fact table.]

Multidimensional Model
  [Figure: facts, measures (metrics), the time dimension, and the attributes of the time dimension.]
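The star schema example can be made concrete with Python's built-in `sqlite3` module. The sketch below is a simplified assumption rather than the schema from the slides: it keeps only the date, store, and product dimensions, renames the date key to `date_key`, and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One fact table surrounded by denormalized dimension tables.
conn.executescript("""
CREATE TABLE date_dim (
    date_key TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE store_dim (
    store_id INTEGER PRIMARY KEY, city TEXT, state TEXT,
    country TEXT, region TEXT);
CREATE TABLE product_dim (
    product_no INTEGER PRIMARY KEY, prod_name TEXT, category TEXT);
CREATE TABLE sales_fact (
    date_key   TEXT    REFERENCES date_dim(date_key),
    product_no INTEGER REFERENCES product_dim(product_no),
    store_id   INTEGER REFERENCES store_dim(store_id),
    unit_sales INTEGER,
    dollar_sales REAL);
""")

# Invented sample data for illustration only.
conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
    ("2000-01-15", "2000-01", 2000),
    ("2000-02-10", "2000-02", 2000),
])
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Boston", "MA", "USA", "East"),
    (2, "Seattle", "WA", "USA", "West"),
])
conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
    (10, "Widget", "Hardware"),
    (20, "Gadget", "Hardware"),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    ("2000-01-15", 10, 1, 10, 100.0),
    ("2000-02-10", 10, 1, 5, 50.0),
    ("2000-01-15", 20, 2, 2, 40.0),
])

# A typical star join: total sales by year and region.
rows = conn.execute("""
    SELECT d.year, s.region, SUM(f.dollar_sales)
    FROM sales_fact AS f
    JOIN date_dim  AS d ON f.date_key = d.date_key
    JOIN store_dim AS s ON f.store_id = s.store_id
    GROUP BY d.year, s.region
    ORDER BY d.year, s.region
""").fetchall()
```

Each query joins the fact table to only the dimensions it needs. A snowflake variant would split `store_dim` further into city, state, and country tables, at the cost of more joins per query.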

Data Warehouse Technology [Inmon, 2019]
  • Managing large amounts of data
      - The ability to manage large volumes
      - The ability to manage them well
  • Managing multiple media (a storage hierarchy)
      - Main memory, expanded memory, cache, DASD, optical disk, microfiche
  • Monitoring data
      - Decide whether data should be reorganized
      - Decide whether indexes are poorly built
      - Decide whether too much data has overflowed
      - Determine the remaining free space
  • Using multiple technologies to receive and pass data
      - Batch mode; online mode is not very useful
  • Programmer/designer control over data placement (block/page)
  • Parallel storage and management of data
  • Metadata management

Data Warehouse Technology [Inmon, 2019] (continued)
  • Language interface
      - Access a set of data at a time
      - Access one record at a time
      - Support one or more indexes
      - Provide an SQL interface
  • Efficient loading of data
  • Efficient use of indexes
      - Bitmap methods, multi-level indexes, etc.
  • Data compression
      - I/O resources are far scarcer than CPU resources, so the cost of decompression is not a major concern
  • Compound keys (because data varies over time)
  • Variable-length data
  • Lock management (the programmer can explicitly control the lock manager)
  • Index-only processing (some requests can be served by looking at the index alone)
  • Fast restore

Data Warehouse Technology [Inmon, 2019] (continued)
  • Other technical features where traditional technology plays only a small role
      - Transaction integrity, caching, row/page-level locking, referential integrity, data views
  • Differences between a traditional DBMS and a data warehouse DBMS
      - The latter is designed and optimized for data warehousing and decision support
      - It manages more data: 10 GB / 100 GB / TB
      - A traditional DBMS is suited to record-level updates and provides lock, commit, checkpoint, log, deadlock handling, and rollback
      - Basic data management such as block management: a traditional DBMS must reserve free space
      - Indexing: a traditional DBMS limits the number of indexes; a data warehouse DBMS does not
      - A general-purpose DBMS is physically optimized for transaction access, a data warehouse DBMS for DSS access and analysis
  • Changing DBMS technology
  • Multidimensional DBMS and the data warehouse
      - The idea that a multidimensional DBMS is the database technology for the data warehouse is incorrect
      - A multidimensional DBMS (OLAP) is a technology; a data warehouse is an architectural foundation
  • Dual levels of granularity (DASD/tape)

Data Warehouse Technology [Inmon, 2019] (continued)
  • Metadata in the data warehouse environment
      - DSS analysts, unlike IT professionals, need the help of metadata
      - The mapping between the operational environment and the warehouse environment requires metadata
      - The warehouse holds data spanning a long time, so metadata must record the data structures and definitions
  • Context and content (context dimension)
      - Simple contextual information (data structure, encoding, naming conventions, metrics)
      - Complex contextual information (product definitions, marketing territories, pricing, packaging, organization structure)
      - External contextual information (economic forecasts such as inflation, finance, and taxation; political information; competitive information; technological advances)
  • Refreshing the data warehouse
      - Data replication (triggers)
      - Change data capture (CDC) (logs)
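The bitmap method mentioned under efficient index use can be sketched with plain Python integers used as bit vectors. This is a toy illustration under assumed data, not how any particular warehouse DBMS implements it: each distinct column value gets one bitmap, bit `i` is set when row `i` holds that value, and multi-condition filters become bitwise operations.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i is set for row i."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_in(bitmap):
    """List the row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality columns of a five-row fact table.
region = ["East", "West", "East", "North", "West"]
category = ["Food", "Food", "Toys", "Food", "Toys"]

region_idx = build_bitmap_index(region)
category_idx = build_bitmap_index(category)

# WHERE region = 'East' AND category = 'Food'
east_food = rows_in(region_idx["East"] & category_idx["Food"])

# WHERE region IN ('East', 'West') AND category = 'Toys'
coastal_toys = rows_in(
    (region_idx["East"] | region_idx["West"]) & category_idx["Toys"]
)
```

Both filters are answered from the indexes alone, without touching the rows, which is also the idea behind the index-only processing item above. Bitmaps work best on columns with few distinct values, which is typical of dimension attributes.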

Data Warehouse Performance [Inmon, 2019]
  • Usage
  • Data
  • Platform
  • Service management
  Source: Wang Tianyou et al. (translators), "Data Warehouse Management", Publishing House of Electronics Industry, May 2000.

Data Warehouse Applications: Survey of DW User Counts
  • "DW systems with around 100-500 users, or more, will make up the main segment for some time to come"
  • Source: Meta Group Survey for the most recent year; respondents: 3000+ users or prospective users
  [Figure]

Survey of DW Data Size
  • Source: Meta Group Survey for the most recent year; respondents: 3000+ users or prospective users
  [Figure]

How Much?
  • $3-6m for a mid-size company, less if smaller, more if larger
  • $10m+ for large organizations with large data sets
  • 10-50+% annual maintenance costs
  • 33% hardware / 33% software / 33% services

How Long?
  • 2-4 years for 80/20 of the full system for a mid-size company
  • 6-12 months for the initial iteration
  • 3-6 months for subsequent iterations

How Risky?
  • For EDW projects, 20% (Meta) to 70% (OTR, DWN) fail
  • High failure rate for non-business-driven initiatives
  • Very few systems meet the expectations of the business
  • Failure is due not to technology but to "soft" issues
  • Massive upside to successful projects (100% - 2000+% ROI)
  • 99% politics, 1% technology

References
  • Inmon, W.H., "Building the Data Warehouse", John Wiley and Sons, 2019.
  • Ladley, John, "Operational Data Stores: Building an Effective Strategy", Data Warehouse: Practical Advice from the Experts, Prentice Hall, Englewood Cliffs, NJ, 2019.
  • Gardner, Stephen R., "Building the Data Warehouse", Communications of the ACM, September 1998, Volume 41, Number 9, 52-60.
  • Douglas Hackney, Http:/ egltd, "DW101: A Practical Overview", 2019.
  • Pieter R. Mimno, "The Big Picture - How Brio Competes in the Data Warehousing Market", Presentation to Brio Technology, August 4, 2019.
  • Alex Berson, Stephen Smith, Kurt Thearling, "Building Data Mining Applications for CRM", McGraw-Hill, 2019.
  • Martin Staudt, Anca Vaduva, Thomas Vetterli, "The Role of Meta for Data Warehouse", 2000.
  • W.H. Inmon, Ken Rudin, Christopher K. Buss, Ryan Sousa, "Data Warehouse Performance", John Wiley & Sons, 2019.

Data Mining Applications: A Survey
  • Overview of data mining applications
  • Data mining techniques and trends
  • Data mining application platform

Overview of Data Mining Applications
  • Application share
  • Data mining upsides
  • Data mining downsides
  • Data mining uses
  • Data mining industry and applications
  • Data mining costs

Application Share
  • Clustering 22%
  • Direct marketing 14%
  • Cross-sell models 12%
  Source: kdnuggets News, 2019/6/11

Data Mining Upsides
  • Discovery of previously unknown relationships, trends, anomalies, etc.
  • Powerful competitive weapon
  • Automation of repetitive analysis
  • Predictive capabilities

Data Mining Downsides
  • Knowledge discovery technology is immature
  • Long learning and tuning cycles for some technologies
  • "Black box" technology minimizes confidence
  • VLDB (Very Large Data Base) requirements

Data Mining Uses
  • Discover anomalies, outliers, and exceptions in process data
  • Discover behavior and predict outcomes of customer relationships
      - Churn management
      - Target marketing (market of one)
      - Promotion management
  • Fraud detection
  • Pattern ID & matching (dark programs, science)

Data Mining Industry and Applications
  • From research prototypes to data mining products, languages, and standards
      - IBM Intelligent Miner, SAS Enterprise Miner, SGI MineSet, Clementine, MS/SQLServer 2000, DBMiner, BlueMartini, MineIt, DigiMine, etc.
      - A few data mining languages and standards (esp. MS OLEDB for Data Mining)
  • Application achievements in many domains
      - Market analysis, trend analysis, fraud detection, outlier analysis, Web mining, etc.

Data Mining Costs
  • Desktop tools: $500 and up (MSFT coming at a low price point)
  • Server / MF based: $20,000 to $700,000+
  • Must also add the cost of extensive consulting for high-end tools
  • Don't forget the long training and learning curve time
  • An ongoing process, not task automation software

Data Mining Trends
  • Historical review
  • A confluence of multiple disciplines
  • Classifying data mining from multiple views
  • Research progress in the last decade
  • Trends in data mining
  • Data mining and the standardization process

Historical Review
  • 1989 IJCAI Workshop on Knowledge Discovery in Databases
      - Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
  • 1991-1994 Workshops on Knowledge Discovery in Databases
      - Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
  • 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
      - Journal of Data Mining and Knowledge Discovery (1997)
  • 1998 ACM SIGKDD, SIGKDD'1999-2001 conferences, and SIGKDD Explorations
  • More conferences on data mining
      - PAKDD, PKDD, SIAM-Data Mining, (IEEE) ICDM, DaWaK, SPIE-DM, etc.

Data Mining: Confluence of Multiple Disciplines
  [Figure: data mining at the center, surrounded by database technology, statistics, machine learning (AI), visualization, information science, and other disciplines.]

A Multi-Dimensional View of Data Mining
  • Databases to be mined
      - Relational, transactional, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc.
  • Knowledge to be mined
      - Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.
  • Techniques utilized
      - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
  • Applications adapted
      - Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Research Progress in the Last Decade
  • Multi-dimensional data analysis: data warehouse and OLAP (on-line analytical processing)
  • Association, correlation, and causality analysis
  • Classification: scalability and new approaches
  • Clustering and outlier analysis
  • Sequential patterns and time-series analysis
  • Similarity analysis: curves, trends, images, texts, etc.
  • Text mining, Web mining, and Weblog analysis
  • Spatial, multimedia, and scientific data analysis
  • Data preprocessing and database compression
  • Data visualization and visual data mining
  • Many others, e.g., collaborative filtering

Research Directions [Han J. W., 2019]
  • Web mining
  • Towards integrated data mining environments and tools
  • "Vertical" (or application-specific) data mining
  • Invisible data mining
  • Towards intelligent, efficient, and scalable data mining methods

Towards Integrated Data Mining Environments and Tools
  • OLAP mining: integration of data warehousing and data mining
  • Querying and mining: an integrated information analysis environment
  • Basic mining operations and mining query optimization
  • "Vertical" (or application-specific) data mining
  • Invisible data mining

Querying and Mining: An Integrated Information Analysis Environment
  • Data mining as a component of a DBMS, data warehouse, or Web information system
      - Integrated information processing environment
          MS/SQLServer-2000 (Analysis service)
          IBM IntelligentMiner on DB2
          SAS EnterpriseMiner: data warehousing + mining
  • Query-based mining
      - Querying database/DW/Web knowledge
      - Efficiency and flexibility: preprocessing, on-line processing, optimization, integration, etc.

"Vertical" Data Mining
  • Generic data mining tools? Too simple to match domain-specific, sophisticated applications
      - Expert knowledge and business logic represent many years of work in their own fields!
      - Data mining + business logic + domain experts
  • A multi-dimensional view of data miners
      - Complexity of data: Web, sequence, spatial, multimedia, ...
      - Complexity of domains: DNA, astronomy, market, telecom, ...
  • Domain-specific data mining tools
      - Provide concrete, killer solutions to specific problems
      - Feedback to build more powerful tools

Invisible Data Mining
  • Build mining functions into daily information services
      - Web search engines (link analysis, authoritative pages, user profiles), adaptive web sites, etc.
      - Improvement of query processing: history + data
      - Making services smart and efficient
  • Benefits from/to data mining research
      - Data mining research has produced many scalable, efficient, novel mining solutions
      - Applications feed new challenge problems to research

Towards Intelligent Tools for Data Mining
  • Integration paves the way to intelligent mining
  • Smart interfaces bring intelligence
      - Easy to use, understand, and manipulate
      - One picture may be worth 1,000 words
      - Visual and audio data mining
  • Human-centered data mining
  • Towards self-tuning, self-managing, self-triggering data mining

Integrated Mining: A Booster for Intelligent Mining
  • Integration paves the way to intelligent mining
  • Data mining integrates with DBMS, DW, WebDB, etc.
      - Integration inherits the power of up-to-date information technology: querying, MD analysis, similarity search, etc.
      - Mining can be view…
