版權說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權,請進行舉報或認領
文檔簡介
09二月2025DataWarehousingandOLAPTechnology1數(shù)據(jù)倉庫和OLAP技術什么是數(shù)據(jù)倉庫(Whatisadatawarehouse)?多維數(shù)據(jù)模型(Amulti-dimensionaldatamodel)數(shù)據(jù)倉庫體系結(jié)構(Datawarehousearchitecture)數(shù)據(jù)倉庫實現(xiàn)(Datawarehouseimplementation)FurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining09二月2025DataWarehousingandOLAPTechnology2數(shù)據(jù)庫的定義傳統(tǒng)的數(shù)據(jù)庫技術是以單一的數(shù)據(jù)資源為中心,同時進行從事務處理,批處理到?jīng)Q策分析的各類處理;數(shù)據(jù)庫主要是為自動化,精簡工作任務和高速數(shù)據(jù)采集服務的。它的運行是事務驅(qū)動,面向應用的,數(shù)據(jù)庫的根本任務是完成數(shù)據(jù)操作,即及時安全地將當前事務所產(chǎn)生的記錄保存下來。09二月2025DataWarehousingandOLAPTechnology3兩種不同的數(shù)據(jù)處理需求計算機系統(tǒng)中存在著兩類不同的數(shù)據(jù)處理需求,即:操作型處理(事務處理):主要是對一個或一組記錄的查詢和修改,這時候人們關心的是響應時間、數(shù)據(jù)的安全性和完整性;分析型處理(信息型處理):用于管理人員的決策分析,如DDS(decisionsupportsystem)、多維分析等。
09二月2025DataWarehousingandOLAPTechnology4為什么要建立數(shù)據(jù)倉庫?數(shù)據(jù)DATA知識KNOWLEDGE決定DECISIONSPatternsTrendsFactsRelationsModelsAssociationsSequencesTargetMarketsFundsallocationTradingoptionsWheretoadvertiseCatalogmailinglistSalesgeography財經(jīng)的Financial經(jīng)濟的Economic政府Government銷售分數(shù)Point-of-Sale人口統(tǒng)計學Demographic生活方式Lifestyle
痛苦:太多數(shù)據(jù),無法作出正確判斷!09二月2025DataWarehousingandOLAPTechnology5WhatisDataWarehouse?"數(shù)據(jù)倉庫是在企業(yè)管理和決策中面向主題的,集成的,與時間相關的和不可修改的數(shù)據(jù)集合“Adatawarehouseisa
subject-oriented,integrated,time-variant,andnonvolatile
collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.InmonDatawarehousing:Theprocessofconstructingandusingdatawarehouses09二月2025DataWarehousingandOLAPTechnology6DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.09二月2025DataWarehousingandOLAPTechnology7面向應用舉例采購子系統(tǒng):訂單(訂單號,供應商號,總金額,日期)訂單細則(訂單號,商品號,類別,單價,數(shù)量)供應商(供應商號,供應商名,地址,電話)銷售子系統(tǒng):顧客(顧客號,姓名,性別,年齡,地址,電話)銷售(員工號,顧客號,商品號,數(shù)量,單價日期)庫存管理子系統(tǒng):領料單(領料單號,領料人,商品號,數(shù)量,日期)進料單(進料單號,訂單號,進料人,收料人,日期)庫存(商品號,庫房號,庫存量,日期)庫房(庫房號,倉庫保管員,地點,庫存商品描述)人事管理子系統(tǒng):員工(員工號,姓名,性別,年齡,部門號)部門(部門號,部門名稱,部門主管,電話)面向主題舉例:商品:商品固有信息:商品號,商品名,類別,顏色等商品采購信息:商品號,供應商號,供應價,供應日期,供應量等商品銷售信息:商品號,顧客號,售價,銷售日期,銷售量等商品庫存信息:商品號,庫房號,日期,庫存量等供應商:供應商固有信息:供應商號,供應商名,地址,電話等供應商品信息:供應商號,商品號,供應價,供應日期,供應量等顧客:顧客固有信息:顧客號,顧客名,性別,年齡,住址,電話等顧客購物信息:顧客號,商品號,售價,購買日期,購買量等09二月2025DataWarehousingandOLAPTechnology8DataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.09二月2025DataWarehousingandOLAPTechnology9DataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.09二月2025DataWarehousingandOLAPTechnology10DataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.09二月2025DataWarehousingandOLAPTechnology11DataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis09二月2025DataWarehousingandOLAPTechnology12DataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries09二月2025DataWarehousingandOLAPTechnology13OLTPvs.OLAP09二月2025DataWarehousingandOLAPTechnology14WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled09二月2025DataWarehousingandOLAPTechnology15DataWarehousingandOLAPTechnologyWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining09二月2025DataWarehousingandOLAPTechnology16FromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.09二月2025DataWarehousingandOLAPTechnology17Cube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid09二月2025DataWarehousingandOLAPTechnology18ConceptualModelingofDataWarehousesModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakeFactconstellations:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschemaorfactconstellation
09二月2025DataWarehousingandOLAPTechnology19ExampleofStarSchema
time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch09二月2025DataWarehousingandOLAPTechnology20ExampleofSnowflakeSchematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity09二月2025DataWarehousingandOLAPTechnology21ExampleofFactConstellationtime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTabletime_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShippingFactTabletime_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper09二月2025DataWarehousingandOLAPTechnology22ADataMiningQueryLanguage,DMQL:LanguagePrimitivesCubeDefinition(FactTable)definecube<cube_name>[<dimension_list>]:<measure_list>DimensionDefinition(DimensionTable)definedimension<dimension_name>as(<attribute_or_subdimension_list>)SpecialCase(SharedDimensionTables)Firsttimeas“cubedefinition”definedimension<dimension_name>as<dimension_name_first_time>incube<cube_name_first_time>09二月2025DataWarehousingandOLAPTechnology23DefiningaStarSchemainDMQLdefinecubesales_star[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country)09二月2025DataWarehousingandOLAPTechnology24DefiningaSnowflakeSchemainDMQLdefinecubesales_snowflake[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier(supplier_key,supplier_type))definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city(city_key,province_or_state,country))09二月2025DataWarehousingandOLAPTechnology25DefiningaFactConstellationinDMQLdefinecubesales[time,item,branch,location]:dollars_sold=sum(sales_in_dollars),avg_sales=avg(sales_in_dollars),units_sold=count(*)definedimensiontimeas(time_key,day,day_of_week,month,quarter,year)definedimensionitemas(item_key,item_name,brand,type,supplier_type)definedimensionbranchas(branch_key,branch_name,branch_type)definedimensionlocationas(location_key,street,city,province_or_state,country)definecubeshipping[time,item,shipper,from_location,to_location]:dollar_cost=sum(cost_in_dollars),unit_shipped=count(*)definedimensiontimeastimeincubesalesdefinedimensionitemasitemincubesalesdefinedimensionshipperas(shipper_key,shipper_name,locationaslocationincubesales,shipper_type)definedimensionfrom_locationaslocationincubesalesdefinedimensionto_locationaslocationincubesales09二月2025DataWarehousingandOLAPTechnology26Measures:ThreeCategoriesdistributive:iftheresultderivedbyapplyingthefunctiontonaggregatevaluesisthesameasthatderivedbyapplyingthefunctiononallthedatawithoutpartitioning.E.g.,count(),sum(),min(),max().algebraic:
ifitcanbecomputedbyanalgebraicfunctionwithMarguments(whereMisaboundedinteger),eachofwhichisobtainedbyapplyingadistributiveaggregatefunction.E.g.,
avg(),min_N(),standard_deviation().holistic:ifthereisnoconstantboundonthestoragesizeneededtodescribeasubaggregate.
E.g.,median(),mode(),rank().09二月2025DataWarehousingandOLAPTechnology27AConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity09二月2025DataWarehousingandOLAPTechnology28ViewofWarehousesandHierarchiesSpecificationofhierarchiesSchemahierarchyday<{month<quarter;week}<yearSet_groupinghierarchy{1..10}<inexpensive09二月2025DataWarehousingandOLAPTechnology29MultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay09二月2025DataWarehousingandOLAPTechnology30ASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum
TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum09二月2025DataWarehousingandOLAPTechnology31CuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid09二月2025DataWarehousingandOLAPTechnology32BrowsingaDataCubeVisualizationOLAPcapabilitiesInteractivemanipulation09二月2025DataWarehousingandOLAPTechnology33TypicalOLAPOperationsRollup(drill-up):summarizedatabyclimbinguphierarchyorbydimensionreductionDrilldown(rolldown):reverseofroll-upfromhigherlevelsummarytolowerlevelsummaryordetaileddata,orintroducingnewdimensionsSliceanddice:
projectandselect
Pivot(rotate):
reorientthecube,visualization,3Dtoseriesof2Dplanes.Otheroperationsdrillacross:involving(across)morethanonefacttabledrillthrough:throughthebottomlevelofthecubetoitsback-endrelationaltables(usingSQL)09二月2025DataWarehousingandOLAPTechnology34AStar-NetQueryModel
ShippingMethodAIR-EXPRESSTRUCKORDERCustomerOrdersCONTRACTSCustomerProductPRODUCTGROUPPRODUCTLINEPRODUCTITEMSALESPERSONDISTRICTDIVISIONOrganizationPromotionCITYCOUNTRYREGIONLocationDAILYQTRLYANNUALYTimeEachcircleiscalledafootprint09二月2025DataWarehousingandOLAPTechnology35DataWarehousingandOLAPTechnologyforDataMiningWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining09二月2025DataWarehousingandOLAPTechnology36DesignofaDataWarehouse:ABusinessAnalysisFrameworkFourviewsregardingthedesignofadatawarehouseTop-downviewallowsselectionoftherelevantinformationnecessaryforthedatawarehouseDatasourceviewexposestheinformationbeingcaptured,stored,andmanagedbyoperationalsystemsDatawarehouseviewconsistsoffacttablesanddimensiontablesBusinessqueryview
seestheperspectivesofdatainthewarehousefromtheviewofend-user09二月2025DataWarehousingandOLAPTechnology37DataWarehouseDesignProcessTop-down,bottom-upapproachesoracombinationofbothTop-down:Startswithoveralldesignandplanning(mature)Bottom-up:Startswithexperimentsandprototypes(rapid)FromsoftwareengineeringpointofviewWaterfall:structuredandsystematicanalysisateachstepbeforeproceedingtothenextSpiral:rapidgenerationofincreasinglyfunctionalsystems,shortturnaroundtime,quickturnaroundTypicaldatawarehousedesignprocessChooseabusinessprocesstomodel,e.g.,orders,invoices,etc.Choosethegrain(atomiclevelofdata)ofthebusinessprocessChoosethedimensionsthatwillapplytoeachfacttablerecordChoosethemeasurethatwillpopulateeachfacttablerecord09二月2025DataWarehousingandOLAPTechnology38Multi-TieredArchitectureDataWarehouseExtractTransformLoadRefreshOLAPEngineAnalysisQueryReportsDataminingMonitor&IntegratorMetadataDataSourcesFront-EndToolsServeDataMartsOperational
DBsothersourcesDataStorageOLAPServer09二月2025DataWarehousingandOLAPTechnology39SourceDatabasesDataExtraction,Transformation,loadWarehouseAdmin.ToolsExtract,TransformandLoadDataModelingToolCentralMetadataArchitectedDataMartsDataAccessandAnalysisEnd-UserDWToolsCentralDataWarehouseCentralDataWarehouseMid-TierMid-TierDataMartDataMartLocalMetadataLocalMetadataLocalMetadataMetadataExchangeMDBDataCleansingToolRelationalAppl.PackageLegacyExternalRDBMSRDBMS體系結(jié)構
[Pieter,1998]數(shù)據(jù)倉庫的焦點問題-數(shù)據(jù)的獲得、存儲和使用
RelationalPackageLegacyExternalsourceDataCleanToolDataStagingEnterpriseDataWarehouseDatamartDatamartRDBMSROLAPRDBMSEnd-UserToolEnd-UserToolMDBEnd-UserToolEnd-UserTool數(shù)據(jù)倉庫和集市的加載能力至關重要數(shù)據(jù)倉庫和集市的查詢輸出能力至關重要ETL工具去掉操作型數(shù)據(jù)庫中的不需要的數(shù)據(jù)統(tǒng)一轉(zhuǎn)換數(shù)據(jù)的名稱和定義計算匯總數(shù)據(jù)和派生數(shù)據(jù)估計遺失數(shù)據(jù)的缺省值調(diào)節(jié)源數(shù)據(jù)的定義變化09二月2025DataWarehousingandOLAPTechnology42ThreeDataWarehouseModelsEnterprisewarehousecollectsalloftheinformationaboutsubjectsspanningtheentireorganizationDataMartasubsetofcorporate-widedatathatisofvaluetoaspecificgroupsofusers.Itsscopeisconfinedtospecific,selectedgroups,suchasmarketingdatamartIndependentvs.dependent(directlyfromwarehouse)datamartVirtualwarehouseAsetofviewsoveroperationaldatabasesOnlysomeofthepossiblesummaryviewsmaybematerialized09二月2025DataWarehousingandOLAPTechnology43DataWarehouseDevelopment:ARecommendedApproachDefineahigh-levelcorporatedatamodelDataMartDataMartDistributedDataMartsMulti-TierDataWarehouseEnterpriseDataWarehouseModelrefinementModelrefinement09二月2025DataWarehousingandOLAPTechnology44OLAPServerArchitecturesRelationalOLAP(ROLAP)
Userelationalorextended-relationalDBMStostoreandmanagewarehousedataandOLAPmiddlewaretosupportmissingpiecesIncludeoptimizationofDBMSbackend,implementationofaggregationnavigationlogic,andadditionaltoolsandservicesgreaterscalabilityMultidimensionalOLAP(MOLAP)
Array-basedmultidimensionalstorageengine(sparsematrixtechniques)fastindexingtopre-computedsummarizeddataHybridOLAP(HOLAP)Userflexibility,e.g.,lowlevel:relational,high-level:arraySpecializedSQLserversspecializedsupportforSQLqueriesoverstar/snowflakeschemas09二月2025DataWarehousingandOLAPTechnology45DataWarehousingandOLAPTechnologyforDataMiningWhatisadatawarehouse?Amulti-dimensionaldatamodelDatawarehousearchitectureDatawarehouseimplementationFurtherdevelopmentofdatacubetechnologyFromdatawarehousingtodatamining09二月2025DataWarehousingandOLAPTechnology46EfficientDataCubeComputationDatacubecanbeviewedasalatticeofcuboidsThebottom-mostcuboidisthebasecuboidThetop-mostcuboid(apex)containsonlyonecellHowmanycuboidsinann-dimensionalcubewithLlevels?MaterializationofdatacubeMaterializeevery(cuboid)(fullmaterialization),none(nomaterialization),orsome(partialmaterialization)SelectionofwhichcuboidstomaterializeBasedonsize,sharing,accessfrequency,etc.09二月2025DataWarehousingandOLAPTechnology47CubeOperationCubedefinitionandcomputationinDMQLdefinecubesales[item,city,year]:sum(sales_in_dollars)computecubesalesTransformitintoaSQL-likelanguage(withanewoperatorcubeby,introducedbyGrayetal.’96)SELECTitem,city,year,SUM(amount)FROMSALESCUBEBYitem,city,yearNeedcomputethefollowingGroup-Bys
(date,product,customer),(date,product),(date,customer),(product,customer),(date),(product),(customer)()(item)(city)()(year)(city,item)(city,year)(item,year)(city,item,year)09二月2025DataWarehousingandOLAPTechnology48CubeComputation:ROLAP-BasedMethodEfficientcubecomputationmethodsROLAP-basedcubingalgorithms(Agarwaletal’96)Array-basedcubingalgorithm(Zhaoetal’97)Bottom-upcomputationmethod(Bayer&Ramarkrishnan’99)ROLAP-basedcubingalgorithmsSorting,hashing,andgroupingoperationsareappliedtothedimensionattributesinordertoreorderandclusterrelatedtuplesGroupingisperformedonsomesubaggregatesasa“partialgroupingstep”Aggregatesmaybecomputedfrompreviouslycomputedaggregates,ratherthanfromthebasefacttable09二月2025DataWarehousingandOLAPTechnology49CubeComputation:ROLAP-BasedMethod(2)ThisisnotinthetextbookbutinaresearchpaperHash/sortbasedmethods(Agarwalet.al.
VLDB’96)Smallest-parent:computingacuboidfromthesmallestcubodpreviouslycomputedcuboid.Cache-results:cachingresultsofacuboidfromwhichothercuboidsarecomputedtoreducediskI/OsAmortize-scans:computingasmanyaspossiblecuboidsatthesametimetoamortizediskreadsShare-sorts:sharingsortingcostscrossmultiplecuboidswhensort-basedmethodisusedShare-partitions:sharingthepartitioningcostcrossmultiplecuboidswhenhash-basedalgorithmsareused09二月2025DataWarehousingandOLAPTechnology50Multi-wayArrayAggregationforCubeComputationPartitionarraysintochunks(asmallsubcubewhichfitsinmemory).Compressedsparsearrayaddressing:(chunk_id,offset)Computeaggregatesin“multiway”byvisitingcubecellsintheorderwhichminimizesthe#oftimestovisiteachcell,andreducesmemoryaccessandstoragecost.Whatisthebesttraversingordertodomulti-wayaggregation?AB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3CB44285640245236206009二月2025DataWarehousingandOLAPTechnology51Multi-wayArrayAggregationforCubeComputationAB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3C442856402452362060B09二月2025DataWarehousingandOLAPTechnology52Multi-wayArrayAggregationforCubeComputationAB29303132123459131415166463626148474645a1a0c3c2c1c0b3b2b1b0a2a3C442856402452362060B09二月2025DataWarehousingandOLAPTechnology53Multi-WayArrayAggregationforCubeComputation(Cont.)Method:theplanesshouldbesortedandcomputedaccordingtotheirsizeinascendingorder.SeethedetailsofExample2.12(pp.75-78)Idea:keepthesmallestplaneinthemainmemory,fetchandcomputeonlyonechunkatatimeforthelargestplaneLimitationofthemethod:computingwellonlyforasmallnumberofdimensionsIftherearealargenumberofdimensions,“bottom-upcomputation”andicebergcubecomputationmethodscanbeexplored09二月2025DataWarehousingandOLAPTechnology54IndexingOLAPData:BitmapIndexIndexonaparticularcolumnEachvalueinthecolumnhasabitvector:bit-opisfastThelengthofthebitvector:#ofrecordsinthebasetableThei-thbitissetifthei-throwofthebasetablehasthevaluefortheindexedcolumnnotsuitableforhighcardinalitydomainsBasetableIndexonRegionIndexonType09二月2025DataWarehousingandOLAPTechnology55IndexingOLAPData:JoinIndicesJoinindex:JI(R-id,S-id)whereR(R-id,…)S(S-id,…)TraditionalindicesmapthevaluestoalistofrecordidsItmaterializesrelationaljoininJIfileandspeedsuprelationaljoin—arathercostlyoperationIndatawarehouses,joinindexrelatesthevaluesofthedimensionsofastartschematorowsinthefacttable.E.g.facttable:SalesandtwodimensionscityandproductAjoinindexoncitymaintainsforeachdistinctcityalistofR-IDsofthetuplesrecordingtheSalesinthecityJoinindicescanspanmultipledimensions09二月2025DataWarehousingandOLAPTechnology56EfficientProcessingOLAPQueriesDeterminewhichoperationsshouldbeperformedontheavailablecuboids:transformdrill,roll,etc.intocorrespondingSQLand/orOLAPoperations,e.g,dice=selection+projectionDeterminetowhichmaterializedcuboid(s)therelevantoperationsshouldbeapplied.Exploringindexingstructuresandcompressedvs.densearraystructuresinMOLAP09二月2025DataWarehousingandOLAPTechnology57MetadataRepositoryMetadataisthedatadefiningwarehouseobjects.IthasthefollowingkindsDescriptionofthestructureofthewarehouseschema,view,dimensions,hierarchies,deriveddatadefn,datamartlocationsandcontentsOperationalmeta-datadatalineage(historyofmigrateddataandtransformationpath),currencyofdata(active,archived,orpurged),monitoringinformation(warehouseusagestatistics,errorreports,audittrails)ThealgorithmsusedforsummarizationThemappingfromoperationalenvironmenttothedatawarehouseDatarelatedtosystemperformancewarehouseschema,viewandderiveddatadefinitionsBusinessdatabusinesstermsanddefinitions,ownershipofdata,chargingpolicies09二月2025DataWarehousingandOLAPTechnology58DataWarehouseBack-EndToolsandUtilitiesDataextraction:getdatafrommultiple,heterogeneous,andexternalsourcesDatacleaning:detecterrorsinthedataandrectifythemwhenpossibleDatatransformation:convertdatafromlegacyorhostformattowarehouseformatLoad:sort,summarize,consolidate,computeviews,checkintegrity,andbuildindiciesandpartitionsRefreshpropagatetheupdatesfromthedatasourcestothewarehouse09二月2025DataWarehousingandOLAPTechnology59Discovery-DrivenExplorationofDataCubesHypothesis-driven:explorationbyuser,hugesearchspaceDiscovery-driven(Sarawagietal.’98)pre-computemeasuresindicatingexceptions,guideuserinthedataanalysis,atalllevelsofaggregationException:significantlydifferentfromthevalueanticipated,basedonastatisticalmodelVisualcuessuchasbackgroundcolorareusedtoreflectthedegreeofexceptionofeachcellComputationofexceptionindicato
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 2026屆山東省濱州市惠民縣數(shù)學高一上期末聯(lián)考試題含解析
- 內(nèi)兒科護理培訓課件講解
- 獸藥飼料培訓班課件
- 私人口腔會計管理制度(3篇)
- 診療組長管理制度及流程(3篇)
- 金融國慶活動策劃方案(3篇)
- 防藥品誤食管理制度(3篇)
- 食品車間環(huán)保管理制度(3篇)
- 中學校園文化建設制度
- 養(yǎng)老院收費標準及退費制度
- 廣西出版?zhèn)髅郊瘓F有限公司2026年招聘備考題庫附答案詳解
- 陶瓷工藝品彩繪師改進水平考核試卷含答案
- 2025廣東百萬英才匯南粵惠州市市直事業(yè)單位招聘急需緊缺人才31人(公共基礎知識)測試題附答案
- 粉塵防護知識課件
- 2026年孝昌縣供水有限公司公開招聘正式員工備考題庫及完整答案詳解一套
- (2025年)糧食和物資儲備局招聘考試題庫(答案+解析)
- 2026年樂陵市市屬國有企業(yè)公開招聘工作人員6名備考題庫及答案詳解一套
- GB 30981-2020 工業(yè)防護涂料中有害物質(zhì)限量
- 鋼結(jié)構廠房布置及設備
- 畢業(yè)設計(論文)-全自動果蔬切丁機設計(含全套CAD圖紙)
評論
0/150
提交評論