數(shù)據(jù)倉庫與數(shù)據(jù)挖掘-雙語第10章-dw query_第1頁
數(shù)據(jù)倉庫與數(shù)據(jù)挖掘-雙語第10章-dw query_第2頁
數(shù)據(jù)倉庫與數(shù)據(jù)挖掘-雙語第10章-dw query_第3頁
數(shù)據(jù)倉庫與數(shù)據(jù)挖掘-雙語第10章-dw query_第4頁
數(shù)據(jù)倉庫與數(shù)據(jù)挖掘-雙語第10章-dw query_第5頁
免費(fèi)預(yù)覽已結(jié)束,剩余44頁可下載查看

付費(fèi)下載

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進(jìn)行舉報或認(rèn)領(lǐng)

文檔簡介

DataWarehouseand

DataMiningYangYan2022/12/27HD-ITR2

Part1

DataWarehouse2022/12/27HD-ITR3ContentsDatawarehouseIntroductionDatamodelDataStorageandindexingOperatingAlgorithmsQueryProcessingandOptimizationFromdatawarehousingtodatamining2022/12/27HD-ITR45.1Rangequeryprocessing5.2QueryoptimizationChapter5

Queryprocessingandoptimization2022/12/27HD-ITR5Whatisarangequery?Range(R,Af):

R=[l1,h1][l2,h2]…[ld,hd]C,Af(R)=Af({C(x1,x2,…,xd)|(x1,x2,…,xd)R})Example:

Range(R=[2,5][2,5],Af=Sum)

Range(R,Af)=2+3+3+3+1+5+3+5+1+3+3+4+3+6+1+8=545.1Rangequeryprocessing2022/12/27HD-ITR6Howtoprocessarangequery?Na?vemethodAccesseachselectedcellandperformaggregationoperationRangequeryprocessingcost:O(nd)Updatecost:O(1)5.1Rangequeryprocessing2022/12/27HD-ITR7CostmodelsforrangequeryprocessingmethodsCqCu

(HoC.T.,1997)Cqnq+Cunu

(WeifaLiang,2000)Where,CqisRangequerycostCuis

Updatecostnqisthenumberofrangequeriesnuisthenumberofupdates5.1Rangequeryprocessing2022/12/27HD-ITR8PrefixSumCube(PC)WhatisthePrefixSumCubeGivenad-dimensioanldatacubeC,Cisstoredinad-dimensionalarrayAwithsizeofnd

PCisad-dimensioanlarray,whichhasthesamesizeasarrayA

Eachcellindexedby(x1,x2,…,xd)inPCcontainsthesumofallcellsuptoandincludingitselfinarrayA

forall0xinand0id,PC(x1,x2,…,xd)=Sum({A(y1,y2,…,yd)|1id,0yixi})Thus,thesumoftheentirearrayAisfoundinthelastcellofPC5.1Rangequeryprocessing2022/12/27HD-ITR9CubeCstoredinan2-diemnsioanlarrayPC5.1Rangequeryprocessing2022/12/27HD-ITR10PrefixSumCube(PC)RangequeryprocessingbasedonPCThesumofallthedataitemsintheregionselectedbyarangequerycanbecomputedbyaddingandsubtractingthesumsofvariousotherregionsuntiltheinterestingregionisextracted5.1Rangequeryprocessing2022/12/27HD-ITR11

Range([2,5][3,5],Sum)=PC(5,5)-PC(1,5)-C(5,2)+PC(1,2)=126-50-50+215.1Rangequeryprocessing2022/12/27HD-ITR12CostRangequery:

O(1)Update:

O(nd)SpaceCost:2ndHybrid

costofPCO(nd)inmodelofCq*CuO(knd)inmodelofCqnq+CunuProblemofPCMethod:

Theupdatecostistoobig5.1Rangequeryprocessing2022/12/27HD-ITR135.1Rangequeryprocessing5.2QueryoptimizationChapter5

Queryprocessingandoptimization2022/12/27HD-ITR145.2Queryoptimization單查詢的優(yōu)化處理帶聚集的查詢處理PreGroup-ByPartialPreGroup-By多查詢的優(yōu)化技術(shù)公共子表達(dá)式提取共享的查詢計劃模版2022/12/27HD-ITR155.2.1單查詢的優(yōu)化處理Pre-aggregationPre-aggregationisasimpledatareductionoperatorthatcanbeappliedtoaggregationqueries.WheneverwegroupandaggregateonacolumnsetG,wecanpreaggregateonanycolumnsetthatfunctionallydeterminesG.Preaggregationcanbeused,forexample,toreducetheinputsizetoajoin.Regularaggregationreducestheinputtoonerecordpergroup.2022/12/27HD-ITR16ThefollowingqueryagainsttheTPC-Ddatabasecomputestotalyearlysalesbysuppliernation.5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR17lineitemsupplierS_nationkey,yyS_suppkey=l_suppkeyQuery1a5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR18ThestandardwayofevaluatingthisqueryistofirstjoinLineitemandSupplierandthenapplygroupingandaggregation.WeranthisqueryusingMicrosoftSQLServeronaTPC-Ddatabaseatscalefactor1,containing6,001,215lineitemsand10,000suppliers.Usinghashjoinandhashaggregation,thequerycompletedin3:22minutes.ThebulkofthetimewasspentonscanningtheLineitemtableandcomputingthejoin.5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR19Eachsupplierbelongstoauniquenation.s_suppkeyfunctionallydeterminess_nationkey.Thisimpliesthat(s_suppkey,year)functionallydeterminess_nationkeyandyear.Wecanexploitthisfacttopreaggregateonl_suppkeyandyear,thatis,wefirstcomputeyearlysalesbysupplierandthensumtheyearlysalesforallsupplierswithineachnation.5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR20supplierS_suppkey,yylineitemS_suppkey=l_suppkeyS_nationkey,yyQuery1b5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR21Thisversionofthequerycompletedin2:01minutesusinghashaggregationandhashjoin.Preaggregatingonsuppkeyandyearreducedtheinputtothejoinfrom6,000,125rowsto70,000rows(10,000supplierstimes7years),resultinginaconsiderablesaving.5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR22ProblemsHowever,completepre-aggregationmaybeexpensiveifthenumberofgroupsislarge.hash-basedaggregationIfhashingisused,recordsmayhavetobespilledtodisk;sortbasedaggregationifsortingisused,wemayhavetosortalargenumberofrecords5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR23Hash-basedMethodsAggregationEx.Agg(R,{B,C},Sum(M)),R(A,B,C,D;M)Phase1:hash讀入R的一個塊到輸入緩沖區(qū),對于輸入緩沖區(qū)中的每一個元組t,以AD作為散列關(guān)鍵字,將t散列到桶h(t)對應(yīng)的緩沖區(qū)中。具有相同分組屬性值的元組將被散列到同一個桶中。若緩沖區(qū)滿了,將它寫到磁盤。同時清空緩沖區(qū)內(nèi)容。最后,對于每個桶的最后一塊,如果它不空的話,我們就把它寫到磁盤。Phase2:performaggregationoneachpartition依次將每一個桶讀入內(nèi)存,執(zhí)行聚集操作。5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR24Hash-basedMethodsJoinEx.R(X,Y)S(Y,Z)

Phase1:hash使用連接屬性Y作為散列關(guān)鍵字,將R和S分別散列到M-1個桶中,產(chǎn)生2(M-1)個桶:R1,R2,…,RM-1;S1,S2,…,SM-1。如果R和S的元組能連接,那么它們必然出現(xiàn)在具有某個i值的相應(yīng)桶Ri和Si中。Phase2:performaggregationoneachpartition在每一個桶對(Ri,Si)內(nèi),執(zhí)行連接操作。5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR25howtoreducethecostofpreaggregation.Partialpre-aggregationpre-aggregationneednotbecomplete--ifmultiplerecordshappentobeoutputforagroup,theywillbecombinedintothesamegroupbythefinalaggregation.Combiningjoinwithpre-aggregationcombinethetwooperatorsandhavethemusethesamerecordstore.5.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR265.2.1單查詢的優(yōu)化處理2022/12/27HD-ITR270010002004012011611131120113412101255T1(G1,J1,S1)006018101912332017T2(G2,J2,S2)003434011657021651106724116566Group-by(G1,G2)Sum(S1),Sum(S2)J1=J20001060101019020101700026010219020217000460104190204170012800116810113810120810134811210331125533G1G2J1S1S22022/12/27HD-ITR280010002004012011611131120113412101255T1(G1,J1,S1)00163011821167312652Group-by(G1,J1)Sum(S1)asss1,cnt006018101912332017T2(G2,J2,S2)003434011657021651106724116566Group-by(G1,G2)Sum(ss1),Sum(S2)*cntJ1=J22022/12/27HD-ITR29maintaininmemoryasetofgrouprecords,oneforeachgroupseensofar;whenaninputrecordarrives,combineitwiththematchinggrouprecordifoneexists,otherwiseinitializeanewgrouprecord.Whenmemoryesfull,simplyoutputsomeofthegrouprecordstocreatefreespace.Theoutputrecordsthenrepresentpartiallyaggregatedgroupsinsteadofindividualinputrecords.Asimplehash-basedalgorithmforpartialpre-aggregation2022/12/27HD-ITR30t1t2t3…tn

Asimplehash-basedalgorithmforpartialpre-aggregation2022/12/27HD-ITR31Absorptionratetheabsorptionrateofagrouprecordastheprobabilitythatthenextinputrecordwillmatchwithandbeabsorbedintothegrouprecord.estimatingtheabsorptionratemaybedifficultthesameproblemoccursincaching,LRUareobviouschoices.Asimplehash-basedalgorithmforpartialpre-aggregation2022/12/27HD-ITR32Thecostofpartialpreaggregationadditionalprocessingtime(mainlyforhashing)additionalmemoryusedforstoringthegrouprecords.Thedecisionwhethertoapplypartialpreaggregationshouldbecost-based,whichrequiresanestimateoftheexpectedoutputsize.Asimplehash-basedalgorithmforpartialpre-aggregation2022/12/27HD-ITR33MathematicalmodelIftheinputstreamcontainsDdistinctgroups,thencompleteaggregationreducestheinputtoexactlyDoutputrecords,onerecordpergroup.導(dǎo)致partialpreaggregation產(chǎn)生多于D個輸出的因素:amountofmemoryavailable,numberofgroups,thegroupsizedistribution,andtheorderingoftheinput.

Asimplehash-basedalgorithmforpartialpre-aggregation2022/12/27HD-ITR345.2.2多查詢的優(yōu)化處理在多個查詢之間提取公共子表達(dá)式提取公共計劃模版2022/12/27HD-ITR355.2.2多查詢的優(yōu)化處理提取公共計劃模版amortizesthecostofqueryoptimizationthroughthereuseofplansgeneratedforearlierqueries.建立一個計劃庫

plan

database從計劃庫中為新查詢選定一個合適的計劃,使得該計劃與由優(yōu)化器產(chǎn)生的計劃是相同的。當(dāng)無法為新查詢選定合適的計劃時,則優(yōu)化器執(zhí)行查詢優(yōu)化過程,為該查詢生成一個優(yōu)化的查詢計劃,并將新產(chǎn)生的計劃加入到計劃庫中。2022/12/27HD-ITR36提取公共計劃模版把相似的查詢劃分成若干組,使用優(yōu)化器為該組產(chǎn)生的查詢計劃來執(zhí)行所有可能分到該組的未來查詢。QuerysimilarityisevaluatedbasedonacomparisonofQuerystructuresandtheassociatedtableschemasandStatisticsaclassifierisemployedforefficientclusterassignments5.2.2多查詢的優(yōu)化處理2022/12/27HD-ITR37提取公共計劃模版First,defineaqueryfeaturevector

comprisedofinformationdeterminedfromthequeryandfromthecatalogsoftheRDBMS.thenumberoftablesandjoinsinthequery,indexesonqueryattributes,thenumberofpredicatesinwhichthetableisinvolved,andthesizeofthetable.etc.5.2.2多查詢的優(yōu)化處理2022/12/27HD-ITR38提取公共計劃模版Next,defineasimilarityfunctionthattakesapairofqueryfeaturevectorsasinputandquantitativelycomputestheirseparationinfeaturespace.Athresholdvalueforthisseparationisusedtodecidewhetherornotthetwoqueriesaresimilar.5.2.2多查詢的優(yōu)化處理2022/12/27HD-ITR39提取公共計劃模版Usingthesimilaritydefinition,queryclustersaredynamicallyformedinanincrementalmanner,withthedistancethresholddeterminingthemaximumstretchoftheclusterEachclusterhasarepresentative

forwhomtheexecutionplan,asdeterminedbytheoptimizer,ispersistentlystored.

Thisplanisusedtoexecuteallfuturequeriesthatareassignedtothecluster.

5.2.2多查詢的優(yōu)化處理2022/12/27HD-ITR40提取公共計劃模版Finally,whenasufficientnumberofclustershavebeenformed,aclassifierisconstructedontheclusterstosupportefficientidentificationoftheclustertowhichanewquerymaybelong,therebyalsodeterminingitsexecutionplan.5.2.2多查詢的優(yōu)化處理2022/12/27HD-ITR41總結(jié)區(qū)域查詢處理PC結(jié)構(gòu)查詢優(yōu)化預(yù)聚集提取公共計劃模版Chapter5

Queryprocessingandoptimization2022/12/27HD-ITR42ContentsDatawarehouseIntroductionDatamodelDataStorageandindexingOperatingAlgorithmsQueryProcessingandOptimizationFromdatawarehousingtodatamining2022/12/27HD-ITR43FromOn-LineAnalyticalProcessingtoOnLineAnalyticalMining(OLAM)Whyonlineanalyticalmining?HighqualityofdataindatawarehousesDWcontainsintegrated,consistent,cleaneddataAvailableinformationprocessingstructuresurroundingdatawarehousesODBC,OLEDB,Webaccessing,servicefacilities,reportingandOLAPtoolsOLAP-basedexploratorydataanalysisminingwithdrilling,dicing,pivoting,etc.On-lineselectionofdataminingfunctionsintegrationandswappingofmultipleminingfunctions,algorithms,andtasks.ArchitectureofOLAM2022/12/27HD-ITR44AnOLAMArchitectureDataWarehouseMetaDataMDDBOLAMEngineOLAPEngineUserGUIAPIDataCubeAPIDatabas

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論