實(shí)用教程TeradataPPT課件_第1頁(yè)
實(shí)用教程TeradataPPT課件_第2頁(yè)
實(shí)用教程TeradataPPT課件_第3頁(yè)
實(shí)用教程TeradataPPT課件_第4頁(yè)
實(shí)用教程TeradataPPT課件_第5頁(yè)
已閱讀5頁(yè),還剩76頁(yè)未讀 繼續(xù)免費(fèi)閱讀

下載本文檔

版權(quán)說(shuō)明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請(qǐng)進(jìn)行舉報(bào)或認(rèn)領(lǐng)

文檔簡(jiǎn)介

1、實(shí)用教程(Teradata),陸世潮2008年9月,問(wèn)題總結(jié),常見(jiàn)問(wèn)題分類:表屬性不對(duì):Set/Multiset問(wèn)題:INSERT操作慢主索引(PI)設(shè)置不合理問(wèn)題1:數(shù)據(jù)傾斜度大,空間爆滿。問(wèn)題2:JOIN操作,數(shù)據(jù)需要重分布。分區(qū)索引(PPI)設(shè)置不合理問(wèn)題:全表掃描連接條件過(guò)于復(fù)雜問(wèn)題:系統(tǒng)無(wú)法優(yōu)化執(zhí)行計(jì)劃缺乏統(tǒng)計(jì)信息問(wèn)題:系統(tǒng)無(wú)法找到最優(yōu)化的執(zhí)行計(jì)劃,SQL跑得慢哈!,提綱,Teradata架構(gòu)常見(jiàn)問(wèn)題,及解決方法Teradata工具實(shí)用小技巧JOIN的實(shí)現(xiàn)機(jī)制JOIN的優(yōu)化,Teradata體系架構(gòu),TeradataandMPPSystems,RDBMSARCH,LogicalExa

2、mpleofNPPIversusPPI,提綱,Teradata架構(gòu)常見(jiàn)問(wèn)題,及解決方法Teradata工具實(shí)用小技巧JOIN的實(shí)現(xiàn)機(jī)制JOIN的優(yōu)化,表屬性:Set,例子:pmart.RPT_NM_GRP_PRE_WARN_MON內(nèi)蒙移動(dòng)集團(tuán)客戶預(yù)警指標(biāo)月報(bào)表,假設(shè)原有1286449條記錄插入:152853條記錄耗時(shí):15秒,表屬性:Set,例子:pmart.RPT_NM_GRP_PRE_WARN_MON內(nèi)蒙移動(dòng)集團(tuán)客戶預(yù)警指標(biāo)月報(bào)表,建議:Teradata中都用MultiSet,假設(shè)原有1286449條記錄插入:152853條記錄耗時(shí):1秒,例子:CREATEMULTISETTABLEttt

3、emp.VT_SUBS_VIOC_QUANas(SELECT*FROMtttemp.MID_SUBS_VIOC_QUANWHERECAL_MONTH=200802AND*)WITHDATAPRIMARYINDEX(subs_id);,臨時(shí)表,默認(rèn)為:Set需要指定為:Multiset,字段越多,記錄越多差別越明顯,PI(PrimaryIndex主索引)的選擇,PI影響數(shù)據(jù)的存儲(chǔ)與訪問(wèn),其選擇標(biāo)準(zhǔn):不同值盡量多的字段(MoreUniqueValues)使用頻繁的字段:包括值訪問(wèn)和連接訪問(wèn)少更新PI字段不宜太多最好是手動(dòng)指定PI,例子:用戶語(yǔ)音業(yè)務(wù)量中間表CREATEMULTISETTABLEtt

4、temp.MID_SUBS_VIOC_QUAN(CAL_MonthINTEGERTITLE統(tǒng)計(jì)月份,City_IDCHAR(4)TITLE地市標(biāo)識(shí),Channel_IDCHAR(8)TITLE渠道標(biāo)識(shí),Subs_idCHAR(12)TITLE用戶標(biāo)識(shí),。)PRIMARYINDEX(subs_id);,例子:用戶語(yǔ)音業(yè)務(wù)量臨時(shí)表CREATEMULTISETTABLEtttemp.VT_SUBS_VIOC_QUANas(SELECT*FROMtttemp.MID_SUBS_VIOC_QUANWHERECAL_MONTH=200802AND*)WITHDATAPRIMARYINDEX(subs_id

5、);,Subs_ID:頻繁使用UniqueValue多,如果不指定PI,系統(tǒng)默認(rèn)為:Cal_Month,PI(PrimaryIndex主索引)的選擇(cont.),例子:夢(mèng)網(wǎng)客戶活躍客戶分析CREATEMULTISETTABLEPMART.FCT_DATA_MONNET_ACTIVE_MON(CAL_MonthINTEGERTITLE統(tǒng)計(jì)月份,City_IDCHAR(4)TITLE地市標(biāo)識(shí),Channel_IDCHAR(8)TITLE渠道標(biāo)識(shí),Mont_SVC_Type_CodCHAR(3)TITLE夢(mèng)網(wǎng)業(yè)務(wù)類型編碼,Mont_SVC_CAT_MicroCls_CodCHAR(3)TITLE夢(mèng)

6、網(wǎng)業(yè)務(wù)分類小類編碼,Mont_SVC_CHRG_Type_CodCHAR(2)TITLE夢(mèng)網(wǎng)業(yè)務(wù)計(jì)費(fèi)類型編碼,THR_Brand_CodCHAR(1)TITLE三大品牌編碼,Mont_Consume_Level_CodCHAR(2)TITLE夢(mèng)網(wǎng)消費(fèi)層次編碼,Consume_Level_CodCHAR(2)TITLE消費(fèi)層次編碼,。)PRIMARYINDEX(CAL_Month,City_ID,Channel_ID,Mont_SVC_Type_Cod,Mont_SVC_CAT_MicroCls_Cod,Mont_SVC_CHRG_Type_Cod,THR_Brand_Cod,Mont_Cons

7、ume_Level_Cod,Consume_Level_Cod);PI:9字段2字段:City_ID,Channel_ID調(diào)整PI后,在右邊的SQL中,PI是否起作用?,以下SQL,PI是否起作用?:1.值訪問(wèn)Select*FromFCT_DATA_MONNET_ACTIVE_MONWhereCity_ID=070010andChannel_ID=0100andcal_month=2007072.連接訪問(wèn)Select*FromFCT_DATA_MONNET_ACTIVE_MONALEFTJOINMID_CHANNEL_INFO_DAILYBONA.Channel_ID=B.Channel_ID

8、andA.City_ID=b.City_IDLEFTJOINVW_CDE_REGION_TYPECONA.City_ID=C.City_ID3、值訪問(wèn)連接訪問(wèn)Select*FromFCT_DATA_MONNET_ACTIVE_MONA,VT_INFOBWHEREA.Channel_ID=B.Channel_IDANDA.City_ID=B.City_IDANDA.CAL_MONTH=200707ANDA.Consume_Level_Cod=B.Consume_Level_Cod,PPI的使用,PPI(PartitionPrimaryIndex,分區(qū)索引),把具有相同分區(qū)值的數(shù)據(jù)聚簇存放在一起;

9、類似于SQLServer的聚簇索引(ClusterIndex),Oracle的聚簇表(ClusterTable)。利用PPI,可以快速插入/訪問(wèn)同一個(gè)Partition(分區(qū))的數(shù)據(jù)。,CREATEMULTISETTABLEqdata.TB_DQC_KPI_CHECK_RESULT(TX_DATEDATEFORMATYYYYMMDDTITLE數(shù)據(jù)日期NOTNULL,KPI_CODEINTEGERTITLE指標(biāo)代碼NOTNULL,。)PRIMARYINDEX(KPI_CODE)PARTITIONBYRANGE_N(TX_DATEBETWEENCAST(20030101)ASDATEFORMATY

10、YYYMMDD)ANDCAST(20191231)ASDATEFORMATYYYYMMDD)EACHINTERVAL1DAY,NORANGEORUNKNOWN);,Select*FromTB_DQC_KPI_CHECK_RESULTWheretx_date=20070701;或Wheretx_datebetween20070701and20070731;或Wheretx_date20070701;但Wheretx_datelike200707%;不起作用,PPI的使用(cont.),Partition上不要使用表達(dá)式,否則Partition不能被正確使用。T1.tx_date/100=CAST

11、(20070917ASDATEFORMATYYYYMMDD)/100Substring(T1.tx_datefrom1for6)=200709應(yīng)該修改為T1.tx_date=CAST(20070901ASDATEFORMATYYYYMMDD),PPI的使用(cont.),腳本:tb_030040270.pl/*刪除當(dāng)月*/2小時(shí)delBASS1.tb_03004whereproc_dt=200709;insertintoBASS1.tb_030047小時(shí)。,sel.frompview.vw_evt_cust_socustwhereacpt_date=cast(200710|01asdate)c

12、ast(200710|01asdate)寫法錯(cuò)誤,PPI不起作用日期的正確寫法:Cast(20071001asdateformatYYYYMMDD),在proc_dt建立PPI,PPI字段從Load_Date調(diào)整為acpt_date,創(chuàng)建可變臨時(shí)表,它僅存活于同一個(gè)Session之內(nèi)注意指定可變臨時(shí)表為multiset(通常也要指定PI)可變臨時(shí)表不能帶有PPI例子1:createvolatilemultisettablevt_RETAIN_ANLY_MONas(selectcol1,col2,fromwheregroupby.)withdataPRIMARYINDEX(PI_Cols)ONC

13、OMMITPRESERVEROWS;例子2:createvolatilemultisettablevt_RETAIN_ANLY_MON(col1char(2),col2varchar(12)NOTNULL)PRIMARYINDEX(PI_Cols)ONCOMMITPRESERVEROWS;,創(chuàng)建可變臨時(shí)表(cont.),例子3:createvolatilemultisettablevt_RETAIN_ANLY_MONas(selectcol1,cast(adcasvarchar(12)col2fromwhere)withnodataPRIMARYINDEX(col1)ONCOMMITPRESE

14、RVEROWS;例子4:createvolatilemultisettablevt_net_gsm_nlaspdata.tb_net_gsm_nlwithnodataONCOMMITPRESERVEROWS;,字段col2將用unicode字符集;當(dāng)跟普通字段(latin字符集)join時(shí),需要進(jìn)行數(shù)據(jù)重新分布。不建議,失?。阂?yàn)閜data.tb_net_gsm_nl有PPI而可變臨時(shí)表不允許有PPI,固化臨時(shí)表,固化臨時(shí)表,就是把查詢結(jié)果存放到一張物理表。共下次分析或他人使用Session斷開(kāi)之后,仍然可以使用。示例1:CREATEMULTISETTABLEtttemp.TMP_BOSS_V

15、OICas(select*frompview.vw_net_gsm_nl)WITHnoDATAPRIMARYINDEX(subs_id);INSERTINTOtttemp.TMP_BOSS_VOICSELECT*FROMpview.vw_net_gsm_nlWHERE*;示例2:CREATEMULTISETTABLEtttemp.TMP_BOSS_VOICas(select*frompview.vw_net_gsm_nlWHERE*)WITHDATAPRIMARYINDEX(subs_id);示例3:(復(fù)制表,數(shù)據(jù)備份)CREATEMULTISETTABLEtttemp.TMP_BOSS_VO

16、ICASpdata.tb_net_gsm_nlWITHDATA;,數(shù)據(jù)類型,注意非日期字段與日期字段char,Statement1SELECT*FROMEmp1WHEREEmp_no=1234;,Statement2SELECT*FROMEmp1WHEREEmp_no=1234;,Table1CREATETABLEEmp2(Emp_noINTEGER,Emp_nameCHAR(20)PRIMARYINDEX(Emp_no);,Statement1SELECT*FROMEmp2WHEREEmp_no=1234;,Statement2SELECT*FROMEmp2WHEREEmp_no=1234;

17、,Case2,ResultsinFullTableScan,Resultsinunnecessaryconversion,目標(biāo)列的選擇,減少目標(biāo)列,可以少消耗SPOOL空間,從而提高SQL的效率當(dāng)系統(tǒng)任務(wù)繁忙,系統(tǒng)內(nèi)存少的時(shí)候,效果尤為明顯。舉例:GSM語(yǔ)言話單表,PDATA.TB_NET_GSM_NL共有73字段,以下SQL供返回1.6億條記錄左邊的SQL,記錄最長(zhǎng)為:698字節(jié),平均399字節(jié)右邊的SQL,記錄最長(zhǎng)為:59字節(jié),平均30字節(jié)兩者相差400多GB的SPOOL空間,IO次數(shù)也隨著相差甚大!,SPOOL空間估計(jì):497GB,SPOOL空間估計(jì):42GB,SELECTSUBS_ID

18、,MSISDN,Begin_Date,Begin_Time,Call_DUR,CHRG_DURFROMPDATA.TB_NET_GSM_NLWHEREPROC_DATEBETWEEN20070701AND20070731,SELECT*FROMPDATA.TB_NET_GSM_NLWHEREPROC_DATEBETWEEN20070701AND20070731,Where條件的限定,根據(jù)Where條件先進(jìn)行過(guò)濾數(shù)據(jù)集,再進(jìn)行連接(JOIN)等操作這樣,可以減少參與連接操作的數(shù)據(jù)集大小,從而提高效率好的查詢引擎,可以自動(dòng)優(yōu)化;但有些復(fù)雜SQL,查詢引擎優(yōu)化得并不好。注意:系統(tǒng)的SQL優(yōu)化,只是避

19、免最差的,選擇相對(duì)優(yōu)的,未必能夠得到最好的優(yōu)化結(jié)果。,SELECTA.TX_DATE,A.KPI_CODE,B.SRC_NAME,A.KPI_VALUEFROM(select*fromqdata.tb_dqc_kpi_check_resultwhereTX_DATE=20070701ANDKPI_CODE=65)ALEFTJOIN(SELECT*FROMqdata.tb_dqc_kpi_defwhereKPI_CODE=65andN_TYPE=M)BONA.KPI_CODE=B.KPI_CODE,SELECTA.TX_DATE,A.KPI_CODE,coalesce(B.SRC_NAME,no

20、name),A.KPI_VALUEFROMqdata.tb_dqc_kpi_check_resultALEFTJOINqdata.tb_dqc_kpi_defBONA.KPI_CODE=B.KPI_CODEWHEREA.TX_DATE=20070701ANDA.KPI_CODE=65ANDB.N_TYPE=M,rewrite,用CaseWhen替代UNION,selcity_id,channel_id,cust_brand_id,sum(stat_values)asstat_valuesfrom(.selectt.city_id-語(yǔ)音雜志計(jì)費(fèi)量,coalesce(v.channel_id,b.

21、channel_id,-)aschannel_id,cust_brand_id,sum(casewhenSMS_SVC_Type_Level_SECND=017andCall_Type_Codein(00,10,01,11)thensms_quanelse0END)asstat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwherecal_date=200709

22、14groupby1,2,3unionallselectt.city_id-夢(mèng)網(wǎng)短信計(jì)費(fèi)量,coalesce(v.channel_id,b.channel_id,-)aschannel_id,cust_brand_id,sum(sms_quan)asstat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwherecal_date=20070914andSMS_S

23、VC_Type_Level_SECNDlike02%andSMS_SVC_Type_Level_SECNDnotin(021,022)groupby1,2,3.)tmpGroupby1,2,3,兩個(gè)子查詢的表連接部分完全一樣兩個(gè)子查詢除了取數(shù)據(jù)條件,其它都一樣。Unionall是多余的,它需要重復(fù)掃描數(shù)據(jù),進(jìn)行重復(fù)的JOIN可以用Casewhen替代union,作業(yè):KPI_NWR_SMS_BILL_QUAN描述:點(diǎn)對(duì)點(diǎn)短信計(jì)費(fèi)量腳本:kpi_nwr_sms_bill_quan0600.pl,用CaseWhen替代UNION(cont.),selcity_id,channel_id,cust_

24、brand_id,sum(stat_values)asstat_valuesfrom(selectt.city_id,coalesce(v.channel_id,b.channel_id,-)aschannel_id,cust_brand_id,sum(CASEWHENSMS_SVC_Type_Level_SECND=017andCall_Type_Codein(00,10,01,11)THENsms_quan-語(yǔ)音雜志計(jì)費(fèi)量WHENSMS_SVC_Type_Level_SECNDlike02%andSMS_SVC_Type_Level_SECNDnotin(021,022)THENsms_q

25、uan-夢(mèng)網(wǎng)短信計(jì)費(fèi)量ELSE0END)asstat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwherecal_date=20070914.)tmpGroupby1,2,3,SQL優(yōu)化重寫,用OR替代UNION,Selectcity_id,channel_id,cust_brand_id,sum(sms_quan)stat_valuesfrom(select

26、t.city_id-語(yǔ)音雜志計(jì)費(fèi)量,coalesce(v.channel_id,b.channel_id,-)aschannel_id,cust_brand_id,sum(sms_quan)stat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwherecal_date=20070914andSMS_SVC_Type_Level_SECND=017andCall

27、_Type_Codein(00,10,01,11)groupby1,2,3unionallselectt.city_id-夢(mèng)網(wǎng)短信計(jì)費(fèi)量,coalesce(v.channel_id,b.channel_id,-)aschannel_id,cust_brand_id,sum(sms_quan)asstat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwhereca

28、l_date=20070914andSMS_SVC_Type_Level_SECNDlike02%andSMS_SVC_Type_Level_SECNDnotin(021,022)groupby1,2,3)TGroupby1,2,3,兩個(gè)子查詢的表連接部分完全一樣兩個(gè)子查詢除了取數(shù)據(jù)條件,其它都一樣。Unionall是多余的,它需要重復(fù)掃描數(shù)據(jù),進(jìn)行重復(fù)的JOIN可以用OR替代union此類的問(wèn)題,在腳本中經(jīng)常見(jiàn)到。,用OR替代UNION(cont.),selectt.city_id,coalesce(v.channel_id,b.channel_id,-)aschannel_id,cust_

29、brand_id,sum(sms_quan)asstat_valuesfromPVIEW.vw_mid_sms_svc_quan_dailytleftjoinVT_SUBSvont.subs_id=v.subs_idleftjoinPVIEW.vw_FCT_CDE_BUSN_CITY_TYPEbont.city_id=b.City_IDwherecal_date=20070914and(SMS_SVC_Type_Level_SECND=017-語(yǔ)音雜志計(jì)費(fèi)量andCall_Type_Codein(00,10,01,11))OR(SMS_SVC_Type_Level_SECNDlike02%-夢(mèng)

30、網(wǎng)短信計(jì)費(fèi)量andSMS_SVC_Type_Level_SECNDnotin(021,022)Groupby1,2,3,SQL優(yōu)化重寫,去掉多余的Distinct與Groupby,selt.operator,t.acpt_channel_id,t.acpt_city_id,t.subs_id,t.acpt_dateasevt_dateFrom(seloperator,ACPT_Channel_ID,acpt_city_id,subs_id,acpt_datefrompview.vw_evt_cust_socustwhereacpt_date=20071007andso_meth_codein(

31、0,1,2)andPROC_STS_Code=-1groupby1,2,3,4,5unionallseloperator_numasoperator,ACPT_Channel_ID,acpt_city_id,subs.subs_id,charge_dateasacpt_datefrompview.vw_fin_busi_recbusjoincrmmart.subs_day_info_dailysubsonsubs.msisdn=bus.msisdnwherecharge_date=20071007groupby1,2,3,4,5)tgroupby1,2,3,4,5;,既然t查詢外層有g(shù)roup

32、by操作去重,那么子查詢內(nèi)的Groupby去重是多余的。而且,兩個(gè)子查詢groupby后再用unionall,就可能再產(chǎn)生重復(fù)記錄,那么groupby也失去意義了。解決方法:把t查詢內(nèi)部的兩個(gè)groupby去掉即可類似的Distinct問(wèn)題,可效仿解決。,去重,去重,去重,Groupbyvs.Distinct,Distinct是去除重復(fù)的操作Groupby是聚集操作某些情況下,兩者可以起到相同的作用。兩者的執(zhí)行計(jì)劃不一樣,效率也不一樣建議:使用Groupby,selectsubs_id,acct_idfromPVIEW.VW_FIN_ACCT_SUBS_HISwhereefct_date200

33、70701groupby1,2,selectDISTINCTsubs_id,acct_idfromPVIEW.VW_FIN_ACCT_SUBS_HISwhereefct_date20070701,Unionvs.Unionall,Union與Unionall的作用是將多個(gè)SQL的結(jié)果進(jìn)行合并。Union將自動(dòng)剔除集合操作中的重復(fù)記錄;需要耗更多資源。Unionall則保留重復(fù)記錄,一般建議使用Unionall。第一個(gè)SELECT語(yǔ)句,決定輸出的字段名稱,標(biāo)題,格式等要求所有的SELECT語(yǔ)句:1)必須要有同樣多的表達(dá)式數(shù)目;2)相關(guān)表達(dá)式的域必須兼容,select*from(selecta)T

34、1(col1)unionselect*from(selectbc)T2(col2),select*from(selectbc)T3(col3)unionallselect*from(selecta)T1(col1)unionallselect*from(selectbc)T2(col2),col3-abcbc,col1-ab,先Groupby再join,腳本:rpt_mart_new_comm_mon0400.pl11小時(shí)Selectcasewhenb.CUST_Brand_IDisnullthen5020whenb.CUST_Brand_IDin(2000,5010)then5020else

35、b.CUST_Brand_IDend,sum(COALESCE(b.Bas_CHRG_DUR_Unit,0)asThsy_Accum_New_SUBS_CHRG_DUR,sum(casewhenb.call_type_code=20thenb.Bas_CHRG_DUR_Unitelse0END)fromVTNEW_SUBS_THISYEARtinnerjoinVTDUR_MONbont.Subs_ID=b.Subs_IDleftjoinPVIEW.vw_MID_CDE_LONG_CALL_TYPE_LVLconb.Long_Type_Level_SECND=c.Long_Type_Level_

36、SECNDleftjoinPVIEW.vw_MID_CDE_ROAM_TYPE_LVLdonb.Roam_Type_Level_SECND=d.Roam_Type_Level_SECNDgroupby1;,記錄數(shù)情況:t:580萬(wàn),b:9400萬(wàn),c:8,d:8主要問(wèn)題:假如連接順序?yàn)椋?bjoinc)joind)joint)則是(9400萬(wàn)join8)join8)join580萬(wàn))數(shù)據(jù)分布時(shí)間長(zhǎng)(IO多),連接次數(shù)多解決方法:先執(zhí)行(tjoinb),然后groupby,再joinc,d,先Groupby再join(cont.),腳本:rpt_mart_new_comm_mon0400.pl4

37、0秒Selectcasewhenb.CUST_Brand_IDisnullthen5020whenb.CUST_Brand_IDin(2000,5010)then5020elseb.CUST_Brand_IDend,sum(COALESCE(b.Bas_CHRG_DUR_Unit,0)asThsy_Accum_New_SUBS_CHRG_DUR,sum(casewhenb.call_type_code=20thenb.Bas_CHRG_DUR_Unitelse0END)from(selectCUST_Brand_ID,call_type_code,Long_Type_Level_SECND,R

38、oam_Type_Level_SECND,sum(Bas_CHRG_DUR_Unit)Bas_CHRG_DUR_Unit,count(*)quanfromVTDUR_MONwheresubs_idin(selectsubs_idfromVTNEW_SUBS_THISYEAR)groupby1,2,3,4)bleftjoinPVIEW.vw_MID_CDE_LONG_CALL_TYPE_LVLconb.Long_Type_Level_SECND=c.Long_Type_Level_SECNDleftjoinPVIEW.vw_MID_CDE_ROAM_TYPE_LVLdonb.Roam_Type_

39、Level_SECND=d.Roam_Type_Level_SECNDgroupby1;,記錄數(shù)情況:t:580萬(wàn),b:9400萬(wàn),c:8,d:8處理過(guò)程:先執(zhí)行(tjoinb),然后groupby,再joinc,d結(jié)果:1、VTDUR_MONjoinVTNEW_SUBS_THISYEARPI相同,mergejoin,只需10秒2、經(jīng)過(guò)groupby,b表只有332記錄3、bjoincjoind,就是:332884、最終結(jié)果:5記錄,共40秒,先Groupby再join(cont.),先匯總再連接,可以減少參與連接的數(shù)據(jù)集大小,減少比較次數(shù),從而提高效率。以下面SQL為例,假設(shè)歷史表(Hist

40、ory)有1億條記錄左邊的SQL,需要進(jìn)行1億90次比較右邊的SQL,則只需要1億1次比較,SELECTH.product_id,sum(H.account_num)FROMHistoryH,CalendarDTWHEREH.sale_date=DT.calendar_dateANDDT.quarter=3GROUPBY1;,SELECTH.product_id,SUM(H.account_num)FROMHistoryH,(SELECTmin(calendar_date)min_date,max(calendar_date)max_dateFROMCalendarWHEREquarter=3

41、)DTWHEREH.sale_dateBETWEENDT.min_dateandDT.max_dateGROUPBY1;,提取公共SQL形成臨時(shí)表,腳本:rpt_nmmart_comm_subs_mon0403.pl出現(xiàn)以下SQL代碼段,共5次,平均每次執(zhí)行需10分鐘。FROMPVIEW.VW_MID_VOIC_SVC_QUAN_MONa,PVIEW.VW_MID_CDE_SUBS_BRAND_LVLb,vt_subscWHEREa.CUST_Brand_ID=b.SUBS_Brand_Level_ThirdANDa.CAL_Month=200708ANDa.SUBS_ID=c.SUBS_ID

42、。整個(gè)腳本需要掃描以下SQL14次,平均每次執(zhí)行需3分鐘PVIEW.VW_MID_VOIC_SVC_QUAN_MONwhereCAL_Month=200708提取公共SQL,形成臨時(shí)表,較少掃描(IO)次數(shù)。該腳本,經(jīng)過(guò)優(yōu)化之后,從50分鐘縮減至10分鐘,關(guān)聯(lián)條件(1),SelectA.a2,B.b2fromAjoinBonsubstring(A.a1from1for7)=B.b1應(yīng)該寫為SelectA.a2,B.b2from(selectsubstring(a1from1for7)asa1_new,a2fromA)A_newjoinBona1_new=b1,關(guān)聯(lián)條件(2),SelectA.a

43、2,B.b2fromAjoinBonTRIM(A.a1)=TRIM(B.b1)應(yīng)該寫為SelectA.a2,B.b2fromAjoinBonA.a1=B.b1,SQL書寫不當(dāng)可能會(huì)引起笛卡兒積,以下面兩個(gè)SQL為例,它們將進(jìn)行笛卡兒積操作。例子1:Selectemployee.emp_no,employee.emp_nameFromemployeeA例子2:SELECTA.EMP_Name,B.Dept_NameFROMemployeeA,DepartmentBWherea.dept_no=b.dept_no;,修改表定義,常見(jiàn)的表定義修改操作:增加字段修改字段長(zhǎng)度建議的操作流程Renamet

44、abledb.tablexasdb.tabley;通過(guò)Showtable語(yǔ)句獲得原表db.tablex的定義定義新表:db.tablexInsertintodb.tablex(。)select。Fromdb.tabley;Droptabledb.tabley;Teradata提供ALTERTABLE語(yǔ)句,可進(jìn)行修改表定義但,不建議采用ALTERTABLE方式。,插入/更新/刪除記錄時(shí),盡量不要Abort,當(dāng)目標(biāo)表有數(shù)據(jù)時(shí),插入和更新操作,以及部分刪除,都產(chǎn)生TJ如果此時(shí)abort該操作,系統(tǒng)將會(huì)回滾,DeleteBASS1.tb_03004whereproc_dt=200709;,UPDATE

45、CustomerSETCredit_Limit=Credit_Limit*1.20;,DELETEFROMTransWHERETrans_Date981231;DROPTABLETrans;RENAMETABLETrans_NTOTrans;,先建立空表,通過(guò)insert/select方式插入數(shù)據(jù)這是非??斓牟僮?!先備份,然后做變更操作,更加安全!,對(duì)于大表進(jìn)行Update/DELETE操作,將耗費(fèi)相當(dāng)多的資源與相當(dāng)長(zhǎng)的時(shí)間。Update/Delete操作,需要事務(wù)日志TJ(TransientJournal)以防意外中斷導(dǎo)致數(shù)據(jù)受到破壞在Update/Delete操作中途被Cancel,系統(tǒng)則

46、需回滾,這將耗更多的資源與時(shí)間!在經(jīng)分系統(tǒng)中,應(yīng)嚴(yán)防此類事件發(fā)生!,DELETEFROMTransWHERETrans_DateThecontentsofSpool1aresentbacktotheuserastheresultofstatement1.Thetotalestimatedtimeis24.38seconds.,n-TableJoins(多表連接),多表連接可以分解為兩兩連接.對(duì)下面的SQL,查詢引擎可以選擇較優(yōu)的執(zhí)行計(jì)劃:例如,Plan1或者Plan2。SELECT.FROMTable_A,Table_B,Table_C,Table_DWHERE.;對(duì)下面的SQL,查詢引擎只能

47、選擇Plan2,否則結(jié)果有可能不對(duì)。SELECT.FROMTable_AleftjoinTable_BonA.c1=B.c2INNERJOINTable_CONB.c2=c.c3LEFTJOINTable_DOND.C4=A.C1WHERE.;,提綱,Teradata架構(gòu)常見(jiàn)問(wèn)題,及解決方法Teradata工具實(shí)用小技巧JOIN的實(shí)現(xiàn)機(jī)制JOIN的優(yōu)化,改變查詢計(jì)劃的手段,修改PI收集統(tǒng)計(jì)信息關(guān)聯(lián)字段上的統(tǒng)計(jì)信息Partition上的統(tǒng)計(jì)信息Where條件上的統(tǒng)計(jì)信息Groupby字段上的統(tǒng)計(jì)信息查看某個(gè)表的統(tǒng)計(jì)信息情況:helpstatDBName.TableName查看詳盡的統(tǒng)計(jì)情況:se

48、lect*frompview.vw_statistic_info通過(guò)Explain查看,尚需統(tǒng)計(jì)哪些信息?diagnostichelpstatsonforsession;,優(yōu)化示例1:數(shù)據(jù)分布與JOIN方法,腳本:tb_rmis_bb40_mon2900280.pl,16小時(shí)select200709,a.City_ID,a.SUBS_ID,a.BELONG_DISTRICTfrom(selectCity_ID,SUBS_ID,BELONG_DISTRICTfromMMART.TB_MIS_DISTRICT_DIVISIONwhereCAL_Month=200612unionselectCity

49、_ID,SUBS_ID,BELONG_DISTRICTfromMMART.TB_MIS_DISTRICT_DIVISION_NEW)ainnerjoinpview.vw_mid_subs_info_monbona.subs_id=b.subs_idWhereCAL_Month=200709AndSUBS_STS_CodenotIn(10,11,12,13,20,30,60)andSUBS_STS_EFCT_Datebetween200709|01and20070930groupby1,2,3,4;,主要問(wèn)題:1、把表b進(jìn)行Duplicate統(tǒng)計(jì)信息不齊全,認(rèn)為表b經(jīng)過(guò)條件過(guò)濾只有130條記錄,

50、實(shí)際上有200萬(wàn)記錄左右2、用ProductJoin連接算法表A有1000萬(wàn)記錄Duplicate連接,共進(jìn)行比較次數(shù):1000萬(wàn)200萬(wàn)3、最優(yōu)的Join方法?4、解決辦法:對(duì)表b收集統(tǒng)計(jì)相應(yīng)字段的信息必要的話,固化表A,并統(tǒng)計(jì)字段subs_id,優(yōu)化示例1explain,5)Weexecutethefollowingstepsinparallel.1)Wedoanall-AMPsRETRIEVEstepfromMMART.TB_MIS_DISTRICT_DIVISION_NEWbywayofanall-rowsscanwithnoresidualconditionsintoSpool1(a

51、ll_amps),whichisredistributedbyhashcodetoallAMPs.ThenwedoaSORTtoorderSpool1bythesortkeyinspoolfield1eliminatingduplicaterows.ThesizeofSpool1isestimatedwithlowconfidencetobe7,278,252rows.Theestimatedtimeforthisstepis1.46seconds.2)Wedoanall-AMPsRETRIEVEstepfromasinglepartitionofPMART.MID_SUBS_INFO_MON

52、withaconditionof(PMART.MID_SUBS_INFO_MON.CAL_Month=200709)witharesidualconditionof(PMART.MID_SUBS_INFO_MON.SUBS_STS_EFCT_Date=DATE2007-09-01)AND(PMART.MID_SUBS_INFO_MON.SUBS_STS_EFCT_Date=DATE2007-09-30)AND(。)intoSpool5(all_amps),whichisduplicatedonallAMPs.Theinputtablewillnotbecachedinmemory,butiti

53、seligibleforsynchronizedscanning.ThesizeofSpool5isestimatedwithlowconfidencetobe130rows.Theestimatedtimeforthisstepis0.04seconds.6)Wedoanall-AMPsJOINstepfromSpool5(LastUse)bywayofanall-rowsscan,whichisjoinedtoSpool1(LastUse)bywayofanall-rowsscan.Spool5andSpool1arejoinedusingaproductjoin,withajoincon

54、ditionof(SUBS_ID=SUBS_ID).TheresultgoesintoSpool4(all_amps),whichisbuiltlocallyontheAMPs.ThesizeofSpool4isestimatedwithindexjoinconfidencetobe14,556,503rows.Theestimatedtimeforthisstepis3.05seconds.,優(yōu)化示例2:數(shù)據(jù)分布與JOIN方法,腳本:rpt_mart_new_comm_mon0400.pl11小時(shí)Selectcasewhenb.CUST_Brand_IDisnullthen5020whenb

55、.CUST_Brand_IDin(2000,5010)then5020elseb.CUST_Brand_IDend,sum(COALESCE(b.Bas_CHRG_DUR_Unit,0)asThsy_Accum_New_SUBS_CHRG_DUR,。fromVTNEW_SUBS_THISYEARtinnerjoinVTDUR_MONbont.Subs_ID=b.Subs_IDleftjoinPVIEW.vw_MID_CDE_LONG_CALL_TYPE_LVLconb.Long_Type_Level_SECND=c.Long_Type_Level_SECNDleftjoinPVIEW.vw_M

56、ID_CDE_ROAM_TYPE_LVLdonb.Roam_Type_Level_SECND=d.Roam_Type_Level_SECNDgroupby1;,記錄數(shù)情況:t:580萬(wàn),b:9400萬(wàn),c:8,d:8都有統(tǒng)計(jì)信息主要問(wèn)題:1、連接順序:(bJoinc)joind)joint2、對(duì)表b進(jìn)行3次redistribute3、連接算法:MergeJoin4、原因:b表經(jīng)過(guò)匯總而得,雖然知有總記錄數(shù),但未知各個(gè)join字段的情況,解決辦法?固化b,并對(duì)b表連接字段進(jìn)行統(tǒng)計(jì)調(diào)整連接順序?,。2)Next,wedoanall-AMPsRETRIEVEstepfromLUSC.bbywayof

57、anall-rowsscanwithnoresidualconditionsintoSpool4(all_amps),whichisredistributedbyhashcodetoallAMPs.ThenwedoaSORTtoorderSpool4byrowhash.Theresultspoolfilewillnotbecachedinmemory.ThesizeofSpool4isestimatedwithhighconfidencetobe94,078,021rows.Theestimatedtimeforthisstepis17minutesand6seconds.3)Weexecut

58、ethefollowingstepsinparallel.1)Wedoanall-AMPsJOINstepfromSpool4(LastUse)bywayofaRowHashmatchscan,whichisjoinedtoPMART.MID_CDE_LONG_CALL_TYPE_LVLbywayofaRowHashmatchscanwithnoresidualconditions.Spool4andPMART.MID_CDE_LONG_CALL_TYPE_LVLareleftouterjoinedusingamergejoin,withajoinconditionof(Long_Type_Level_SECND=PMART.MID_CDE_LONG_CALL_TYPE_LVL.Long_Type_Level_SECND).TheresultgoesintoSpool5(all_amps),whichisbuiltlocallyontheAMPs.Theresultsp

溫馨提示

  • 1. 本站所有資源如無(wú)特殊說(shuō)明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請(qǐng)下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請(qǐng)聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁(yè)內(nèi)容里面會(huì)有圖紙預(yù)覽,若沒(méi)有圖紙預(yù)覽就沒(méi)有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫(kù)網(wǎng)僅提供信息存儲(chǔ)空間,僅對(duì)用戶上傳內(nèi)容的表現(xiàn)方式做保護(hù)處理,對(duì)用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對(duì)任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請(qǐng)與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時(shí)也不承擔(dān)用戶因使用這些下載資源對(duì)自己和他人造成任何形式的傷害或損失。

最新文檔

評(píng)論

0/150

提交評(píng)論