DataStage Introductory Training
Instructor: 邱明偉  Date: 2010-03-01  東南融通, all rights reserved

Agenda
- Introduction to DataStage
- DataStage development
- Using the four DataStage clients
- Using common DataStage components
- Common DataStage commands
- Exercises

Introduction to DataStage

Ascential Platform

What is DataStage?
- Design jobs for Extraction, Transformation, and Loading (ETL)
- Ideal tool for data integration projects such as data warehouses, data marts, and system migrations
- Import, export, create, and manage metadata for use within jobs
- Schedule, run, and monitor jobs, all within DataStage
- Administer your DataStage development and execution environments

DataStage Development

DataStage Server and Clients
- Administrator: administers DataStage projects and conducts housekeeping on the server
- Designer: creates DataStage jobs that are compiled into executable programs
- Director: used to run and monitor DataStage jobs
- Manager: allows you to view and edit the contents of the repository

DataStage Administrator

In DataStage all development work is done within a project. Projects are created during installation and, after installation, using Administrator. Each project is associated with a directory. The directory stores the objects (jobs, metadata, custom routines, etc.) created in the project. Before you can work in a project you must attach to it (open it). You can set the default properties of a project using DataStage Administrator.

Use the Administrator to specify general server defaults, add and delete projects, and set project properties. Use the Administrator Project Properties window to:
- Set job monitoring limits and other Director defaults on the General tab.
- Set user group privileges on the Permissions tab.
- Enable or disable server-side tracing on the Tracing tab.
- Specify a user name and password for scheduling jobs on the Schedule tab.
- Specify hashed file stage read and write cache sizes on the Tunables tab.

DataStage Manager

DataStage Manager manages two different types of objects.

Metadata describing sources and targets:
- Called table definitions in Manager. These are not to be confused with relational tables. DataStage table definitions describe the format and column definitions of any type of source: sequential, relational, hashed file, etc.
- Table definitions can be created in Manager or Designer, and they can also be imported from the sources or targets they describe.

DataStage components:
- Every object in DataStage (jobs, routines, table definitions, etc.) is stored in the DataStage repository. Manager is the interface to this repository.
- DataStage components, including whole projects, can be exported from and imported into Manager.

Exporting from Manager:
- Any object in Manager can be exported to a file
- Whole projects can be exported
- Use for backup
- Sometimes used for version control
- Can be used to move DataStage objects from one project to another
- Use to share DataStage jobs and projects with other developers

Import Procedure
- In Manager, click "Import > DataStage Components"
- Select the DataStage objects to import

Export Procedure
- In Manager, click "Export > DataStage Components"
- Select the DataStage objects to export
- Specify the type of export: DSX or XML
- Specify the file path on the client machine

DataStage Director

- Can schedule, validate, and run jobs
- Can be invoked from DataStage Manager or Designer
- Clear the job log
- Set Director options: row limits, abort after x warnings

Director Log View
Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job. These events include control events, such as the starting, finishing, and aborting of a job; informational messages; warning messages; error messages; and program-generated messages.

DataStage Designer

What Is a Job?
- Executable DataStage program
- Created in DataStage Designer, but can use components from Manager
- Built using a graphical user interface
- Compiles into Orchestrate shell language (OSH)

Create New Job
Several types of DataStage jobs:
- Parallel: this course will concentrate on parallel jobs.
- Job Sequence: used to create jobs that control the execution of other jobs.

Introduction to Components

Sequential file
Features: suited to ordinary sequential files (fixed-length or variable-length records); can read text files or IBM mainframe EBCDIC files.
Usage notes:
- Name the stage according to the naming convention
- Double-click the stage and, on the General tab, describe the file's contents, format, storage directory, etc.
- Set the file properties: file name, reject mode
- Set the file format, for example the record terminator, the field delimiter, and the character used to quote strings
- Enter the column definitions for the file

Annotation
Features: generally used for comments; its background color can be used to color-code the different functional blocks of a job.

Copy Stage
Function: the Copy Stage can have one input and multiple outputs. It can change the order of columns on output, but it cannot change column types.

Filter Stage
Function: the Filter Stage has exactly one input and can have multiple outputs. Depending on the filter conditions, rows can be routed to different output links.

Sort Stage
Function: has exactly one input and one output, and sorts rows by the specified key. You can choose ascending or descending order, whether to remove duplicate rows, and so on.

Sort Stage options in detail:
- Allow Duplicates: whether duplicate rows are kept. When False, only one row is kept per key value; if Stable Sort is True, the first row is the one kept. This option has no effect when Sort Utility is UNIX.
- Sort Utility: the program that performs the sort, either DataStage's built-in sort or the UNIX sort command.
- Output Statistics: whether sort statistics are written to the job log.
- Stable Sort: whether rows with equal key values keep their relative input order.
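The interaction between Allow Duplicates and Stable Sort described above can be illustrated with a small Python sketch. This is not DataStage code: the function `sort_records` and its parameters are hypothetical names chosen for illustration, and the sketch relies on Python's built-in `sorted` being stable to mimic Stable Sort = True.

```python
from typing import Any, Callable, List, Sequence


def sort_records(
    records: Sequence[Any],
    key: Callable[[Any], Any],
    descending: bool = False,
    allow_duplicates: bool = True,
) -> List[Any]:
    """Hypothetical illustration of the Sort Stage options (not a DataStage API)."""
    # sorted() is stable: rows with equal keys keep their input order,
    # which plays the role of Stable Sort = True.
    ordered = sorted(records, key=key, reverse=descending)
    if allow_duplicates:
        return ordered

    # Allow Duplicates = False: keep only the first row of each run of equal keys.
    result: List[Any] = []
    for record in ordered:
        if not result or key(result[-1]) != key(record):
            result.append(record)
    return result


if __name__ == "__main__":
    rows = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
    print(sort_records(rows, key=lambda r: r[0]))
    print(sort_records(rows, key=lambda r: r[0], allow_duplicates=False))
```

Because the sort is stable, the row kept for each key is always the one that arrived first in the input, which is the behavior the slide describes for Stable Sort = True.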
Sort Stage
- Create Cluster Key Change Column: whether to add a new column, clusterKeyChange, to every record. When Sort Key Mode is Don't Sort (Previously Sorted) or Don't Sort (Previously Grouped), the column is set to 1 for the first record of a group and to 0 for the remaining records.
- Create Key Change Column: whether to add a new column, KeyChange, to every record.

Remove Duplicates Stage
Function: takes input that is already sorted and grouped by key, and removes records whose key values are duplicated. Usually used together with the Sort Stage.

Transformer Stage
Function: an extremely powerful stage. It has one input link and multiple output links; it can transform columns, and it can use conditions to decide which output link a row is written to. Drag and drop can be used during development.

Difference between Constraint and Derivation:
- A Constraint is a condition; rows that satisfy it are written to that output link.
- A Derivation is an expression that transforms a column value.
- Job parameters and Stage Variables can be used in both Constraints and Derivations.
Note: the Transformer Stage is powerful, but that power comes at the cost of run-time speed. For simple conversions, copies, and similar operations, it is better to use the Modify Stage, Copy Stage, Filter Stage, etc. in place of the Transformer Stage.

LookUp Stage
Function: the LookUp Stage reads data into memory to perform lookups, then outputs the matched columns, or modifies or adds columns in the matching records.

Join Stage
Function: joins multiple tables and outputs the result.

Aggregator Stage
Function: groups the input data, computes the sum for each group or performs other per-group operations, and passes the result to downstream stages.

Change Capture Stage
Features: the Change Capture Stage has two inputs, labeled the before link and the after link. Its output describes the differences between the before link and the after link; we call this the change set. The Change Capture Stage can be used together with the Change Apply Stage to compute the after set.

Keys and values: key columns identify the records to compare; value columns are compared further when the key values are equal.
Change mode options:
- All Keys, Explicit Values: the value columns must be specified; all other columns are keys
- Explicit Keys & Values: both key and value columns must be specified
- Explicit Keys, All Values: the key columns must be specified; all other columns are values

Funnel Stage
Function: merges multiple data sets that have the same columns into a single output.

Funnel types:
- Continuous Funnel: takes one record from each input link in turn
- Sort Funnel: merges the inputs in key order
- Sequence: outputs all data from the first input link, then all data from the second input link, and so on until the end. (The output order can be changed through Link Ordering.)

Introduction to Common DataStage Commands

dsjob

Run a job: dsjob -run
  -mode               specifies the run mode; the default is NORMAL
  -param name=value   runs with the given parameter; parameters not specified use their default values
  -warn               sets the warning limit
  -rows               sets the row limit
  -wait               waits for the job to finish
  -opmetadata         generates metadata
  -disableprjhandler
  -disablejobhandler
  -jobstatus          waits for the job and returns its run status
  -userstatus         waits for the job and returns the user-defined status
  -local              starts the job from a local script; the environment variables used are those defined in the script
  -useid              identifies the job by its job ID (an alias defined with dsjob -jobid)

Stop a job: dsjob -stop -useid
If an alias has been defined for the job (with dsjob -jobid), use -useid to tell the system that what follows is the job's alias.

List all projects: dsjob -lprojects
List all jobs in a project: dsjob -ljobs project
List the invocations of a job: dsjob -linvocations project job
List all stages of a job: dsjob -lstages -useid
List each stage's LIN
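As a rough illustration of how the `dsjob -run` options above combine on one command line, the following Python sketch assembles an argument list without executing it. The function name `build_dsjob_run`, the project and job names, and the parameter `RUN_DATE` are hypothetical. Placing the project and job names last is an assumption based on the `dsjob -ljobs project` and `dsjob -linvocations project job` forms shown above.

```python
import shlex
from typing import Dict, List, Optional


def build_dsjob_run(
    project: str,
    job: str,
    params: Optional[Dict[str, str]] = None,
    mode: Optional[str] = None,
    warn: Optional[int] = None,
    rows: Optional[int] = None,
    wait: bool = False,
    jobstatus: bool = False,
) -> List[str]:
    """Assemble a `dsjob -run` argument list (hypothetical helper, nothing is executed)."""
    args = ["dsjob", "-run"]
    if mode is not None:
        args += ["-mode", mode]
    # Each job parameter becomes its own "-param name=value" pair.
    for name, value in (params or {}).items():
        args += ["-param", f"{name}={value}"]
    if warn is not None:
        args += ["-warn", str(warn)]
    if rows is not None:
        args += ["-rows", str(rows)]
    if wait:
        args.append("-wait")
    if jobstatus:
        args.append("-jobstatus")
    # Assumption: project and job names come last.
    args += [project, job]
    return args


if __name__ == "__main__":
    cmd = build_dsjob_run(
        "demo_project",  # placeholder project name
        "demo_job",      # placeholder job name
        params={"RUN_DATE": "2010-03-01"},
        mode="NORMAL",
        warn=50,
        jobstatus=True,
    )
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string means it could be passed to `subprocess.run` on a machine with the DataStage client installed, without any shell quoting issues around parameter values.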