Computer Architecture, Spring 2008
Tsinghua University
Basic Pipelining Techniques
(Pipelining)
Prof. Dongsheng Wang
wds@
Department of Computer Science and Technology, Tsinghua University
http://CPU.
A relevant question
■ Assuming you've got:
□ One washer (takes 30 minutes)
□ One drier (takes 40 minutes)
□ One "folder" (takes 20 minutes)
■ It takes 90 minutes to wash, dry, and fold 1 load of laundry.
□ How long do 4 loads take?
The slow way
[Figure: timeline from 6 PM to midnight; the four loads run one after another, each taking 30, 40, and 20 minutes]
■ If each load is done sequentially, it takes 6 hours
Laundry Pipelining
■ Start each load as soon as possible
□ Overlap loads
[Figure: timeline starting at 6 PM with overlapping loads: 30 + 40 + 40 + 40 + 40 + 20 minutes]
■ Pipelined laundry takes 3.5 hours
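The two laundry timelines can be checked with a short simulation. The sketch below is illustrative Python, not part of the original slides; the function names `sequential_minutes` and `pipelined_minutes` are made up for this example. It assumes each machine handles one load at a time and that a finished load can wait in a basket until the next machine is free.

```python
STAGE_MINUTES = [30, 40, 20]  # washer, drier, folder (times from the slide)

def sequential_minutes(loads, stages=STAGE_MINUTES):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGE_MINUTES):
    """Start each load as soon as the machine it needs is free."""
    free_at = [0] * len(stages)   # time at which each machine becomes free
    finish = 0
    for _ in range(loads):
        ready = 0                 # time at which this load is ready for its next stage
        for i, minutes in enumerate(stages):
            start = max(ready, free_at[i])
            ready = start + minutes
            free_at[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The drier is the slowest machine, so after the first load each additional load adds 40 minutes rather than 90.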
Pipelining Lessons
■ Pipelining doesn't help the latency of a single load; it helps the throughput of the entire workload
■ Pipeline rate is limited by the slowest pipeline stage
■ Multiple tasks operate simultaneously using different resources
■ Potential speedup = number of pipe stages
■ Unbalanced lengths of pipe stages reduce speedup
■ Time to "fill" the pipeline and time to "drain" it reduce speedup
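These lessons can be put into numbers with a small model. The following is a hedged sketch, assuming identical tasks and one task per stage at a time; the helper names `pipelined_time` and `speedup` are invented for illustration. Under those assumptions the total time is one pass through all stages (the fill) plus one slowest-stage interval for every remaining task.

```python
def pipelined_time(n_tasks, stage_times):
    """Fill the pipeline once, then finish one task per slowest-stage interval."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)

def speedup(n_tasks, stage_times):
    """Unpipelined time divided by pipelined time."""
    unpipelined = n_tasks * sum(stage_times)
    return unpipelined / pipelined_time(n_tasks, stage_times)

balanced = [30, 30, 30]
unbalanced = [30, 40, 20]   # the laundry stages: same 90-minute total, uneven split

for n in (1, 4, 1000):
    print(n, round(speedup(n, balanced), 2), round(speedup(n, unbalanced), 2))
```

With a single task there is no speedup at all. With four tasks the balanced pipeline reaches 2.0 and the unbalanced one about 1.71. Even with 1000 tasks the balanced three-stage pipeline stays just under 3, because of the fill time, and the unbalanced one stays near 90/40 = 2.25.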
Pipelining is not just Multiprocessing
■ Pipelining does involve parallel processing, but in a specific way.
■ Both multiprocessing and pipelining relate to the processing of multiple "things" using multiple "functional units"
□ Multiprocessing implies each thing is processed entirely by a single functional unit
■ e.g., multiple lanes at the supermarket
□ In pipelining, each thing is broken into a sequence of pieces, where each piece is handled by a different (specialized) functional unit.
■ Supermarket analogy?
■ Pipelining and multiprocessing are not mutually exclusive
□ Modern processors do both, with multiple pipelines (e.g., superscalar)
Pipelining
■ Pipelining is a general-purpose efficiency technique
□ It is not specific to processors
■ Pipelining is used in:
□ Assembly lines
□ Bucket brigades
□ Fast food restaurants
■ Pipelining is used in other CS disciplines:
□ Networking
□ Server software architecture
■ Useful to increase throughput in the presence of long latency
□ More on that later...
Instruction execution review
■ Executing a MIPS instruction can take up to five steps.
Step                Name  Description
Instruction Fetch   IF    Read an instruction from memory.
Instruction Decode  ID    Read source registers and generate control signals.
Execute             EX    Compute an R-type result or a branch outcome.
Memory              MEM   Read or write the data memory.
Writeback           WB    Store a result in the destination register.
■ However, as we saw, not all instructions need all five steps.
Instruction  Steps required
beq          IF ID EX
R-type       IF ID EX WB
sw           IF ID EX MEM
lw           IF ID EX MEM WB
Single-cycle datapath diagram
■ How long does it take to execute each instruction?
Example: Instruction Fetch (IF)
■ Let's quickly review how lw is executed in the single-cycle datapath.
■ We'll ignore PC incrementing and branching for now.
■ In the Instruction Fetch (IF) step, we read the instruction memory.
Instruction Decode (ID)
■ The Instruction Decode (ID) step reads the source register from the register file.
Execute (EX)
■ The third step, Execute (EX), computes the effective memory address from the source register and the instruction's constant field.
Memory (MEM)
■ The Memory (MEM) step involves reading the data memory, from the address computed by the ALU.
Writeback (WB)
■ Finally, in the Writeback (WB) step, the memory value is stored into the destination register.
A bunch of lazy functional units
■ Notice that each execution step uses a different functional unit.
■ In other words, the main units are idle for most of the 8ns cycle!
□ The instruction RAM is used for just 2ns at the start of the cycle.
□ Registers are read once in ID (1ns), and written once in WB (1ns).
□ The ALU is used for 2ns near the middle of the cycle.
□ Reading the data memory only takes 2ns as well.
■ That's a lot of hardware sitting around doing nothing.
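The idle time can be tallied directly from the latencies quoted above and the step table from the instruction execution review. This is a minimal sketch; the dictionary names (`LATENCY_NS`, `STEPS`, `UNIT_STEPS`) and the mapping of units to steps are assumptions made for illustration.

```python
# Step latencies in nanoseconds, as quoted on the slide.
LATENCY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Steps each instruction type actually needs (from the review table).
STEPS = {
    "beq":    ["IF", "ID", "EX"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
}

# Useful work per instruction; the single-cycle clock must fit the longest one.
work_ns = {name: sum(LATENCY_NS[s] for s in steps) for name, steps in STEPS.items()}
cycle_ns = max(work_ns.values())

# Which steps keep each functional unit busy (assumed mapping).
UNIT_STEPS = {
    "instruction memory": ["IF"],
    "register file":      ["ID", "WB"],
    "ALU":                ["EX"],
    "data memory":        ["MEM"],
}
utilization = {
    unit: sum(LATENCY_NS[s] for s in steps) / cycle_ns
    for unit, steps in UNIT_STEPS.items()
}

for name, ns in work_ns.items():
    print(f"{name}: {ns} ns of useful work in a {cycle_ns} ns cycle")
for unit, fraction in utilization.items():
    print(f"{unit}: busy {fraction:.0%} of the cycle")
```

Under these numbers every unit is busy for at most a quarter of the cycle, even while executing lw, the instruction that uses all five steps.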
Putting those slackers to work
■ We shouldn't have to wait for the entire instruction to complete before we can re-use the functional units.
■ For example, the instruction memory is free in the Instruction Decode step as shown below, so...
[Figure: datapath with the instruction memory marked "Idle" during Instruction Decode (ID)]
Decoding and fetching together
■ Why don't we go ahead and fetch the next instruction while we're decoding the first one?
[Figure: datapath labelled "Fetch 2nd" and "Decode 1st instruction"]
Executing, decoding and fetching
■ Similarly, once the first instruction enters its Execute stage, we can go ahead and decode the second instruction.
■ But now the instruction memory is free again, so we can fetch the third instruction!
[Figure: datapath labelled "Fetch 3rd", "Decode 2nd", and "Execute 1st"]
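The overlap described here follows a simple rule: in a given clock cycle, each stage holds the instruction that entered the pipeline one cycle after the instruction in the following stage. A small sketch, assuming one instruction enters per cycle and nothing stalls; the function name `occupancy` is hypothetical.

```python
STAGES = ["IF", "ID", "EX"]  # the three stages discussed so far

def occupancy(cycle, n_instructions):
    """Map each busy stage to the 1-based number of the instruction in it."""
    busy = {}
    for depth, stage in enumerate(STAGES):
        instr = cycle - depth          # instruction k enters IF in cycle k
        if 1 <= instr <= n_instructions:
            busy[stage] = instr
    return busy

for cycle in (1, 2, 3):
    print(cycle, occupancy(cycle, n_instructions=5))
```

In cycle 3 this gives instruction 3 in IF, instruction 2 in ID, and instruction 1 in EX, matching the picture of fetching the third, decoding the second, and executing the first.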
Making Pipelining Work
■ We'll make our pipeline 5 stages long, to handle each of the five steps in a load instruction (the longest instruction for this machine)
□ Stages are: IF, ID, EX, MEM, and WB
■ We want to support executing 5 instructions simultaneously: one in each stage.
Break datapath into 5 stages
■ Insert pipeline registers
■ Each stage has its own functional units.
■ Each stage can execute in 2ns
[Figure: datapath divided into the IF, ID, EX, MEM, and WB stages]
Pipelining Performance
Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB
(The first four cycles are spent filling the pipeline.)
■ Execution time on ideal pipeline:
□ time to fill the pipeline + one cycle per instruction
□ How long for N instructions?
■ Compare with other implementations:
□ Single Cycle: (8ns clock period)
■ How much faster is pipelining for N = 1000?
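One way to work out the answer, as a sketch under the slide's own numbers: 5 stages, a 2ns pipelined clock, and an 8ns single-cycle clock. The constant and function names are made up for this example. Filling the pipeline takes 4 cycles, after which one instruction completes per cycle.

```python
STAGES = 5
PIPELINED_CLOCK_NS = 2     # each stage can execute in 2 ns
SINGLE_CYCLE_CLOCK_NS = 8  # single-cycle clock period

def pipelined_ns(n):
    """(STAGES - 1) cycles to fill, then one cycle per instruction."""
    return (STAGES - 1 + n) * PIPELINED_CLOCK_NS

def single_cycle_ns(n):
    """One long cycle per instruction."""
    return n * SINGLE_CYCLE_CLOCK_NS

n = 1000
print(pipelined_ns(n))                        # 2008
print(single_cycle_ns(n))                     # 8000
print(single_cycle_ns(n) / pipelined_ns(n))   # just under 4
```

The speedup here is limited to about 4 rather than 5 because the 8ns cycle splits into five stages that are not equally long, so the 2ns clock is set by the slowest stage.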
Pipeline Datapath: Resource Requirements
Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
lw $t1, 8($sp)           IF   ID   EX   MEM  WB
lw $t2, 12($sp)               IF   ID   EX   MEM  WB
lw $t3, 16($sp)                    IF   ID   EX   MEM  WB
lw $t4, 20($sp)                         IF   ID   EX   MEM  WB
■ We need to perform several operations in the same cycle.
□ Increment the PC and add registers at the same time.
□ Fetch one instruction while another one reads or writes data.
■ What does that mean for our hardware?
Pipelining other instruction types
■ R-type instructions only require 4 stages: IF, ID, EX, and WB
□ We don't need the MEM stage
■ What happens if we try to pipeline loads with R-type instructions?
Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB
Important Observation
■ Each functional unit can only be used once per instruction
■ Each functional unit must be used at the same stage for all instructions:
□ Load uses Register File's Write Port during its 5th stage
□ R-type uses Register File's Write Port during its 4th stage
Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB
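The clash in this schedule can be found mechanically by recording which instruction uses which stage in each cycle. The sketch below is illustrative only; `find_conflicts` and the program encoding are assumptions, and it issues one instruction per cycle with no stalls.

```python
STAGES = {
    "R-type": ["IF", "ID", "EX", "WB"],         # write port used in 4th stage
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],  # write port used in 5th stage
}

PROGRAM = [
    ("add $sp, $sp, -4", "R-type"),
    ("sub $v0, $a0, $a1", "R-type"),
    ("lw $t0, 4($sp)", "lw"),
    ("or $s0, $s1, $s2", "R-type"),
    ("lw $t1, 8($sp)", "lw"),
]

def find_conflicts(program):
    """Return {(cycle, stage): [instructions]} for every doubly used stage."""
    usage = {}
    for issue, (text, kind) in enumerate(program):
        for offset, stage in enumerate(STAGES[kind]):
            cycle = issue + 1 + offset   # instruction i enters IF in cycle i + 1
            usage.setdefault((cycle, stage), []).append(text)
    return {key: users for key, users in usage.items() if len(users) > 1}

print(find_conflicts(PROGRAM))
```

For this program the only collision is in cycle 7, where `lw $t0, 4($sp)` and `or $s0, $s1, $s2` both reach WB and would need the register file's write port at the same time.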
A solution: Insert NOP stages
■ Enforce uniformity
□ Make all instructions take 5 cycles.
□ Make them have the same stages, in the same order
■ Some stages will do nothing for some instructions
R-type | IF | ID | EX | NOP | WB
Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   NOP  WB
sub $v0, $a0, $a1        IF   ID   EX   NOP  WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   NOP  WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB
■ Stores and Branches have NOP stages, too...
store  | IF | ID | EX | MEM | NOP
branch | IF | ID | EX | NOP | NOP
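The padding rule is easy to state in code: every instruction gets all five slots, and a slot becomes NOP when the instruction has no work there. A minimal sketch with hypothetical helper names (`padded`, `diagram`), assuming one instruction issues per cycle:

```python
ALL_STAGES = ["IF", "ID", "EX", "MEM", "WB"]
USED = {
    "R-type": {"IF", "ID", "EX", "WB"},
    "lw":     {"IF", "ID", "EX", "MEM", "WB"},
    "sw":     {"IF", "ID", "EX", "MEM"},
    "beq":    {"IF", "ID", "EX"},
}

def padded(kind):
    """Five-slot stage list with NOP wherever the instruction has no work."""
    return [stage if stage in USED[kind] else "NOP" for stage in ALL_STAGES]

def diagram(program):
    """One text row per instruction; instruction i enters IF in cycle i + 1."""
    rows = []
    for issue, (text, kind) in enumerate(program):
        cells = [""] * issue + padded(kind)
        rows.append(f"{text:<20}" + "".join(f"{cell:<5}" for cell in cells).rstrip())
    return rows

program = [
    ("add $sp, $sp, -4", "R-type"),
    ("sub $v0, $a0, $a1", "R-type"),
    ("lw $t0, 4($sp)", "lw"),
    ("or $s0, $s1, $s2", "R-type"),
    ("lw $t1, 8($sp)", "lw"),
]
print("\n".join(diagram(program)))
```

Because every instruction now reaches WB exactly four cycles after its IF, two instructions issued in different cycles can never need the write port in the same cycle.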
Summary
■ Pipelining attempts to maximize instruction throughput by overlapping the execution of multiple instructions.
■ Pipelining offers amazing speedup.
□ In the best case, one instruction finishes on every cycle, and the speedup is equal to the pipeline depth.
■ The pipeline datapath is much like the single-cycle one, but with added pipeline registers
□ Each stage needs its own functional units
■ Next time we'll see the datapath and control, and walk through an example execution.
Pipelined datapath and control
■ Last time we introduced the main ideas of pipelining.
■ Today we'll see a basic implementation of a pipelined processor.
□ The datapath and control unit share similarities with the single-cycle implementation that we already saw.
□ An example execution highlights important pipelining concepts.
■ In future lectures, we'll discuss several complications of pipelining that we're hiding from you for now.
Pipelined datapath and control
Pipelining concepts
■ A pipelined processor allows multiple instructions to execute at once, and each instruction uses a different functional unit in the datapath.
■ This increases throughput, so programs can run faster.
□ One instruction can finish executing on every clock cycle, and simpler stages also lead to shorter cycle times.
Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $t5, $t6, $0                        IF   ID   EX   MEM  WB
Pipelined Datapath
■ The whole point of pipelining is to allow multiple instructions to execute at the same time.
■ We may need to perform several operations in the same cycle.
□ Increment the PC and add registers at the same time.
□ Fetch one instruction while another one reads or writes data.
Clock cycle         1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)      IF   ID   EX   MEM  WB
sub $v0, $a0, $a1        IF   ID   EX   MEM  WB
and $t1, $t2, $t3             IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   MEM  WB
add $t5, $t6, $0                        IF   ID   EX   MEM  WB
■ Thus, like the single-cycle datapath, a pipelined processor will need to duplicate hardware elements that are needed several times in the same clock cycle.
□ What about the register file?
One register file is enough
■ We need only one register file to support both the ID and WB stages.
[Figure: register file with inputs Read register 1, Read register 2, Write register, and Write data, and outputs Read data 1 and Read data 2]
■ Reads and writes go to separate ports on the register file.
■ We already took advantage of this property in our single-cycle CPU.
Single-cycle datapath, slightly rearranged
Pipeline registers
■ We'll add intermediate registers to our pipelined datapath.
■ There's a lot of information to save, however. We'll simplify our diagrams by drawing just one big pipeline register between each stage.
■ The registers are named for the stages they connect.
IF/ID   ID/EX   EX/MEM   MEM/WB
■ No register is needed after the WB stage, because after WB the instruction is done.
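The behaviour of the four pipeline registers can be sketched as a chain that shifts by one position on every clock edge. This is a simplified model for illustration, not the actual hardware: each register holds only an instruction name, and the function `clock` and the variable names are assumptions.

```python
REGISTERS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def clock(regs, fetched):
    """Advance one cycle: each register latches what the previous one held."""
    in_wb = regs["MEM/WB"]            # instruction performing WB this cycle
    nxt = {"IF/ID": fetched}          # result of this cycle's fetch
    for before, after in zip(REGISTERS, REGISTERS[1:]):
        nxt[after] = regs[before]
    return nxt, in_wb                 # nothing is latched after WB

program = ["lw", "sub", "and", "or", "add"]
regs = {name: None for name in REGISTERS}
wb_cycle = {}
for cycle in range(1, len(program) + 5):
    fetched = program[cycle - 1] if cycle <= len(program) else None
    regs, in_wb = clock(regs, fetched)
    if in_wb is not None:
        wb_cycle[in_wb] = cycle

print(wb_cycle)  # {'lw': 5, 'sub': 6, 'and': 7, 'or': 8, 'add': 9}
```

The first instruction writes back in cycle 5 and one more instruction completes in each cycle after that, which is the 9-cycle schedule for five instructions.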
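Pipelined datapath
[Figure: pipelined datapath. Recoverable labels: the EX/MEM and MEM/WB pipeline registers; the register file (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2); the ALU (Zero, Result); Shift left 2; Sign extend; the data memory (Address, Write data, Read data); instruction fields Instr[15-0], Instr[20-16], Instr[15-11]; control signals PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, MemWrite, MemRead, MemToReg]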
Propagating values forward
■ Any data values required in later stages must be propagated through the pipeline registers.
■ The most extreme example is the destination register.
□ The rd field of the instruction word, retrieved in the first stage (IF), determines the destination register. But that register isn't updated until the fifth stage (WB).
□ Thus, the rd field must be passed through all of the pipeline stages, as shown in red on the next slide.
■ Notice that we can't keep a single "instruction register", because the pipelined machine needs to fetch a new instruction every clock cycle.
The destination register
[Figure: the pipelined datapath again, with the destination register number carried through the pipeline registers to the register file's Write register input]
What about control signals?
■ The control signals are generated in the same way as in the single-cycle processor: after an instruction is fetched, the processor decodes it and produces the appropriate control values.
■ But just like before, some of the control signals will not be needed until some later stage and clock cycle.
■ These signals must be propagated through the pipeline until they reach the appropriate stage. We can just pass them in the pipeline registers, along with the other data.
■ Control signals can be categorized by the pipeline stage that uses them.
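A sketch of that categorization follows. The signal names are the ones labelled on the datapath figures, but the grouping by stage is an assumption based on the usual five-stage MIPS design and is not spelled out in this excerpt. The function name `carried` is hypothetical.

```python
# Assumed grouping of control signals by the stage that consumes them.
CONTROL_BY_STAGE = {
    "EX":  ["RegDst", "ALUOp", "ALUSrc"],
    "MEM": ["MemRead", "MemWrite", "PCSrc"],
    "WB":  ["RegWrite", "MemToReg"],
}
STAGE_ORDER = ["EX", "MEM", "WB"]

# Each pipeline register feeds the stage named here.
FEEDS = {"ID/EX": "EX", "EX/MEM": "MEM", "MEM/WB": "WB"}

def carried(register):
    """Signals a pipeline register must hold: those for every later stage."""
    first = STAGE_ORDER.index(FEEDS[register])
    return [sig for stage in STAGE_ORDER[first:] for sig in CONTROL_BY_STAGE[stage]]

for register in FEEDS:
    print(register, carried(register))
```

Under this grouping, ID/EX carries all eight signals, EX/MEM drops the three used in EX, and MEM/WB keeps only RegWrite and MemToReg.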
Pipelined datapath and control
[Figure: pipelined datapath with the Control unit added; visible labels include ID/EX, EX/MEM, PCSrc, and Control]