Transaction Processing System: SwitchTx - Scalable In-Network Coordination for Distributed Transaction Processing

SwitchTx: Scalable In-Network Coordination for Distributed Transaction Processing

Junru Li, Youyou Lu*, Yiming Zhang, Qing Wang, Zhuo Cheng, Keji Huang, Jiwu Shu
{lijr19, q-wang18}@, {luyouyou, shujw}@, zhangyiming@, {chengzhuo, huangkeji}@
Tsinghua University; NICEX Lab, Xiamen University; Huawei Storage Product Line

ABSTRACT
Online transaction processing (OLTP) applications require the underlying storage system to guarantee consistency and serializability for distributed transactions that involve large numbers of servers. This tends to introduce high coordination cost and low system performance. In-network coordination is a promising approach to alleviating this problem: it leverages programmable switches to move a piece of coordination functionality into the network. This paper presents a fast and scalable transaction processing system called SwitchTx. At the core of SwitchTx is a decentralized multi-switch in-network coordination mechanism. It leverages the programmability of modern switches to reduce coordination cost while avoiding the problems caused by the central switch in Eris, the state-of-the-art transaction processing system. SwitchTx abstracts various coordination tasks (e.g., locking, validating, and replicating) as in-switch gather-and-scatter (GaS) operations. It offloads coordination to a tree of switches for each transaction (instead of to a central switch for all transactions), where the client and the participants connect to the leaves. Moreover, to control transaction traffic intelligently, SwitchTx reorders coordination messages according to their semantics and redesigns congestion control in combination with admission control. Evaluation shows that SwitchTx outperforms current transaction processing systems in various workloads by up to 2.16× in throughput, 40.4% in latency, and 41.5% in lock time.

PVLDB Reference Format:
Junru Li, Youyou Lu, Yiming Zhang, Qing Wang, Zhuo Cheng, Keji Huang, Jiwu Shu. SwitchTx: Scalable In-Network Coordination for Distributed Transaction Processing. PVLDB, 15(11): 2881-2894, 2022.
doi:10.14778/3551793.3551838

*Youyou Lu is the corresponding author (luyouyou@).
This work is licensed under the Creative Commons BY-NC-ND 4.0 International License. Visit /licenses/by-nc-nd/4.0/ to view a copy of this license. For any use beyond those covered by this license, obtain permission by emailing info@. Copyright is held by the owner/author(s). Publication rights licensed to the VLDB Endowment.
Proceedings of the VLDB Endowment, Vol. 15, No. 11 ISSN 2150-8097.
doi:10.14778/3551793.3551838

1 INTRODUCTION
Transactions with consistency and serializability [1] provide a simple but powerful abstraction for programming and reasoning about distributed storage systems. The abstraction is that of a single server that never fails and always executes one transaction at a time, in an order consistent with the real distributed execution. Fast and scalable in-memory transaction processing is the basis for many online transaction processing (OLTP) applications, such as web services, stock exchanges, and e-commerce. A common way to support such large-scale transaction processing is to partition data into shards spread over servers, with concurrency control [1–6]. Data partitioning necessitates distributed transaction processing, which tends to incur high coordination cost. This cost includes network communication, locking/unlocking, data replication, and aborts and retries.

There have been numerous studies on alleviating coordination cost in distributed transaction processing. Examples include designing new concurrency control and replication protocols [7–12], optimizing for specific (independent) transactions [13–16], partitioning data more efficiently to reduce contention [17–21], and leveraging fast networks that bypass the OS kernel [22–27]. However, these proposals essentially require heavy involvement of CPU cores in coordination, and they are thus inefficient for transaction processing.

Recent advances in programmable network hardware [28–31] provide new opportunities for in-network coordination by moving coordination functionality into the network. Eris [14], a state-of-the-art transaction system, uses a central switch or middlebox to generate multiple sequence numbers for each independent transaction to reduce coordination cost. Although it effectively improves transaction performance at a small scale, its centralized sequencing mechanism has three drawbacks. It (i) bounds the overall system throughput by the capability of a single switch, (ii) substantially increases processing latency when the single switch does
not lie on the path from clients to servers, and (iii) limits the supported transaction types because of the switch's hardware constraints. Further, the network stack offers only general-purpose traffic control that does not consider transaction semantics. This results in two problems. The first is requirement mismatches, i.e., between the packet processing order in the network and the transaction processing order in the database. The second is function redundancies, i.e., between congestion control in the network and admission control in the database.

In this paper, we present an in-memory transaction processing system, SwitchTx. At the core of SwitchTx is a novel scalable in-network coordination mechanism. It leverages the programmability of switches to reduce coordination cost (including concurrency control and replication) while avoiding the problems caused by a central switch. It also intelligently controls network traffic based on transaction semantics, i.e., the message processing order and the number of in-flight messages.

First, SwitchTx abstracts various coordination tasks as in-switch gather-and-scatter (GaS) operations. Switches gather the messages of a transaction phase and perform state transitions of the state machine when the conditions are met. They then scatter messages to finish the current phase and recycle the state machine for the next phase (or another transaction). In-switch GaS not only reduces the communication length by half but also eliminates processing and queuing overhead in software.

Figure 1: Coordination cost. (a) Lock time (P50, P99) vs. # of shards in a txn; (b) Txn lifetime breakdown.

Second, unlike Eris, which relies on a central switch to sequence all transactions, SwitchTx offloads the coordination task of each transaction to a tree of switches. The transaction's client and participants connect to the leaves of this tree. SwitchTx reduces the round trips of transaction processing by exploiting the locality of messages, and it places no constraint on transaction types.

Third, SwitchTx controls network traffic intelligently, i.e., both throughput pressure and processing order. To control the processing order, SwitchTx leverages the processing queues in the network stack to reorder concurrent messages from different transactions according to their semantics. To control the number of in-flight messages in the network, SwitchTx redesigns admission control in combination with network congestion control.

To the best of our knowledge, we are the first to propose a scalable multi-switch in-network coordination mechanism for distributed transaction processing. The mechanism offloads all coordination functionality to multiple programmable switches and couples network traffic control with distributed transaction semantics. We have implemented a prototype of SwitchTx using Barefoot Tofino switches. SwitchTx supports optimistic concurrency control (OCC) and primary-backup replication. Evaluation with various benchmarks shows that SwitchTx outperforms current transaction processing systems by up to 2.16× in throughput, 40.4% in latency, and 41.5% in lock time.

2 BACKGROUND AND MOTIVATION

2.1 Distributed Transaction Processing
Large-scale transaction processing systems partition data into shards spread over servers. This subsection briefly reviews data partitioning and coordination for distributed transactions.

Data partitioning. Each server manages an exclusive shard of the entire dataset, and the cores in a server manage the data in one of two ways. (i) Each core is treated as a logical server, and the data of a server is further partitioned across cores (i.e., the one-shard-per-core approach) [7, 14]. (ii) Cores share the data of the server (i.e., the one-shard-per-server approach), using locks or version synchronization to control concurrent data accesses [23, 32]. SwitchTx focuses on the one-shard-per-server approach and accelerates coordination among servers.

Coordination for distributed transactions. In a large-scale distributed transaction system, concurrency control (such as two-phase locking and optimistic concurrency control [4]) and replication protocols usually induce high network coordination cost. Two-phase locking (2PL) uses locks and is suitable for high-contention workloads, but it suffers from the deadlock problem [1]. In contrast, optimistic concurrency control (OCC) first executes the operations in the execute phase and then handles conflicts in the commit phase, which is more efficient for low-contention workloads. OCC is widely adopted in modern distributed transaction systems (including SwitchTx) because of its simplicity [23, 24, 33]. However, OCC needs more network coordination, so SwitchTx uses programmable switches to address this problem.

In a system that uses OCC for concurrency control and primary-backup replication for availability, transactions are processed through five phases: the execute phase, lock phase, validate phase, commit backup phase, and commit primary phase. For each transaction:
(i) In the execute phase, the client reads records without acquiring locks and buffers the data in the write set in a private workspace.
(ii) OCC detects write-write conflicts in the lock phase and read-write conflicts in the validate phase. It acquires write locks during the lock phase. In the validate phase, it guarantees that the data read has not changed since the execute phase.
(iii) If there are no conflicts, the transaction enters the commit (backup/primary) phases, in which it installs data atomically on the backup servers and the primary servers.

2.2 Coordination Cost
Distributed transaction processing incurs massive network coordination cost (e.g., multiple round trips), which is a performance killer. To illustrate the performance impact, we use a microbenchmark to evaluate FaSST [23], a state-of-the-art distributed transaction processing system. In this benchmark, we use 8 servers, each running 24 threads. We disable replication, so transactions do not need the commit backup phase. Threads are symmetric: each of them both issues new transactions and handles network requests to participate in transactions issued by threads on other servers. Each transaction randomly reads and writes 8 records. As the number of servers involved in each transaction varies from 2 to 8, throughput drops from 7.1 Mops to 3.1 Mops (to 43.6% of the original). Over the same range, the P99 tail latency increases from 60.9 us to 126.2 us (2.07×).

Specifically, the coordination cost mainly has the following two aspects. First, coordination for distributed transactions induces high processing latency. It also lengthens the lock time (i.e., the time between acquiring and releasing a lock) and the version validation time (i.e., the time between the execute phase and the validate phase), which leads to a high abort rate. We refer to these times as the contention span [20] of a transaction. To understand the impact of coordination on the contention span, we evaluate the lock time under a low-contention workload, to exclude the interference from
transaction aborts. As shown in Figure 1(a), as the number of data shards involved in a transaction grows, the P50 lock time increases by 2.13× and the P99 lock time by 3.97×. Second, coordination tasks waste precious CPU cycles, even though they are simple and only involve distributing and collecting small network messages. Figure 1(b) shows the latency breakdown of the transaction commit procedure. The software overheads for coordination account for 15.0% and 47.1% of this latency under 10 Gbps and 100 Gbps networks, respectively. We conclude that, with a faster network, the coordination cost of software designs becomes relatively heavier and leaves the high-speed network under-exploited.

2.3 Programmable Switches
Figure 2 shows the architecture of programmable switches. The switches provide flexible pipelines where users can design protocols by programming the parser and match-action tables. Applications use the switch control plane to configure match-action pairs in these tables. Programmable switches also have on-chip memory (register arrays) that can be used to store information. When a packet arrives at an ingress port, the switch parses the packet header and then applies the match-action tables to the packet. If the packet matches a key in a table, the switch executes the corresponding action, e.g., modifying the packet header, packet metadata, or register arrays. The packet might be dropped, transmitted to an egress port, or resubmitted to the ingress port. Finally, the egress pipeline applies its match-action tables to drop or forward the packet.

Figure 2: The architecture of programmable switches.

2.4 Challenges
Programmable switches provide opportunities to redesign distributed transaction coordination mechanisms. To reduce coordination cost and exploit the resources of the high-speed network, we need to address the following two challenges.

Multi-switch scalability. Eris [14] partitions data per core and introduces a centralized switch or middlebox as a sequencer that generates monotonically increasing IDs (i.e., sequence numbers) for transactions. Each core (i.e., logical server) executes transactions according to their sequence numbers. Unfortunately, this centralized in-switch sequencing mechanism can neither scale out to multiple switches nor scale up to multiple pipelines in a switch. Eris is not suitable for large-scale (e.g., cross-rack) transaction processing for three reasons. First, the centralized switch in Eris is a single-point performance bottleneck, which bounds the overall system throughput by the processing capability of the central switch. Second, all transactions must be routed to the centralized switch for sequencing. This prevents Eris from exploiting locality [23, 24, 34] and thus substantially increases processing latency in a multi-switch/multi-rack system. Third, the header size of packets in Eris is proportional to the shard count, yet switch hardware can parse only a limited header size (up to 224 8-bit words). This constraint prevents Eris from supporting certain types of transactions, such as large range queries and aggregate processing, both of which access many shards.

Semantic gap between transactions and network. Network traffic control (e.g., the message processing order and the number of in-flight messages) determines whether network resources can be fully utilized. An inappropriate processing order of messages in the network stack might introduce extra aborts. For example, a lock request on a record must fail if it is processed before the unlock operation that releases that record. Likewise, messages of retried transactions need higher processing priority to reduce tail latency. Further, the transaction processing system controls the number of concurrent transactions with admission control algorithms (a.k.a. Multi-Programming Limit, or MPL) to avoid excessive transaction aborts and retries. Meanwhile, the network stack controls the number of concurrent network messages with congestion control algorithms to avoid packet loss and retransmission. There is functional redundancy and interference between the two control algorithms, and distributed transaction systems need to consider both of them. For example, when conflicts are rare, allowing clients to issue more transactions can cause an unnecessary latency increase due to network congestion.

Figure 3: SwitchTx overview.

3 DESIGN
We design SwitchTx with the following four goals.

Reduce coordination cost. Switches are on the routing paths of distributed transaction messages. Our first goal is therefore to offload all coordination functionality to switches, so as to reduce interaction between servers, cut transaction latency, shorten contention spans, and save CPU cycles.

Avoid single-point bottleneck. Large-scale transaction processing systems may contain thousands of (or even more) servers, and their overall throughput far exceeds the capacity of any single switch. Our second goal is to utilize all switches in the network to parallelize the coordination of disjoint transactions.

Manipulate transaction traffic intelligently. Switches can monitor system status and apply software-defined protocols, which provides opportunities to control transaction traffic. Our third goal is to reorder messages according to transaction semantics and to co-design network congestion control with transaction admission control.

Minimize resource usage in switches. Programmable switches have limited on-chip memory and processing resources. Our fourth goal is to minimize the resource usage of switches and prevent switch memory from being exhausted.

3.1 SwitchTx Overview
SwitchTx is an in-memory transaction processing system that leverages in-network coordination and transaction traffic control to accelerate distributed transaction processing. Figure 3 shows its end-to-end architecture. SwitchTx divides data (based on primary keys) into many shards spread over servers, and servers store data in memory. For high availability, data shards are replicated. Specifically, SwitchTx uses 2-way¹ primary-backup replication (i.e., a primary and a backup). Each server has several worker threads, which share the data of the server. The worker threads follow a symmetric model, where each one operates as a client and a participant at the same time. Each worker thread is a client: it receives external transaction requests from applications and then executes transactions. That is, it reads/writes data from the local shards and sends requests to read/write remote shards. Each worker thread is also a participant: it manages the data and responds to read/write requests from clients and coordination requests from switches.

¹ Our design generalizes to systems with a higher replication factor.

Switches in the cluster are programmable, and they are responsible for the coordination (including concurrency control and replication) between participants. Specifically, SwitchTx uses the optimistic concurrency control (OCC) protocol and 2-way primary-backup replication. To guarantee serializability, transactions read data and acquire locks only from the primary replicas of data shards. SwitchTx needs four synchronous phases (i.e., lock, validate, commit backup, and commit primary) to commit a cross-shard transaction.

In-network coordination. We observe that the coordination tasks synchronize the results from participants in the current phase and move the transaction into the next phase. SwitchTx abstracts these coordination tasks as in-switch gather-and-scatter (GaS) operations. The switches gather the replies carrying results from participants in the current phase and perform a state transition. Under certain conditions (i.e., phase failure or phase success), they scatter messages to finish the current phase. GaS makes transactions enter their next phases as quickly as possible. With in-network coordination, the client in SwitchTx is involved only in the execution phase, and the switches perform the coordination tasks in the subsequent four phases. SwitchTx generates a tree topology among switches for each transaction. This lets the GaS operation scale out to a large scale (i.e., multiple racks with multiple switches) and further exploits the processing resources of all switches. The client and the participants connect to the leaves. Messages are gathered from the child switches up to the root switch and are scattered in the reverse direction. We first introduce how SwitchTx realizes in-network coordination using a single switch in §3.2. We then detail how SwitchTx extends the single-switch design to multiple switches for scalability in §3.3.

Transaction traffic control. Further, we observe that there is a semantic gap between transaction semantics and general network protocols. To manipulate transaction traffic intelligently, we introduce new transaction traffic control algorithms in §3.4. SwitchTx reorders messages in a batch-based manner in the servers and in a priority-based manner in the switches. It also monitors performance metrics and the packet loss rate and applies dynamic transaction admission control accordingly.

3.2 In-Switch Gather-and-Scatter
3.2.1 Gather-and-Scatter. Once all participants complete the current phase successfully, the transaction enters the next phase. As soon as any participant completes with a failure, the transaction aborts. To employ the switch as a coordinator, SwitchTx abstracts various coordination tasks as in-switch gather-and-scatter operations. From the perspective of the switch, every coordination task consists of two steps. It (i) gathers a certain number (all or one) of completion replies from one set of participants and (ii) scatters the corresponding phase-transition messages to another set of participants. The GaS operation needs the following information: a message counter (counter), the number (threshold) of participants in the current phase (gather_group), and the participants in the next phase (scatter_group). For an ongoing transaction, counter is 0 at the beginning. (i) In the gather step, when the switch receives a reply message from a participant in gather_group, it increments counter by 1. The switch does not route the message (i.e., it drops it) if counter is less than threshold. (ii) In the scatter step, once counter equals threshold, the switch multicasts the message to the participants in scatter_group to notify them of the next phase. It then resets counter to 0.

Table 1: Cases in gather-and-scatter.
| message type       | threshold* | next phase      | scatter_group     |
|--------------------|------------|-----------------|-------------------|
| lock_ok            | #WP        | Validate        | RP                |
| validate_ok        | #RP        | Commit backup   | WB                |
| replicate_ok       | #WB        | Commit primary  | WP                |
| commit_ok          | #WP        | -               | unicast to client |
| fail               | 1          | Unlock          | W** & client      |
| version_copy_ok*** | #RP        | Validate & Read | RP                |
* W/R: write/read; P/B: primary/backup; #: participant count.
** The scatter_group in fail messages excludes the message sender.
*** version_copy_ok is used for read-only transactions.

Table 1 lists all cases of GaS operations in SwitchTx. For example, in the lock phase, the threshold for lock_ok messages is the number of write participants², and the scatter_group is the primary replicas of the read shards. This means that a transaction enters the validate phase once all locks on the write participants are held. We show two transaction examples using GaS operations: a committed one and an aborted one.

Committed transactions. Figure 4 shows a committed read-modify-write transaction. It reads records from shard 0 and shard 1, modifies them according to the user's logic, and writes records in shard 1 and shard 2. Figure 4(a) shows the basic procedure of transaction processing with in-server coordination. In the execution phase, the client reads records from primary replicas P0 and P1 and executes the transaction. Then, in the lock phase, it sends requests to P1 and P2 to acquire write locks. In the validate phase, it verifies that the versions of the records in P0 and P1 have not changed. Finally, in the commit backup phase, it writes logs to B1 and B2. In the commit primary phase, it writes and unlocks the locked records in P1 and P2.

SwitchTx extends the basic procedure to offload coordination to the switch, as shown in Figure 4(b). The client sends the whole write data to the primary and backup replicas at the beginning of the lock phase, so that the subsequent coordination phases do not need to involve the client. By contrast, in the original OCC, the keys in the write set are carried by messages in the lock phase, and the values by messages in the commit backup/primary phases. In the lock phase, the switch uses counter to count the lock_ok messages from P1 and P2 and compares counter with the number of write participants (i.e., threshold = 2). It drops the first lock_ok message and multicasts the second (i.e., the last) one as the validate requests to ⟨P0, P1⟩. In the validate phase, similarly, the switch waits for validate_ok messages from P0 and P1 and then multicasts the last one to ⟨B1, B2⟩. After that, in the commit backup phase, it waits for replicate_ok messages from B1 and B2 and then multicasts the last one to ⟨P1, P2⟩. In the commit primary phase, the switch waits for commit_ok messages

² They are the primary replicas of the write shards.

Figure 4: The li [...] Panels: (a) OCC; (b) SwitchTx.
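The gather-and-scatter rules of Table 1 can be modeled as a small per-transaction state machine. The following is a hedged, software-only sketch for illustration and is not SwitchTx's data-plane code. The names `TxnState`, `GAS_RULES`, and `on_reply` are hypothetical, and register arrays are modeled as plain Python fields. Two behaviors are assumptions not stated in the text. First, a `fail` message is scattered to the write primaries (WP) minus the sender, plus the client; Table 1 only says "W". Second, replies that arrive after an abort are dropped. Read-only transactions (`version_copy_ok`) are omitted. Each call returns the multicast destinations, and an empty list means the switch drops the message.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of Table 1 (read-write transactions only):
# message type -> (gather_group key whose size is the threshold,
#                  next phase, scatter_group key).
# WP/RP/WB = write primaries, read primaries, write backups.
GAS_RULES: Dict[str, Tuple[str, Optional[str], str]] = {
    "lock_ok": ("WP", "validate", "RP"),
    "validate_ok": ("RP", "commit_backup", "WB"),
    "replicate_ok": ("WB", "commit_primary", "WP"),
    "commit_ok": ("WP", None, "client"),  # last phase: unicast to the client
}


@dataclass
class TxnState:
    """Per-transaction GaS state; a real switch would keep this in registers."""
    groups: Dict[str, List[str]]  # e.g. {"WP": [...], "RP": [...], "WB": [...]}
    client: str
    counter: int = 0
    aborted: bool = False


def on_reply(state: TxnState, msg_type: str, sender: str) -> List[str]:
    """Apply one GaS step and return the multicast targets ([] means drop)."""
    if state.aborted:
        return []  # assumption: late replies of an aborted txn are dropped
    if msg_type == "fail":
        # Threshold is 1: the first failure aborts the transaction. Unlocks go
        # to the write primaries except the sender (assumed meaning of "W"),
        # and the client is notified.
        state.aborted = True
        state.counter = 0
        targets = [p for p in state.groups["WP"] if p != sender]
        return targets + [state.client]

    gather_key, _next_phase, scatter_key = GAS_RULES[msg_type]
    state.counter += 1  # gather step
    if state.counter < len(state.groups[gather_key]):
        return []  # not the last expected reply: drop it
    state.counter = 0  # scatter step: recycle the counter for the next phase
    if scatter_key == "client":
        return [state.client]
    return list(state.groups[scatter_key])


if __name__ == "__main__":
    # The committed example of Figure 4(b): reads shards 0 and 1,
    # writes shards 1 and 2; "C" is the client.
    txn = TxnState(groups={"WP": ["P1", "P2"], "RP": ["P0", "P1"],
                           "WB": ["B1", "B2"]}, client="C")
    for msg, src in [("lock_ok", "P1"), ("lock_ok", "P2"),
                     ("validate_ok", "P0"), ("validate_ok", "P1"),
                     ("replicate_ok", "B1"), ("replicate_ok", "B2"),
                     ("commit_ok", "P1"), ("commit_ok", "P2")]:
        print(f"{msg:12} from {src} -> {on_reply(txn, msg, src)}")
```

When run, the demo drops the first reply of each phase and multicasts the second reply to the next phase's group, ending with a unicast to the client C. A single counter is reused across phases, which mirrors the "recycle the state machine" step, so one register slot per transaction suffices in this model.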

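§3.1 says that, for scalability, SwitchTx builds a tree of switches for each transaction. It gathers replies from child switches up to the root and scatters in the reverse direction; the actual multi-switch design is the subject of §3.3. The sketch below is a hypothetical model of that idea only, and it rests on three assumptions. First, the topology is given as a `parent` map that already encodes the per-transaction tree: each participant maps to its leaf switch, and the root maps to `None`. Second, each switch's threshold is the number of its children that lie on a path from some gather_group member. Third, a switch forwards a single message upward only when its last expected child reports. The names `thresholds`, `gather`, and `scatter_fanout` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Optional

Parent = Dict[str, Optional[str]]  # node -> parent switch (root -> None)


def thresholds(parent: Parent, gather_group: List[str]) -> Dict[str, int]:
    """Per-switch threshold: distinct children on a path from a gather_group
    member to the root."""
    edges = set()
    for node in gather_group:
        while parent[node] is not None:
            edges.add((node, parent[node]))
            node = parent[node]
    need: Dict[str, int] = {}
    for _child, switch in edges:
        need[switch] = need.get(switch, 0) + 1
    return need


def gather(parent: Parent, gather_group: List[str],
           arrivals: List[str]) -> Optional[str]:
    """Deliver replies in arrival order; return the root's name once every
    expected reply has been gathered there, otherwise None."""
    need = thresholds(parent, gather_group)
    counter = {switch: 0 for switch in need}
    for node in arrivals:
        while True:
            switch = parent[node]
            if switch is None:
                return node  # node is the root: the phase is complete
            counter[switch] += 1
            if counter[switch] < need[switch]:
                break  # not the last expected child: the switch drops it
            node = switch  # last expected child: forward one message upward
    return None


def scatter_fanout(parent: Parent, scatter_group: List[str]) -> Dict[str, List[str]]:
    """Children each switch multicasts to, so that a message sent down from
    the root reaches every scatter_group member (the reverse of gather)."""
    fanout: Dict[str, List[str]] = {}
    for node in scatter_group:
        while parent[node] is not None:
            children = fanout.setdefault(parent[node], [])
            if node not in children:
                children.append(node)
            node = parent[node]
    return fanout


if __name__ == "__main__":
    # Hypothetical two-rack topology: leaf switches L0 and L1 under spine S.
    topo: Parent = {"C": "L0", "P0": "L0", "P1": "L0", "B1": "L0",
                    "P2": "L1", "B2": "L1", "L0": "S", "L1": "S", "S": None}
    print(sorted(thresholds(topo, ["P1", "P2"]).items()))  # lock phase, WP
    print(gather(topo, ["P1", "P2"], ["P2", "P1"]))        # root completes
    print(scatter_fanout(topo, ["P0", "P1"]))              # validate to RP
```

In this model, a transaction whose client and participants all sit under L0 would get a single-switch tree rooted at L0 and would never touch S. Different transactions would therefore load different roots, which is one way to read the second design goal of parallelizing the coordination of disjoint transactions.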