Hardware Accelerator for Convolutional Restricted Boltzmann Machines
Junghoon Han
Electrical Engineering and Computer Sciences, University of California, Berkeley
Technical Report No. UCB/EECS-2025-29
/Pubs/TechRpts/2025/EECS-2025-29.html
May 1, 2025
Copyright © 2025, by the author(s).
All rights reserved.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Acknowledgement
I would like to thank Professor Sayeef Salahuddin for his continued mentorship and generous sponsorship during my master's program. I thank Pratik Brahma, who pioneered this research topic, for his close guidance, ideas, and help on this project. Thanks to the rest of the Unconventional Computing group members, Chirag Garg, Saavan Patel, and Philip Canoza, for helping out with this project in various ways.
I am profoundly grateful for the exceptional support of my family throughout the years. And thanks to all my friends, especially my fellow Ra-On band members, who made my graduate program fruitful.
Hardware Accelerator for Convolutional Restricted Boltzmann Machines
by Junghoon Han
Research Project
Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II.
Approval for the Report and Comprehensive Examination:
Committee:
Professor Sayeef Salahuddin, Research Advisor
5.07.2024 (Date)
Professor Sophia Shao, Second Reader
5/6/2024 (Date)
Copyright 2024 by Junghoon Han
Abstract
Restricted Boltzmann Machines (RBMs) have gained attention for their strength in aiding Monte Carlo simulations for Combinatorial Optimization, Quantum Applications, and Machine Learning problems. Convolutional RBM (CRBM), a variant of RBM, has sparked interest due to its lower parameter counts and efficient performance for translationally-symmetric problems. However, the stochastic nature of CRBM often makes it take a long time to reach the ground-state solution, demanding an approach to accelerate the computation process.
In this work, we demonstrate our hardware accelerator for CRBM, implemented in RTL and programmed on an FPGA. Software applications can harness the accelerator by simply programming the weights, biases, and lattice sizes. We show that, for solving frustrated classical Hamiltonians for the Ising Shastry-Sutherland model, our hardware accelerates reaching the ground-state solution by up to 5 orders of magnitude compared to GPUs.
Contents
1 Introduction
  1.1 Background
  1.2 Motivations and Previous work
2 Convolutional RBM (CRBM)
  2.1 Restricted Boltzmann Machine (RBM)
  2.2 Convolutional RBM (CRBM)
  2.3 CRBM Computation Logic
  2.4 Shastry-Sutherland model mapping
3 CRBM Hardware Accelerator
  3.1 Background
  3.2 Architecture
  3.3 Input and Output (I/O) and Programming Logic
  3.4 Testing
  3.5 Analysis
4 Results
  4.1 Time to Solution
  4.2 Runtime Results
  4.3 Evaluation
5 Conclusion
  5.1 Future Steps
  5.2 Conclusion
Bibliography
Chapter 1
Introduction
1.1 Background
In the ever-evolving landscape of Ising models, the quest for efficient and robust models capable of processing complex data remains paramount. Among the myriad of techniques that have emerged for mapping Ising models, Restricted Boltzmann Machines (RBMs) stand out as a fundamental building block in the realm of unsupervised learning. With their ability to capture intricate patterns, parallelize Gibbs sampling, and map relationships between different neurons, RBMs have garnered considerable attention and acclaim in the fields of Combinatorial Optimization, Quantum problems, and classical Ising models.
Part of this attention is ascribed to Convolutional Restricted Boltzmann Machines (CRBMs). CRBMs harness the power of probabilistic inference to explore solution spaces more effectively, thereby enabling the discovery of optimal or near-optimal solutions in computationally challenging problems. CRBMs, a convolutional variant of RBMs, have lower parameter counts, thereby increasing the compute efficiency for training and inference. Recent work has sparked interest in their ability to optimally map translationally symmetric problems, in which convolution weights are repeated every stride.
The transformative potential of CRBMs has immense practical significance in addressing real-world challenges with profound implications. In materials science, the ability to explore vast solution spaces with probabilistic methodologies enables researchers to expedite the search for novel compounds and materials with desired properties. This paper partially includes a demonstration of mapping a classical Ising Shastry-Sutherland model to CRBMs to accelerate the sampling computations needed to reach the ground-state solution.
Due to their stochastic nature, CRBMs may require a significant number of sampling iterations to reach the desired ground-state solution. The required sampling count also increases with the number of neurons in the CRBM. Thus, to harness the power of CRBMs within a reasonable compute time, an efficient implementation is essential. This motivates our approach of designing hardware accelerators for CRBMs to improve compute time and energy efficiency.
1.2 Motivations and Previous work
Motivation for Hardware Acceleration
Mapping the mathematical logic directly into digital Register Transfer Level (RTL) logic, rather than encoding it as instructions for general-purpose computers, can speed up the calculations by several orders of magnitude. This process can not only save computation time, but also reduce the energy required to compute a desired program.
The same logic follows for designing a custom digital hardware accelerator for Convolutional RBMs. Transistor logic can be customized and optimized to suit the specific requirements of CRBMs, such as optimizing memory access patterns, exploiting spatial parallelism at the hardware level, and implementing specific modules tailored for Gibbs sampling computations. The details of the hardware implementation are noted in Chapter 3.
Relevant Previous work
This research is part of Salahuddin Lab's Unconventional Computing subgroup, which has been using RBMs for NP-Hard combinatorial optimizations. Our team's former members have demonstrated the use of hardware-accelerated RBMs for solving optimization problems such as the MAX-CUT problem and the Sherrington-Kirkpatrick spin glass. The FPGA-mapped RBM has demonstrated similar or better scaling performance compared to Quantum Computers such as the D-Wave 2000Q Quantum Adiabatic Computer [1]. Subsequent work has used the RBM hardware accelerator for integer factorization of 16-bit numbers. This work showed a runtime improvement of 10,000x over CPUs and 1,000x over GPUs [2].
As previous research on hardware-accelerated RBMs has been meaningful, our group was motivated to design hardware accelerators for specific variants of RBMs, notably CRBMs. In this paper, we explore the design, implementation, and evaluation of a hardware accelerator tailored specifically for Convolutional Restricted Boltzmann Machines for non-deterministic polynomial-time computing. Through RTL-level descriptions and FPGA mappings, we demonstrate the efficacy and versatility of hardware-accelerated CRBMs in solving combinatorial optimization problems.
Chapter 2
Convolutional RBM (CRBM)
2.1 Restricted Boltzmann Machine (RBM)
The Restricted Boltzmann Machine (RBM) is a stochastic 2-layer graph neural network. The 2 layers are called the "visible" and "hidden" layers, which are all-to-all connected, forming a bipartite graph. RBMs are used by repeatedly performing block Gibbs sampling between the 2 layers and tracking the visible layer values at every sample to derive the probability distribution of the resulting node (neuron) values. The RBM is an energy-based model, which means that the objective of sampling is to minimize the energy value associated with the weight, bias, and node values. [3]
All node values are binary: 0 or 1. The next value of a node is determined by deriving the probability for it to be of value 1 and conducting random sampling according to that probability. The next set of values for each layer is sampled by the conditional probability dependent on the other layer. The values of all nodes in a single layer are sampled jointly; the next set of hidden nodes will be sampled by probability p(h|v), and the visible nodes by probability p(v|h). This form of simultaneous sampling is called block Gibbs sampling.
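The block Gibbs sampling described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the layer sizes, the random weights, and the function name `block_gibbs_step` are assumptions made for the example, with b as the hidden bias and c as the visible bias.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def block_gibbs_step(v, W, b, c, rng):
    """One visible -> hidden -> visible sweep of block Gibbs sampling."""
    p_h = sigmoid(v @ W + b)            # p(h = 1 | v), one entry per hidden node
    h = (rng.random(p_h.shape) < p_h).astype(np.int8)
    p_v = sigmoid(h @ W.T + c)          # p(v = 1 | h), one entry per visible node
    v_next = (rng.random(p_v.shape) < p_v).astype(np.int8)
    return v_next, h

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 4
W = rng.normal(size=(n_visible, n_hidden))
b = np.zeros(n_hidden)                  # hidden bias
c = np.zeros(n_visible)                 # visible bias
v = rng.integers(0, 2, size=n_visible).astype(np.int8)
for _ in range(10):
    v, h = block_gibbs_step(v, W, b, c, rng)
```

All nodes of one layer are drawn in a single vectorized comparison, which is what makes the sampling "block" rather than node-by-node.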
The nodes and edges of the RBM correspond to neurons and synaptic connections. Thus, when we map different problems to an RBM, we can assign the visible nodes to represent physical variables (such as spins, direction, group assignment) and the hidden nodes to interactions between them (such as spin interactions).
2.2 Convolutional RBM (CRBM)
While RBMs are assumed to have fully-connected edges between the visible and hidden layers, CRBMs work with strides and convolution. CRBMs show translational invariance, where the pattern of weights is identical across different parts of the nodes. As the all-to-all connection of an RBM can be memory-heavy and compute-heavy, the CRBM helps relax the logic by using only a set of connections to fully represent the probabilities for block Gibbs sampling.
Figure 2.1: Pictorial representation of RBM and CRBM.
Figure 2.1 shows the structure of RBM and CRBM. As seen on the right of the figure, CRBMs have the same weights repeated every stride (in this case, stride equal to 1). The figure also notes periodicity, which means that when the stride goes out of bounds of the visible nodes, it wraps back to the first index of the hidden nodes (in this case, connecting v4 with h1). Periodicity can be turned on or off, depending on the problem formulation.
CRBMs can have multiple sets of weights. For example, as per Figure 2.1, the first set of weights can be w1 = (e1, e2) = (1, 2), while the second set of weights can be w2 = (e1, e2) = (3, 4). Each set of weights will produce a group of hidden nodes. Another set of weights will produce a separate group of hidden nodes. From here on, we will refer to them as convolution groups.
Energy and Probability formulation
The following formulas are derived by converting the general RBM energy and probability equations to reflect the convolutional nature of CRBM.
Here, the notations are: $v_{ij}$ is the visible node at the i-th row and j-th column. $k$ represents the convolutional group, which corresponds to the k-th set of weights, also known as 'filters'. $W^k$ is the k-th filter. $\tilde{W}^k$ is the k-th filter, flipped in both horizontal and vertical axes. $h^k_{ij}$ in turn represents the hidden node at group k, i-th row and j-th column. $b$ is the hidden bias and $c$ is the visible bias. $\bullet$ is the element-wise product followed by summation: $A \bullet B = \mathrm{tr}(A^T B)$. The $*$ operator denotes convolution. $\sigma$ denotes the sigmoid operator. [4]
$$P(h^k_{ij} = 1 \mid v) = \sigma\big((W^k * v)_{ij} + b\big) \tag{2.2}$$
The objective of our CRBM is to sample repeatedly until the energy reaches the ground-state solution. (The ground-state solution is also the output with the highest likelihood.) The probabilities are used to randomly sample each visible and hidden node value as 0 or 1, thereby determining the next value of the nodes.
2.3 CRBM Computation Logic
The CRBM computation logic and sequence is illustrated in Figure 2.2. Note that the logic flows from visible nodes → hidden nodes → visible nodes, and repeats.
2.3.1. Visible nodes
The sampling starts with the initial state of the visible nodes. In our setting, the visible layer is configured as a 2-dimensional array of binary nodes.
Figure 2.2 starts with visible nodes of size 3x3.
2.3.2. Wrapping
Wrapping is done to ensure periodicity is incorporated into the convolution logic. Assume that the filter size is MxM. If periodicity is 'on' in the column direction, the first M-1 columns are copied and appended after the last column. If periodicity is 'off' in the column direction, M-1 columns of zeros are inserted instead. The same logic holds for the row direction.
Figure 2.2 shows the wrapping logic for a 2x2 filter with periodicity on in both the column and row directions. The wrapped nodes are shown in orange.
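The wrapping step can be illustrated with a short NumPy sketch. The function name `wrap`, its arguments, and the example values are assumptions chosen for this illustration; the sketch only mirrors the rule stated above (append copies of the first M-1 columns and rows when periodicity is on, zeros otherwise).

```python
import numpy as np

def wrap(v, M, periodic_rows=True, periodic_cols=True):
    """Append M-1 columns and rows: copies of the first ones if periodic, zeros otherwise."""
    if periodic_cols:
        cols = v[:, :M - 1]
    else:
        cols = np.zeros((v.shape[0], M - 1), dtype=v.dtype)
    out = np.concatenate([v, cols], axis=1)
    if periodic_rows:
        rows = out[:M - 1, :]
    else:
        rows = np.zeros((M - 1, out.shape[1]), dtype=v.dtype)
    return np.concatenate([out, rows], axis=0)

# Hypothetical 3x3 visible layer and a 2x2 filter (M = 2).
v = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]], dtype=np.int8)
wrapped = wrap(v, M=2)                  # 4x4 array, last row and column are copies
unwrapped = wrap(v, M=2, periodic_rows=False, periodic_cols=False)
```

With periodicity on, the 3x3 input grows to 4x4, which matches the size used in the stride-2 example of Figure 2.2.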
2.3.3. Convolution - Forward
Forward convolution is the convolution logic necessary for sampling hidden nodes from visible nodes (visible → hidden). Convolution here occurs as an element-wise matrix multiply between the filter and the visible nodes at the current position, followed by accumulation (MAC). This operation is conducted repeatedly with a stride, which moves the filter to the next respective location. The stride occurs in both the column and row directions, and the process is repeated until each direction's index is out of bounds.
The complete process mentioned above is identical for all filters. The number of output groups is equal to the number of different filters.
Figure 2.2: CRBM Computation logic
Figure 2.2 illustrates the convolution logic for 3 different 2x2 filters with a stride of 2. For the 4x4 wrapped visible nodes, this process creates a 2x2 result for each filter group.
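The strided multiply-accumulate described in this section can be sketched as follows. This is a plain software illustration under assumed filter values; the function name `conv_forward` and the example arrays are hypothetical and are not taken from the figure.

```python
import numpy as np

def conv_forward(wrapped, filters, stride):
    """Slide each MxM filter over the wrapped visible array with the given stride."""
    n_filters, M, _ = filters.shape
    rows = (wrapped.shape[0] - M) // stride + 1
    cols = (wrapped.shape[1] - M) // stride + 1
    out = np.zeros((n_filters, rows, cols))
    for k in range(n_filters):
        for i in range(rows):
            for j in range(cols):
                r, c = i * stride, j * stride
                patch = wrapped[r:r + M, c:c + M]
                out[k, i, j] = np.sum(filters[k] * patch)   # multiply-accumulate
    return out

# Hypothetical 4x4 wrapped visible nodes and three 2x2 filters.
wrapped = np.array([[1, 0, 1, 1],
                    [0, 1, 1, 0],
                    [1, 1, 0, 1],
                    [1, 0, 1, 1]])
filters = np.array([[[1.0, 2.0], [3.0, 4.0]],
                    [[-1.0, 0.0], [0.0, 1.0]],
                    [[0.5, 0.5], [0.5, 0.5]]])
acts = conv_forward(wrapped, filters, stride=2)   # shape (3, 2, 2)
```

Each of the three filters yields its own 2x2 group, in line with the statement that the number of output groups equals the number of filters.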
2.3.4. Probability and Sampling - Forward
The convolution result is sent to a sigmoid operator to obtain the probability P(h|v). The sigmoid is applied element-wise to each of the outputs of the convolution.
The sigmoid provides a probability value between 0 and 1, which is in turn used for random sampling. The sampler takes the probability as the likelihood of the result node being equal to 1. Then, the sampler's result, either 0 or 1, becomes the next value of the hidden node. In practice, this process is done by generating a random floating point value between 0 and 1, comparing it to the sigmoid output, and setting the result value to 1 if the random number is less than the sigmoid output.
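The compare-against-a-random-number procedure can be written compactly. The sketch below assumes NumPy's default generator as the source of uniform values; the function name `sample_nodes` and the example activations are illustrative only.

```python
import numpy as np

def sample_nodes(activations, bias, rng):
    """Return binary node values: 1 where a uniform random draw is below the sigmoid."""
    prob = 1.0 / (1.0 + np.exp(-(activations + bias)))
    draws = rng.random(prob.shape)      # uniform values in [0, 1)
    return (draws < prob).astype(np.int8), prob

rng = np.random.default_rng(1)
# Hypothetical convolution outputs: very negative, neutral, and very positive.
acts = np.array([[-50.0, 0.0], [0.5, 50.0]])
hidden, prob = sample_nodes(acts, bias=0.0, rng=rng)
```

Strongly negative activations almost always produce 0 and strongly positive ones almost always produce 1, while an activation of 0 gives an even chance.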
2.3.5. Hidden nodes
The sampled values become the next hidden node values. With N different filters, there will be N groups of hidden nodes. All hidden node values are binary as well.
2.3.6. Zero Padding
We apply a zero-padding technique to ensure that the result of the reverse sampling (hidden → visible) has the same dimension as the starting visible node dimension. That is, we insert zeros between the hidden nodes in all directions.
Similarly to the wrapping step, zero padding also includes copying the last columns and rows to the beginning column and row. If periodicity is on, we copy the hidden node values along with the padded zeros. If periodicity is off, zeros are simply added to the beginning column and row positions.
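One possible reading of the zero-padding step is sketched below; it is an interpretation, not a description of the actual implementation. It assumes that each hidden node is placed at the position of the top-left visible node it was computed from (every `stride` positions on an LxL grid), and that M-1 rows and columns are then prepended, either wrapped from the end or filled with zeros. The function name `zero_pad` and the example values are hypothetical.

```python
import numpy as np

def zero_pad(hidden, stride, L, M, periodic=True):
    """Spread one hidden group over an LxL grid, then prepend M-1 rows and columns."""
    grid = np.zeros((L, L), dtype=hidden.dtype)
    # Assumption: hidden has one entry per stride position, i.e. ceil(L / stride) per side.
    grid[::stride, ::stride] = hidden
    pad = ((M - 1, 0), (M - 1, 0))      # pad only before the first row and column
    if periodic:
        return np.pad(grid, pad, mode="wrap")       # copy the last rows/columns to the front
    return np.pad(grid, pad, mode="constant")       # fill the front with zeros

# Hypothetical 2x2 hidden group from a 3x3 lattice, 2x2 filter, stride 2.
hidden = np.array([[1, 0],
                   [1, 1]], dtype=np.int8)
padded = zero_pad(hidden, stride=2, L=3, M=2)
padded_off = zero_pad(hidden, stride=2, L=3, M=2, periodic=False)
```

Under these assumptions, a stride-1 pass of an MxM filter over the padded array yields exactly LxL outputs, which is the dimension requirement stated above.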
2.3.7. Convolution - Reverse
The convolution logic here is similar to that of the convolution in the forward direction. The key difference is that the filters applied are flipped in the horizontal and vertical directions. Moreover, the stride value is always equal to 1 in the reverse direction.
2.3.8. Accumulation
For N different convolution groups, there will be N different convolution outputs. This step accumulates all the node values from the convolution outputs, element-wise. The dimension of the output from this step is equal to that of the visible nodes.
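The reverse convolution and the accumulation across groups can be sketched together. This is an illustrative sketch that assumes each hidden group has already been zero-padded to size (L+M-1)x(L+M-1); the function name `conv_reverse_accumulate`, the padded arrays, and the filter values are hypothetical.

```python
import numpy as np

def conv_reverse_accumulate(padded_groups, filters):
    """Apply each flipped filter with stride 1 and sum the N outputs element-wise."""
    n_filters, M, _ = filters.shape
    L = padded_groups.shape[1] - M + 1
    total = np.zeros((L, L))
    for k in range(n_filters):
        flipped = filters[k][::-1, ::-1]    # flip in both vertical and horizontal directions
        for i in range(L):
            for j in range(L):
                patch = padded_groups[k, i:i + M, j:j + M]
                total[i, j] += np.sum(flipped * patch)
    return total

# Hypothetical padded hidden group (4x4), reused for two convolution groups.
P = np.array([[1, 1, 0, 1],
              [0, 1, 0, 0],
              [0, 0, 0, 0],
              [1, 1, 0, 1]])
padded_groups = np.stack([P, P])
filters = np.array([[[1.0, 2.0], [3.0, 4.0]],
                    [[1.0, 1.0], [1.0, 1.0]]])
total = conv_reverse_accumulate(padded_groups, filters)   # shape (3, 3)
```

The result has the same 3x3 dimension as the visible layer in this example, so it can be passed directly to the sigmoid and sampling step.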
Figure 2.3: Shastry-Sutherland Magnetization Phases
2.3.10. Probability and Sampling - Reverse
Similar to the forward direction process, the sigmoid is applied to produce the probability, which is used for randomly sampling the next set of visible nodes. This step produces the next set of visible node values, which completes the full cycle.
2.4 Shastry-Sutherland model mapping
In our work, we map the classical Ising Shastry-Sutherland model onto the CRBM structure to solve frustrated classical Hamiltonians. Our results demonstrate that the CRBM can be used to simulate any kind of translationally-symmetric classical Hamiltonian. The Shastry-Sutherland lattice has discrete translational symmetry, where certain sets of spin interactions are repeated elsewhere on the lattice. The Shastry-Sutherland model can be mapped to CRBM in the following way: the visible nodes represent physical variables, in this case the magnetic spins. The hidden nodes represent interactions between the spins.
To map the Shastry-Sutherland Ising model to the CRBM framework, we equate the physical lattice's Boltzmann distribution to the RBM's marginal distribution. The RBM weights are then mapped to be unique only up to the unit cell on the lattice, of size 3x3. Thus, the filter sizes are 3x3. The Shastry-Sutherland model contains 10 unique repeated interactions, leading to the formulation of 10 different filters.
We focused our experiment on 4 of the Shastry-Sutherland magnetization phases, as noted in Figure 2.3. Each node, mapped to a visible node of the CRBM, represents a magnetization spin. The empty circles are represented as 1, and filled circles are represented as 0. Different phase problems produce different filters and biases.
• AFM Phase: Anti-Ferromagnetic Phase. All non-diagonal nodes have opposite spins.
• FM Phase: Ferromagnetic Phase. All nodes have the same magnetization spins.
• 1/3 Fractional Phase: the rows of the lattice show a pattern of an FM phase row sandwiched between two AFM phase rows.
• Dimer Phase: certain diagonal sets of nodes are expected to have opposite spins of each other (marked in green boxes).
The detailed mapping of the Shastry-Sutherland model to CRBM will be illustrated in an upcoming paper from the Salahuddin Group, in work pioneered by Pratik Brahma.
Chapter 3
CRBM Hardware Accelerator
3.1 Background
The objective of the CRBM hardware accelerator is to significantly reduce the runtime of reaching the ground-state solution of CRBM.
The hardware design is implemented in RTL (Register Transfer Level) and mapped to a Field Programmable Gate Array (FPGA). We used the Virtex UltraScale+ FPGA device (VCU118), a product of Xilinx-AMD. This FPGA represents a cutting-edge solution in the field of FPGAs with 14nm/16nm FinFET process technology, dynamic power management, and integrated Gen3 x16 PCIe blocks. We use this FPGA jointly with the experiment server with an 11th Gen Intel Core i9-11900K @ 3.50GHz and 135GB RAM. To program the FPGA, we use Xilinx's Vivado tools.
3.2 Architecture
The hardware architecture, as denoted in Figure 3.1, maps each step of CRBM into respective hardware modules. Note that there are modules corresponding to the steps described in Figure 2.2.
The hardware is pipelined with 2 stages: forward and reverse. The forward stage contains the logic of sampling from visible nodes → hidden nodes (stages 3.2.1 to 3.2.5). The reverse stage contains the logic of sampling from hidden nodes → visible nodes (stages 3.2.5 to 3.2.9).
3.2.1. Visible Node Registers
The 2D visible node layer is represented in a single register. As the node values are binary, each takes up a single bit in the register. This technique minimizes the LUT resource usage on the FPGA.
Figure 3.1: CRBM Hardware Accelerator Architecture
The lattice size is denoted LxL, meaning L rows and L columns of visible nodes, for a total of LxL visible nodes. Thus, there are LxL bits in the visible node register.
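As a software analogy for the single-register representation, the sketch below packs an LxL array of binary nodes into one integer with one bit per node. The row-major bit order (bit 0 holding node (0, 0)) and the helper names `pack_visible` and `read_node` are assumptions for illustration; the bit ordering of the actual register is not specified here.

```python
def pack_visible(nodes):
    """Pack an LxL list of 0/1 values into one integer, row-major, bit 0 = node (0, 0)."""
    L = len(nodes)
    reg = 0
    for i, row in enumerate(nodes):
        for j, bit in enumerate(row):
            reg |= (bit & 1) << (i * L + j)
    return reg

def read_node(reg, L, i, j):
    """Read back the node at row i, column j from the packed register."""
    return (reg >> (i * L + j)) & 1

# Hypothetical 3x3 visible layer: 9 nodes fit in 9 bits.
nodes = [[1, 0, 1],
         [0, 1, 1],
         [1, 1, 0]]
reg = pack_visible(nodes)
```

For this example the packed value fits in LxL = 9 bits, matching the register width stated above.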
3.2.2. Wrapper
The wrapper module follows the logic of the wrapping technique noted in section 2.3.2. It takes in the visible node register and the periodicity signal as inputs, and copies or zeros out the respective columns and rows accordingly.
3.2.3. Convoluter - Forward
The convoluter module takes in the wrapper output and the filters to conduct the convolution logic noted in section 2.3.3. The filter values are provided by the user's software via PCIe.
In this implementation, the convoluter takes advantage of spatial parallelism. It contains the multiply-and-accumulate convolution logic in place for corresponding positions. The same logic is copied to other positions that are separated by a distance equal to the stride value in all directions. In summary, all convolution computation is contained in a single spatially parallelised combinational logic block.
3.2.4. Sigmoid and LFSR - Forward
The sigmoid modules are synthesized with input and output bit count parameters, which determine the level of precision of the input and output. The input is the result of the convolution. The output is the corresponding sigmoid value. The sigmoid module internally contains a pre-coded LUT, which acts as a dictionary mapping keys (inputs) to values (outputs). The module selects the closest corresponding sigmoid value that was synthesized with the given precision parameters.
The Linear Feedback Shift Register (LFSR) module is synthesized according to a set seed value. The internal register in the module, initialized with the seed value, is shuffled every cycle to produce randomized bits. The LFSR output is converted to a value between 0 and 1.
For N filter groups, there are N sigmoid modules and N LFSR modules. The sigmoid value is compared with the output of the LFSR module. If the sigmoid value is greater, the corresponding hidden node will contain value 1. Otherwise, it will contain value 0. This logic completes the first pipeline stage.
The sigmoid and LFSR hardware modules were pioneered by our former researchers, Saavan Patel and Philip Canoza.
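The comparison between the sigmoid value and the LFSR output described in this section can be modeled in software. The sketch below uses a textbook 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11 as a stand-in; the register width, tap positions, seed, and scaling to a value between 0 and 1 are all assumptions for illustration and are not taken from the actual hardware module.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one cycle."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def sample_bit(sigmoid_value, state):
    """Return (node value, next state): node is 1 when the sigmoid exceeds the LFSR output."""
    state = lfsr16_step(state)
    uniform = state / 65536.0           # map the 16-bit register to a value in (0, 1)
    return (1 if sigmoid_value > uniform else 0), state

state = 0xACE1                          # arbitrary nonzero seed (assumption)
node, state = sample_bit(0.75, state)
```

A nonzero seed is required because an all-zero register would stay at zero forever; with these taps the register cycles through all 65535 nonzero states before repeating.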
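3.2.5. Hidden Node Registers
For N filters, there are N hidden node groups produced. Each of them has a dimension of $\lfloor (L+1)/\mathrm{stride} \rfloor \times \lfloor (L+1)/\mathrm{stride} \rfloor$. Thus, the hidden node register contains a total of $N \times \lfloor (L+1)/\mathrm{stride} \rfloor \times \lfloor (L+1)/\mathrm{stride} \rfloor$ bits.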
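As a quick worked example of the register size formula above, the helper below evaluates it for the configuration of Figure 2.2 (3x3 lattice, stride 2, 3 filters). The function name `hidden_register_bits` is introduced here only for illustration.

```python
def hidden_register_bits(L, stride, n_filters):
    """Bits in the hidden node register: N groups of floor((L+1)/stride) squared."""
    side = (L + 1) // stride            # floor division implements the floor operator
    return n_filters * side * side

# 3x3 lattice, stride 2, 3 filters: 3 groups of 2x2 hidden nodes.
bits = hidden_register_bits(L=3, stride=2, n_filters=3)
```

This gives 3 x 2 x 2 = 12 bits, consistent with the three 2x2 hidden groups of that example.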
3.2.6. Zero Padder
The zero padder module implements the logic noted in section 2.3.6. The hardware keeps an array of zeros with empty slots for positions that take in hidden node values. The hidden node values are inserted in a spatially parallel manner. For N filters, there are N zero padder modules.
3.2.7. Convoluter - Reverse
The convoluter module in the reverse direction is the same module used in the forward direction (section 3.2.3). The flipped weights are inputs to this module, and are provided by the user software via PCIe. For N filters, there are N reverse convoluter modules.
3.2.8. Accumulator
The output of the convoluter module is accumulated in this module. As the accumulation is done element-wise, it is simple to create combinational logic that adds up the values for the same positions in the N groups. Following the accumulator module, the N filter groups are aggregated into a single group.
Moreover, the visible bias is applied in this module. The bias values are provided by the user software. We provide an option to use an odd bias and an even bias, which allows different bias values to be applied to odd columns and even columns.
3.2.9. Sigmoid and LFSR - Reverse
The sigmoid and LFSR modules used in the reverse direction are the same modules used in the forward direction (section 3.2.4). The output of these modules determines the next visible node values.
This module completes the second pipeline stage, and completes a full cycle of sampling.
3.3 Input and Output (I/O) and Programming Logic
The host machine and the FPGA communicate over the x16 PCIe link. We implement the Input and Output (I/O) logic of the PCIe through an open source module named Xillybus. Xillybus provides both an FPGA IP core and a driver for the host PC's operating system. It provides customized bundles for different FPGA models.
Although our hardware need not communicate large amounts of data within each time step, the host machine and the FPGA run on different clock frequencies, producing a clock domain crossing. Thus, we use a First-In-First-Out (FIFO) module to enable sequential communication between the host and the FPGA.
The Xillybus driver creates 2 device files for the FPGA: one for writing and one for reading. The user software writes and reads the following values to and from the device files:
• Write (PC to FPGA): weights, flipped weights, biases, lattice sizes (visible layer dimensions), periodicity, and the clear-last-hidden-rows signal (some applications require clamping the last row of hidden nodes to zero)
• Read (from FPGA): visible node values of each cycle
After the host machine reads the visible node values, the user software calculates the energy of the nodes.
Figure 3.2: CRBM Hardware Accelerator complexities
3.4 Testing
Each hardware module went through behavioral testin