Fast Python* Analytics and Deep Learning Frameworks on CPU
Intel® Accelerations for AI
Nathan Greeneltch, PhD
Consulting Engineer, Intel Corporation
Software
Intel® AI Framework Accelerations
Intel® Python* Accelerations
Intel® DAAL for Python* Analytics
Get Deep Learning Framework Performance on Intel® Architecture
Intel® Optimized AI Frameworks
Copyright © Intel Corporation 2019
*Other names and brands may be claimed as the property of others.
Define The Problem
Artificial Intelligence Will Transform…
Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots
Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids
Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation
Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security
Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities
Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation
Transport: In-Vehicle Experience, Automated Driving, Aerospace, Shipping, Search & Rescue
Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation
Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports
Source: Intel forecast
World's data will grow 10X in 10 years
Source: Seagate “Data Age 2025”, July 2017
Forbes Magazine: 20 Mind-Boggling Facts Every Business Leader Must Reflect On Now (Nov 1, 2015); Inside Big Data: Exponential Growth of Data (Feb 16, 2017)
Yet less than 1% of all data is ever analyzed and used
Intel AI Strategy: “Artificial Intelligence Everywhere”
[Figure: hardware portfolio spanning inference and training. Labels: Client, IoT, Edge (VPU, EyeQ, iGPU, FPGA); Datacenter (CPU AI with Intel® DL Boost; Accelerators: GPU, FPGA; Discrete AI: Intel® NNP-L, GPU); Common software stack; Ecosystem]
AI (ML & DL) Software Stack for Intel® Processors
Deep learning and AI ecosystem includes edge and datacenter applications.
Open source frameworks (TensorFlow*, MXNet*, PyTorch*, PaddlePaddle*)
Intel deep learning products (BigDL, OpenVINO™ toolkit)
In-house user applications
Intel® MKL and Intel® MKL-DNN optimize deep learning and machine learning applications for Intel® processors:
Through collaboration with framework maintainers to upstream changes (TensorFlow*, MXNet*, PyTorch*, PaddlePaddle*)
Through Intel-optimized forks (Caffe*)
By partnering to enable proprietary solutions
Intel MKL
Intel MKL-DNN
Intel Processors
Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is an open source performance library for deep learning applications (available at /intel/mkl-dnn)
Fast open source implementations for a wide range of DNN functions
Early access to new and experimental functionality
Open for community contributions
Intel® Math Kernel Library (Intel® MKL) is a proprietary performance library for a wide range of math and science applications
Distribution: Intel Registration Center, package repositories (apt, yum, conda, pip), Intel® Parallel Studio XE, Intel® System Studio
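Since Intel® MKL is distributed through ordinary Python package channels (conda, pip), one quick sanity check is whether the NumPy build in the current environment mentions MKL in its build configuration. The sketch below is only an illustration, not an Intel-provided tool: the helper name numpy_reports_mkl is hypothetical, it merely searches the text printed by numpy.show_config(), and it returns None when NumPy is not installed.

```python
import contextlib
import io


def numpy_reports_mkl():
    """Return True/False if NumPy's build info mentions MKL, None if NumPy is absent.

    Hypothetical helper for illustration: it captures the text that
    numpy.show_config() prints and looks for the substring "mkl".
    """
    try:
        import numpy as np
    except ImportError:
        return None
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return "mkl" in buffer.getvalue().lower()


print("NumPy reports MKL:", numpy_reports_mkl())
```

A False result only means the string was not found in the build report; it does not by itself prove which BLAS library is used at run time.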
Popular DL Frameworks are now optimized for CPU!
[Figure: logos of the CPU-optimized frameworks and of those still under optimization]
See installation guides at /framework-optimizations/
SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MlLib on Spark, Mahout)
*Limited availability today
Define The Problem
Artificial Intelligence is the ability of machines to learn from experience without explicit programming, in order to perform cognitive functions associated with the human mind
Machine learning: Algorithms whose performance improves as they are exposed to more data over time
Deep learning: Subset of machine learning in which multi-layered neural networks learn from vast amounts of data
Intel® DAAL Focus
Machine Learning Technology Breakdown
Machine Learning: Autonomous computation methods that learn from experience (data)
Deep Learning: Hierarchical approach with many hidden layers, gaining fame from accurately classifying data such as images, speech, and natural language. Features are learned. Example output label: “dog”. Typical customers: CSP, HPC
Other (or classic) ML: Traditional ML techniques for clustering, regression, and classification using very few (one or two) hidden layers. Requires feature engineering. Typical customers: Enterprise, HPC
Training: Train an algorithm to build a model. Time-to-model is critical
Inference: Deploy models for classification, prediction, recognition. Easily distributed. Criteria: Throughput, TCO @ scale
Deep Learning Breakthroughs
Machines able to meet or exceed human image & speech recognition
[Figure: two charts of recognition error compared with the human level. Left: Image recognition error, 2010 to Present, reaching 97% recognition (example label: “person”). Right: Speech recognition error, 2000 to Present, reaching 99% recognition (example: “play song”). Error axis: 30%, 23%, 15%, 8%, 0%.]
e.g. Document Sorting, Oil & Gas search, Defect detection, Genome sequencing, Tumor Detection, Voice Assistant
Source: ILSVRC ImageNet winning entry classification error rate each year 2010-2016 (Left), https:// /en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/ (Right)
Depth of Networks
ImageNet Large Scale Visual Recognition Competition (ILSVRC)
CNN
/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
Intel® Xeon® Processor Scalable Family
Now build the AI you want on the CPU you know
Your FOUNDATION for AI
Get maximum utilization: running datacenter and AI workloads side-by-side
Break memory barriers: to apply AI to large data sets and models
Train models at scale: through efficient scaling to many nodes
Access optimized tools: including continuous performance gains for TensorFlow*, MXNet*, more
Run in the cloud: including AWS, Microsoft, Alibaba, Tencent, Google, Baidu, more
Continued Innovation Driving Deep Learning Inference Performance On Intel® Xeon® Scalable Processors
Intel® Optimization for Caffe ResNet-50¹ Inference Throughput
[Figure: Relative Inference Throughput (images/sec), higher is better]
1.0X, FP32: Intel® Optimized Caffe at launch, July 11th 2017, on Intel® Xeon® Platinum 8180 Processor (codenamed Skylake)
2.8X², FP32: with new library and framework optimizations, Jan 19th 2018
5.4X², INT8: enabling lower precision & system optimizations for higher throughput, August 1st 2018
11X, INT8 with Intel® Deep Learning Boost: introducing new INT8 VNNI instruction; projected performance⁴ on a projected future Intel® Xeon® Scalable Processor (codename: Cascade Lake)
¹ Intel® Optimization for Caffe ResNet-50 performance does not necessarily represent other Framework performance.
² Based on Intel internal testing: 1X (7/11/2017), 2.8X (1/19/2018) and 5.4X (7/26/2018) performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processor.
³ 11X (7/25/2018): Results have been estimated using internal Intel analysis, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
⁴ Inference projections assume 100% socket to socket scaling
Performance results are based on testing as of 7/11/2017 (1x), 1/19/2018 (2.8x) & 7/26/2018 (5.4x) and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: /performance
Configurations for Performance Growth - Inference throughput
1x inference throughput improvement in July 2017:
Tested by Intel as of July 11th 2017: Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to “performance” via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with “caffe time --forward_only” command, training measured with “caffe time” command. For “ConvNet” topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from /intel/caffe/tree/master/models/intel_optimized_models (ResNet-50), and /soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with “numactl -l”.
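The baseline configuration sets two OpenMP environment variables, and OMP_NUM_THREADS=56 matches 2 sockets x 28 cores with Hyper-Threading disabled. A minimal sketch of applying the same settings from Python follows; the helper name configure_openmp and the one-thread-per-physical-core rule are assumptions made for illustration, not part of the published configuration.

```python
import os


def configure_openmp(sockets, cores_per_socket, env=None):
    """Set the two OpenMP variables named in the configuration above.

    Hypothetical helper: it assumes one thread per physical core, which
    reproduces OMP_NUM_THREADS=56 for 2 sockets x 28 cores.
    """
    env = os.environ if env is None else env
    env["OMP_NUM_THREADS"] = str(sockets * cores_per_socket)
    env["KMP_AFFINITY"] = "granularity=fine,compact"
    return env


# A plain dict is used here so the example does not alter the real environment.
settings = configure_openmp(sockets=2, cores_per_socket=28, env={})
print(settings["OMP_NUM_THREADS"])  # 56
```

When the real environment is the target, the variables generally need to be set before the framework (and its OpenMP runtime) is imported.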
2.8x inference throughput improvement in January 2018:
Tested by Intel as of Jan 19th 2018. Processor: 2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores, HT ON, Turbo ON. Total Memory 376.46GB (12 slots / 32GB / 2666MHz). CentOS Linux-7.3.1611-Core, SSD sda RS3WC080 HDD 744.1GB, sdb RS3WC080 HDD 1.5TB, sdc RS3WC080 HDD 5.5TB. Deep Learning Framework: Intel® Optimization for Caffe version: f6d01efbe93f70726ea3796a4b89c612365a6341. Topology: resnet_50_v1. BIOS: SE5C620.86B.00.01.0009.101920170742. MKLDNN version: ae00102be506ed0fe2099c6557df2aa88ad57ec1. NoDataLayer. Datatype: FP32. Batch size=64. Measured: 652.68 imgs/sec vs. the baseline tested by Intel as of July 11th 2017 (same platform, software, and measurement method as listed for the 1x result above).
5.4x inference throughput improvement in August 2018:
Tested by Intel as of July 26th 2018: 2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores, HT ON, Turbo ON. Total Memory 376.46GB (12 slots / 32GB / 2666MHz). CentOS Linux-7.3.1611-Core, kernel: 3.10.0-862.3.3.el7.x86_64, SSD sda RS3WC080 HDD 744.1GB, sdb RS3WC080 HDD 1.5TB, sdc RS3WC080 HDD 5.5TB. Deep Learning Framework: Intel® Optimization for Caffe version: a3d5b022fe026e9092fc7abc7654b1162ab9940d. Topology: resnet_50_v1. BIOS: SE5C620.86B.00.01.0013.030920180427. MKLDNN version: 464c268e544bae26f9b85a2acb9122c766a4c396. Instances: 2 instances, socket: 2 (Results on Intel® Xeon® Scalable Processor were measured running multiple instances of the framework. Methodology described here: /en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi). NoDataLayer. Datatype: INT8. Batch size=64. Measured: 1233.39 imgs/sec vs. the baseline tested by Intel as of July 11th 2017 (same platform, software, and measurement method as listed for the 1x result above).
11X inference throughput improvement with Cascade Lake:
Future Intel Xeon Scalable processor (codename Cascade Lake) results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance vs. the baseline tested by Intel as of July 11th 2017 (same platform, software, and measurement method as listed for the 1x result above).
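Two of the entries above give both a measured throughput and a speedup factor, so the baseline throughput they imply can be back-computed. The sketch below is plain arithmetic on those published numbers; the function name implied_baseline is hypothetical, and because the speedup factors are rounded the results are approximate.

```python
# Measured throughput (imgs/sec) and stated speedup over the July 2017 baseline,
# both taken from the configuration text above.
measurements = {
    "FP32, Jan 2018": (652.68, 2.8),
    "INT8, Jul 2018": (1233.39, 5.4),
}


def implied_baseline(measured_imgs_per_sec, speedup):
    """Baseline throughput implied by a measured value and its speedup factor."""
    return measured_imgs_per_sec / speedup


for label, (measured, speedup) in measurements.items():
    print(f"{label}: ~{implied_baseline(measured, speedup):.0f} imgs/sec baseline")
```

The two estimates (about 233 and 228 imgs/sec) agree to within a few percent, which is what one would expect if both speedups refer to the same rounded baseline.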
Intel® Xeon® Scalable Processors for AI
Deep Learning INFERENCE & Deep Learning TRAINING
Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
512-bit wide vectors, 32 operand registers, 8 64b mask registers, Embedded broadcast & rounding
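To make the 512-bit vector width concrete, the number of elements processed per vector instruction follows directly from the element size. The short sketch below is arithmetic only; the helper name lanes is hypothetical.

```python
VECTOR_BITS = 512  # Intel AVX-512 register width, from the text above


def lanes(element_bits, vector_bits=VECTOR_BITS):
    """Number of elements of the given width that fit in one vector register."""
    return vector_bits // element_bits


for name, bits in [("FP32", 32), ("INT16", 16), ("INT8", 8)]:
    print(f"{name}: {lanes(bits)} elements per 512-bit vector")
```

With 8-bit elements a vector holds 64 lanes, which is why a 64-bit mask register is wide enough to carry one mask bit per lane.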
[Figure: LINPACK Performance by instruction set]
Instruction set   GFLOPs   System Power (W)   Core Frequency (GHz)   GFLOPs/Watt (normalized to SSE4.2)   GFLOPs/GHz (normalized to SSE4.2)
SSE4.2            669      760                3.1                    1.00                                 1.00
AVX               1178     768                2.8                    1.74                                 1.95
AVX2              2034     791                2.5                    2.92                                 3.77
AVX512            3259     767                2.1                    4.83                                 7.19
Intel® AVX-512 delivers significant performance and efficiency gains
Intel internal measurements. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configuration Summary: 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) (S2600WF) with 384 GB (12x32GB DDR4-2666) Total Memory, Intel S3610 800GB SSD, BIOS: SE5C620.86B.01.00.0471.040720170924, 04/07/2017, RHEL Kernel: 3.10.0-514.16.1.el7.x86_64 x86_64, Benchmark: Intel® Optimized MP LINPACK
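The normalized efficiency figures in the LINPACK chart can be re-derived from the raw GFLOPs, power, and frequency values. The sketch below simply repeats that arithmetic; the pairing of each power and frequency value with an instruction set is read from the chart labels, and the helper name normalized is hypothetical.

```python
# (GFLOPs, system power in W, core frequency in GHz) per instruction set,
# as read from the LINPACK chart above.
linpack = {
    "SSE4.2": (669, 760, 3.1),
    "AVX": (1178, 768, 2.8),
    "AVX2": (2034, 791, 2.5),
    "AVX512": (3259, 767, 2.1),
}


def normalized(metric):
    """Evaluate a metric for every instruction set, relative to SSE4.2."""
    base = metric(*linpack["SSE4.2"])
    return {isa: round(metric(*values) / base, 2) for isa, values in linpack.items()}


per_watt = normalized(lambda gflops, watts, ghz: gflops / watts)
per_ghz = normalized(lambda gflops, watts, ghz: gflops / ghz)
print(per_watt)
print(per_ghz)
```

The computed ratios match the chart: system power stays nearly flat across instruction sets, so the GFLOPs/Watt gain tracks the raw GFLOPs gain, while the falling core frequency makes the per-GHz gain larger still.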
Int8 for Inference on Intel® Xeon® Scalable Processors
Lower precision integer ops: increase operations/cycle to improve throughput
FP32 (64 Ops/Cycle): typical Intel® AVX-512 instruction to perform FP32 convolutions: vfmadd231ps
INPUT FP32, INPUT FP32 -> vfmadd231ps -> OUTPUT FP32
INT8 (85 Ops/Cycle): typical Intel® AVX-512 instructions to perform INT8 convolutions: vpmaddubsw, vpmaddwd, vpaddd
INPUT INT8, INPUT INT8 -> vpmaddubsw -> OUTPUT INT16
OUTPUT INT16, CONSTANT INT16 -> vpmaddwd -> OUTPUT INT32
OUTPUT INT32, CONSTANT INT32 -> vpaddd -> OUTPUT INT32
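The three-instruction INT8 sequence can be illustrated with a scalar Python emulation of a few lanes. This is a sketch under stated assumptions, not Intel's implementation: the INT16 constant is assumed to be a vector of ones, the INT32 operand of vpaddd is treated as a running accumulator, and the input values are made up. The last lines also reproduce the 64 and 85 Ops/Cycle figures, assuming two 512-bit issue ports per core and counting each multiply-add as two operations.

```python
def saturate_int16(value):
    """Clamp to the signed 16-bit range, as vpmaddubsw does on overflow."""
    return max(-32768, min(32767, value))


def vpmaddubsw(unsigned_bytes, signed_bytes):
    """Multiply u8 by s8, then add adjacent products into saturated INT16."""
    products = [u * s for u, s in zip(unsigned_bytes, signed_bytes)]
    return [saturate_int16(products[i] + products[i + 1])
            for i in range(0, len(products), 2)]


def vpmaddwd(words_a, words_b):
    """Multiply s16 by s16, then add adjacent products into INT32."""
    products = [a * b for a, b in zip(words_a, words_b)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]


def vpaddd(dwords_a, dwords_b):
    """Lane-wise INT32 addition (32-bit wraparound is not modeled here)."""
    return [a + b for a, b in zip(dwords_a, dwords_b)]


activations = [1, 2, 3, 4, 5, 6, 7, 8]    # unsigned INT8 inputs (made up)
weights = [1, -1, 2, -2, 3, -3, 4, -4]    # signed INT8 inputs (made up)
accumulator = [100, 200]                  # running INT32 sums (assumed values)

int16_pairs = vpmaddubsw(activations, weights)
ones = [1] * len(int16_pairs)             # assumed value of CONSTANT INT16
accumulator = vpaddd(accumulator, vpmaddwd(int16_pairs, ones))
print(accumulator)  # [97, 193]

# Peak ops/cycle, assuming two 512-bit issue ports and 2 ops per multiply-add.
PORTS = 2
fp32_ops_per_cycle = (512 // 32) * 2 * PORTS        # one instruction per multiply-add
int8_ops_per_cycle = (512 // 8) * 2 * PORTS / 3     # three instructions per multiply-add
print(fp32_ops_per_cycle, round(int8_ops_per_cycle))  # 64 85
```

Each INT32 lane ends up holding the dot product of four INT8 pairs added to its previous value, which is the accumulation pattern a convolution needs.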
Continued Innovation Driving Deep Learning Inference Performance On Intel® Xeon® Scalable Processors
Intel® Optimization for Caffe ResNet-50¹ Inference Throughput
[Figure: Relative Inference Throughput (images/sec), higher is better]
1.0X, FP32: Intel® Optimized Caffe at launch, July 11th 2017, on Intel® Xeon® Platinum 8180 Processor
2.8X², FP32: with new library and framework optimizations, Jan 19th 2018
5.4X², INT8: enabling lower precision & system optimizations for higher throughput, August 1st 2018
11X, INT8 with Intel® Deep Learning Boost: introducing new INT8 VNNI instruction, on 2nd Generation Intel® Xeon® Scalable Processor
¹ Intel® Optimization for Caffe ResNet-50 performance does not necessarily represent other Framework performance.
² Based on Intel internal testing: 1X (7/11/2017), 2.8X (1/19/2018) and 5.4X (7/26/2018) performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processor. See Configuration Details 53
³ 11X (7/25/2018): Results have been estimated using internal Intel analysis, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
⁴ Inference projections assume 100% socket to socket scaling
Training Performance: ResNet-50 / ChestXRay14
Intel® 2S Xeon® Gold 6148F processor based Dell EMC* PowerEdge C6420 Zenith* Cluster on OPA Fabric, TensorFlow* 1.6 + horovod*, IMPI
[Figure: Relative Training Throughput (images/sec), higher is better]
Node=1, Workers=4, Global Batch Size=64: 1x
Nodes=128, Workers=512, Global Batch Size=8192: 104x faster using 128 Intel® Xeon® nodes!
Nodes=200, Workers=800, Global Batch Size=8000: 120x faster using 200 Intel® Xeon® nodes!
Nodes=256, Workers=1024, Global Batch Size=8192: 147x faster using 256 Intel® Xeon® nodes!
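The scaling results above can be summarized as a scaling efficiency (speedup divided by node count) together with the per-worker batch size each run implies. The sketch below is arithmetic on the chart values only; the function name scaling_efficiency is hypothetical, and treating speedup / nodes as the efficiency measure is an assumption made for illustration.

```python
# (nodes, workers, global batch size, relative throughput) from the chart above.
runs = [
    (1, 4, 64, 1),
    (128, 512, 8192, 104),
    (200, 800, 8000, 120),
    (256, 1024, 8192, 147),
]


def scaling_efficiency(nodes, speedup):
    """Fraction of ideal linear scaling achieved (1.0 means perfectly linear)."""
    return speedup / nodes


for nodes, workers, global_batch, speedup in runs:
    per_worker = global_batch // workers
    print(f"{nodes:>3} nodes: {per_worker:>2} images/worker, "
          f"efficiency {scaling_efficiency(nodes, speedup):.0%}")
```

Note that the per-worker batch size drops from 16 to 10 and then 8 in the two largest runs, so the efficiency figures compare runs with different amounts of work per worker.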
Performance results are based on testing as of May 17, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.
AI Performance Growth on Intel® Xeon® Processors
Orders of magnitude improvement in deep learning performance
[Figure: deep learning performance relative to Baseline]
50x vs. Baseline: July 2017 Skylake launch, 2S Intel® Xeon® Scalable Processor (Skylake); improved use of parallelization and vectorization
285x vs. Baseline: February 2019; improvement with library and framework optimizations, enabling lower precision & system optimizations
Software Optimizations and Hardware features driving Deep Learning Performance on Intel® Xeon® Scalable Processors
15.7x inference throughput improvement with Intel® Optimizations for Caffe ResNet-50 on Intel® Xeon® Platinum 8180 Processor in Feb 2019 compared to performance at launch in July 2017. See configuration details on Config 1
Optimized Deep Learning Frameworks and Toolkits
Gen on Gen Performance gains for ResNet-50 with Intel® DL Boost
2S Intel® Xeon® Platin