
Fast Python* Analytics and Deep Learning Frameworks on CPU

Intel® Accelerations for AI

Nathan Greeneltch, PhD

Consulting Engineer, Intel Corporation

Intel® AI Framework Accelerations

Intel® Python* Accelerations

Intel® DAAL for Python* Analytics

Get Deep Learning Framework Performance on Intel® Architecture

Intel® Optimized AI Frameworks


Copyright © Intel Corporation 2019

*Other names and brands may be claimed as the property of others.

Define The Problem

Artificial Intelligence Will Transform…

Consumer: Smart Assistants, Chatbots, Search, Personalization, Augmented Reality, Robots

Health: Enhanced Diagnostics, Drug Discovery, Patient Care, Research, Sensory Aids

Finance: Algorithmic Trading, Fraud Detection, Research, Personal Finance, Risk Mitigation

Retail: Support, Experience, Marketing, Merchandising, Loyalty, Supply Chain, Security

Government: Defense, Data Insights, Safety & Security, Resident Engagement, Smarter Cities

Energy: Oil & Gas Exploration, Smart Grid, Operational Improvement, Conservation

Transport: In-Vehicle Experience, Automated Driving, Aerospace, Shipping, Search & Rescue

Industrial: Factory Automation, Predictive Maintenance, Precision Agriculture, Field Automation

Other: Advertising, Education, Gaming, Professional & IT Services, Telco/Media, Sports

Source: Intel forecast

The world's data will grow 10X in 10 years

Source: Seagate "Data Age 2025", July 2017

Forbes Magazine: 20 Mind-Boggling Facts Every Business Leader Must Reflect On Now (Nov 1, 2015); InsideBigData: Exponential Growth of Data (Feb 16, 2017)

Yet less than 1% of all data is ever analyzed and used

Intel AI Strategy: "Artificial Intelligence Everywhere"

[Figure labels: CPU AI (Intel® DL Boost); Inference; Client, IoT, Edge (VPU, EyeQ, iGPU, FPGA); Accelerators (GPU, FPGA); Datacenter; Discrete AI; Training (Intel® NNP-L, GPU); Common software stack; Ecosystem]

AI (ML & DL) Software Stack for Intel® Processors

Deep learning and AI ecosystem includes edge and datacenter applications.

  Open source frameworks (TensorFlow*, MXNet*, PyTorch*, PaddlePaddle*)

  Intel deep learning products (BigDL, OpenVINO™ toolkit)

  In-house user applications

Intel® MKL and Intel® MKL-DNN optimize deep learning and machine learning applications for Intel® processors:

  Through the collaboration with framework maintainers to upstream changes (TensorFlow*, MXNet*, PyTorch*, PaddlePaddle*)

  Through Intel-optimized forks (Caffe*)

  By partnering to enable proprietary solutions

[Diagram labels: Intel MKL; Intel MKL-DNN; Intel Processors]

Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) is an open source performance library for deep learning applications (available at /intel/mkl-dnn)

  Fast open source implementations for wide range of DNN functions

  Early access to new and experimental functionality

  Open for community contributions

Intel® Math Kernel Library (Intel® MKL) is a proprietary performance library for wide range of math and science applications

  Distribution: Intel Registration Center, package repositories (apt, yum, conda, pip), Intel® Parallel Studio XE, Intel® System Studio
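Since the libraries above can arrive through package repositories such as pip or conda, it can help to check what is actually installed in the current Python environment. The sketch below is only an illustration: the package list is a hypothetical watch-list (not taken from the slides), and the check only sees Python distribution metadata, so a library installed purely as a system package may not show up.

```python
from importlib import metadata

# Hypothetical watch-list of distribution names; adjust to your environment.
PACKAGES = ("mkl", "numpy", "scikit-learn", "daal4py")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

A missing entry here does not prove a framework is unoptimized; it only says the named distribution is not visible to this interpreter.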


Popular DL Frameworks are now optimized for CPU!

See installation guides at /framework-optimizations/

SEE ALSO: Machine Learning Libraries for Python (Scikit-learn, Pandas, NumPy), R (Cart, randomForest, e1071), Distributed (MLlib on Spark, Mahout)

*Limited availability today

Define The Problem

Artificial Intelligence is the ability of machines to learn from experience without explicit programming, in order to perform cognitive functions associated with the human mind.

Artificial Intelligence

Machine learning: Algorithms whose performance improves as they are exposed to more data over time

Deep learning: Subset of machine learning in which multi-layered neural networks learn from vast amounts of data

Intel® DAAL Focus

Machine Learning Technology Breakdown

Machine Learning

Autonomous computation methods that learn from experience (data)

Deep Learning

Hierarchical approach with many hidden layers, gaining fame from accurately classifying data like images, speech, and natural language. Features are learned.

Typical customers: CSP, HPC

Other (or classic) ML

Traditional ML techniques for clustering, regression, and classification using very few (one or two) hidden layers. Requires feature engineering.

Typical customers: Enterprise, HPC

Training

Train an algorithm to build a model

Time-to-model is critical

Inference

Deploy models for classification, prediction, recognition (figure example: an image labeled "dog")

Easily distributed

Criteria: Throughput, TCO @ scale
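Throughput, the inference criterion named above, is usually reported as items processed per second. The following is a minimal sketch of how such a number might be measured in Python; `measure_throughput` and `fake_inference_batch` are hypothetical names introduced here, and the placeholder workload merely stands in for a real model's forward pass.

```python
import time


def measure_throughput(run_batch, batch_size, iterations=20, warmup=3):
    """Return items processed per second for a batch-processing callable."""
    for _ in range(warmup):
        run_batch()  # untimed warm-up iterations
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("workload too short to time reliably")
    return batch_size * iterations / elapsed


def fake_inference_batch():
    # Placeholder workload standing in for one forward pass over a batch.
    return sum(i * i for i in range(50_000))


if __name__ == "__main__":
    rate = measure_throughput(fake_inference_batch, batch_size=64)
    print(f"{rate:.1f} images/sec")
```

Warm-up iterations are excluded from timing so one-time setup costs do not distort the steady-state rate.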


Deep Learning Breakthroughs

Machines able to meet or exceed human image & speech recognition

[Figure: two error-rate charts, each with a "Human" reference line and axis ticks at 30%, 23%, 15%, 8% and 0%. Image: 2010 to Present, example output 97% "person". Speech: 2000 to Present, example output 99% "play song".]

e.g. Tumor Detection, Document Sorting, Oil & Gas search, Voice Assistant, Defect detection, Genome sequencing

Source: ILSVRC ImageNet winning entry classification error rate each year 2010-2016 (Left), https:// /en-us/research/blog/microsoft-researchers-achieve-new-conversational-speech-recognition-milestone/ (Right)

Depth of Networks

ImageNet Large Scale Visual Recognition Competition (ILSVRC)

CNN

/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf

Intel® Xeon® Processor Scalable Family

Now build the AI you want on the CPU you know: your FOUNDATION for AI

Get maximum utilization: running datacenter and AI workloads side-by-side

Break memory barriers: to apply AI to large data sets and models

Train models at scale: through efficient scaling to many nodes

Access optimized tools: including continuous performance gains for TensorFlow*, MXNet*, more

Run in the cloud: including AWS, Microsoft, Alibaba, Tencent, Google, Baidu, more

Continued Innovation Driving Deep Learning Inference Performance on Intel® Xeon® Scalable Processors

Intel® Optimization for Caffe ResNet-50 [1] Inference Throughput

Relative Inference Throughput (images/sec) (higher is better):

1.0 (FP32): Intel® Optimized Caffe at launch, July 11th 2017. Intel® Xeon® Platinum 8180 Processor (codenamed Skylake)

2.8X [2] (FP32): with new library and framework optimizations, Jan 19th 2018. Intel® Xeon® Platinum 8180 Processor

5.4X [2] (INT8): enabling lower precision & system optimizations for higher throughput, August 2018. Intel® Xeon® Platinum 8180 Processor

11X [3] (projected performance [4]): Intel® Deep Learning Boost, introducing new INT8 VNNI instruction. Projected future Intel® Xeon® Scalable Processor (codename Cascade Lake)

1 Intel® Optimization for Caffe ResNet-50 performance does not necessarily represent other framework performance.

2 Based on Intel internal testing: 1X (7/11/2017), 2.8X (1/19/2018) and 5.4X (7/26/2018) performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processor.

3 11X (7/25/2018): Results have been estimated using internal Intel analysis, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

4 Inference projections assume 100% socket-to-socket scaling.

Performance results are based on testing as of 7/11/2017 (1x), 1/19/2018 (2.8x) & 7/26/2018 (5.4x) and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.

Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: /performance

Configurations for Performance Growth - Inference throughput

1x inference throughput improvement in July 2017:

Tested by Intel as of July 11th 2017. Platform: 2S Intel® Xeon® Platinum 8180 CPU @ 2.50GHz (28 cores), HT disabled, turbo disabled, scaling governor set to "performance" via intel_pstate driver, 384GB DDR4-2666 ECC RAM. CentOS Linux release 7.3.1611 (Core), Linux kernel 3.10.0-514.10.2.el7.x86_64. SSD: Intel® SSD DC S3700 Series (800GB, 2.5in SATA 6Gb/s, 25nm, MLC). Performance measured with: Environment variables: KMP_AFFINITY='granularity=fine,compact', OMP_NUM_THREADS=56, CPU Freq set with cpupower frequency-set -d 2.5G -u 3.8G -g performance. Caffe: (/intel/caffe/), revision f96b759f71b2281835f690af267158b82b150b5c. Inference measured with "caffe time --forward_only" command, training measured with "caffe time" command. For "ConvNet" topologies, dummy dataset was used. For other topologies, data was stored on local storage and cached in memory before training. Topology specs from /intel/caffe/tree/master/models/intel_optimized_models (ResNet-50), and /soumith/convnet-benchmarks/tree/master/caffe/imagenet_winners (ConvNet benchmarks; files were updated to use newer Caffe prototxt format but are functionally equivalent). Intel C++ compiler ver. 17.0.2 20170213, Intel MKL small libraries version 2018.0.20170425. Caffe run with "numactl -l".
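The environment settings in the July 2017 configuration can be assembled programmatically before launching a benchmark. The sketch below only builds the environment and command line and does not run anything; `build_benchmark_env` is a hypothetical helper, the thread count is assumed to be sockets times cores per socket (2 x 28 = 56, matching OMP_NUM_THREADS=56 above), and the model arguments that `caffe time` needs are deliberately left out because the configuration text does not give them.

```python
import os


def build_benchmark_env(sockets=2, cores_per_socket=28, base_env=None):
    """Return a copy of the environment with the OpenMP settings from the slide."""
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_AFFINITY"] = "granularity=fine,compact"
    # Assumption: one OpenMP thread per physical core (HT disabled in the 1x run).
    env["OMP_NUM_THREADS"] = str(sockets * cores_per_socket)
    return env


# Command prefix quoted in the configuration; model arguments are omitted here.
BENCHMARK_COMMAND = ["numactl", "-l", "caffe", "time", "--forward_only"]

if __name__ == "__main__":
    env = build_benchmark_env()
    print("OMP_NUM_THREADS =", env["OMP_NUM_THREADS"])
    print("KMP_AFFINITY    =", env["KMP_AFFINITY"])
    print("command prefix  =", " ".join(BENCHMARK_COMMAND))
```

Passing the returned dictionary as the `env` argument of `subprocess.run` would apply these settings to the child process only, leaving the parent shell untouched.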

2.8x inference throughput improvement in January 2018:

Tested by Intel as of Jan 19th 2018. Processor: 2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores, HT ON, Turbo ON. Total Memory 376.46GB (12 slots / 32 GB / 2666 MHz). CentOS Linux-7.3.1611-Core. SSD: sda RS3WC080 HDD 744.1GB, sdb RS3WC080 HDD 1.5TB, sdc RS3WC080 HDD 5.5TB. Deep Learning Framework: Intel® Optimization for Caffe version f6d01efbe93f70726ea3796a4b89c612365a6341. Topology: resnet_50_v1. BIOS: SE5C620.86B.00.01.0009.101920170742. MKLDNN version: ae00102be506ed0fe2099c6557df2aa88ad57ec1. NoDataLayer. Datatype: FP32. Batch size = 64. Measured: 652.68 imgs/sec vs. the July 11th 2017 configuration listed under the 1x entry.

5.4x inference throughput improvement in August 2018:

Tested by Intel as of July 26th 2018. 2 socket Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz / 28 cores, HT ON, Turbo ON. Total Memory 376.46GB (12 slots / 32 GB / 2666 MHz). CentOS Linux-7.3.1611-Core, kernel: 3.10.0-862.3.3.el7.x86_64. SSD: sda RS3WC080 HDD 744.1GB, sdb RS3WC080 HDD 1.5TB, sdc RS3WC080 HDD 5.5TB. Deep Learning Framework: Intel® Optimization for Caffe version a3d5b022fe026e9092fc7abc7654b1162ab9940d. Topology: resnet_50_v1. BIOS: SE5C620.86B.00.01.0013.030920180427. MKLDNN version: 464c268e544bae26f9b85a2acb9122c766a4c396. Instances: 2 instances, socket: 2 (results on Intel® Xeon® Scalable Processor were measured running multiple instances of the framework; methodology described here: /en-us/articles/boosting-deep-learning-training-inference-performance-on-xeon-and-xeon-phi). NoDataLayer. Datatype: INT8. Batch size = 64. Measured: 1233.39 imgs/sec vs. the July 11th 2017 configuration listed under the 1x entry (ResNet-50 topology).

11X inference throughput improvement with Cascade Lake:

Future Intel Xeon Scalable processor (codename Cascade Lake) results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. Compared vs. the July 11th 2017 configuration listed under the 1x entry (ResNet-50 topology).
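The measured throughputs and the quoted multipliers can be cross-checked with simple arithmetic. The sketch below divides each measured value (652.68 and 1233.39 imgs/sec, from the configurations above) by its multiplier to get the baseline it implies. The multipliers are rounded on the slides, so the two implied baselines are not expected to agree exactly; `implied_baseline` is a hypothetical helper name.

```python
# (measured images/sec, quoted multiplier vs. the July 2017 baseline)
MEASURED = {
    "Jan 2018, FP32": (652.68, 2.8),
    "Jul 2018, INT8": (1233.39, 5.4),
}


def implied_baseline(measured_imgs_per_sec, multiplier):
    """Baseline throughput implied by a measurement and its relative speedup."""
    return measured_imgs_per_sec / multiplier


if __name__ == "__main__":
    for label, (measured, multiplier) in MEASURED.items():
        base = implied_baseline(measured, multiplier)
        print(f"{label}: {measured} / {multiplier} = {base:.1f} imgs/sec baseline")
```

Both results land in the same range of roughly 230 imgs/sec, which is consistent with the two multipliers referring to one common baseline.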


Intel® Xeon® Scalable Processors for AI

Deep Learning INFERENCE & Deep Learning TRAINING

Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

512-bit wide vectors, 32 operand registers, 8 64-bit mask registers, embedded broadcast & rounding

LINPACK Performance

Instruction set   GFLOPs   Power (W)   Frequency (GHz)   GFLOPs/Watt*   GFLOPs/GHz*
SSE4.2            669      760         3.1               1.00           1.00
AVX               1178     768         2.8               1.74           1.95
AVX2              2034     791         2.5               2.92           3.77
AVX512            3259     767         2.1               4.83           7.19

*Normalized to SSE4.2

Intel® AVX-512 delivers significant performance and efficiency gains
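The normalized efficiency figures on this slide follow directly from the raw GFLOPs, power, and frequency values. As a worked check, the sketch below recomputes GFLOPs/Watt and GFLOPs/GHz relative to SSE4.2; the function and variable names are introduced here for illustration.

```python
# instruction set -> (GFLOPs, system power in W, core frequency in GHz), from the slide
LINPACK = {
    "SSE4.2": (669, 760, 3.1),
    "AVX": (1178, 768, 2.8),
    "AVX2": (2034, 791, 2.5),
    "AVX512": (3259, 767, 2.1),
}


def normalized_efficiency(data=LINPACK, reference="SSE4.2"):
    """Return {isa: (GFLOPs/Watt, GFLOPs/GHz)}, each normalized to the reference ISA."""
    ref_gflops, ref_watts, ref_ghz = data[reference]
    ref_per_watt = ref_gflops / ref_watts
    ref_per_ghz = ref_gflops / ref_ghz
    result = {}
    for isa, (gflops, watts, ghz) in data.items():
        per_watt = (gflops / watts) / ref_per_watt
        per_ghz = (gflops / ghz) / ref_per_ghz
        result[isa] = (round(per_watt, 2), round(per_ghz, 2))
    return result


if __name__ == "__main__":
    for isa, (per_watt, per_ghz) in normalized_efficiency().items():
        print(f"{isa:7s} GFLOPs/Watt={per_watt:.2f}  GFLOPs/GHz={per_ghz:.2f}")
```

The recomputed ratios match the chart values, which also confirms how the raw numbers pair up with each instruction set.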

Intel internal measurements. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Configuration Summary: 1-Node, 2 x Intel® Xeon® Platinum 8180 Processor on Purley-EP (Lewisburg) (S2600WF) with 384 GB (12x32GB DDR4-2666) Total Memory, Intel S3610 800GB SSD, BIOS: SE5C620.86B.01.00.0471.040720170924, 04/07/2017, RHEL Kernel: 3.10.0-514.16.1.el7.x86_64 x86_64, Benchmark: Intel® Optimized MP LINPACK

Lower precision integer ops

Int8 for Inference on Intel® Xeon® Scalable Processors

FP32: INPUT FP32, INPUT FP32 -> vfmadd231ps -> OUTPUT FP32

Typical Intel® AVX-512 instruction to perform FP32 convolutions: vfmadd231ps

64 Ops/Cycle

INT8: Increase operations/cycle to improve throughput

INPUT INT8, INPUT INT8 -> vpmaddubsw

85 Ops/Cycle
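One way to reproduce the 64 and 85 Ops/Cycle figures is simple lane arithmetic. The sketch below is a back-of-the-envelope model, not Intel's published derivation: it assumes two 512-bit multiply-accumulate units per core, counts a multiply-accumulate as two operations, and divides by the number of instructions needed per accumulate (one for FP32 `vfmadd231ps`, three for the INT8 sequence `vpmaddubsw`, `vpmaddwd`, `vpaddd` named on this slide).

```python
VECTOR_BITS = 512      # Intel AVX-512 register width (from the slides)
UNITS_PER_CORE = 2     # assumption: two 512-bit multiply-accumulate units per core
OPS_PER_MAC = 2        # one multiply plus one add


def ops_per_cycle(element_bits, instructions_per_mac):
    """Peak operations per cycle per core under the assumptions above."""
    lanes = VECTOR_BITS // element_bits
    return lanes * OPS_PER_MAC * UNITS_PER_CORE // instructions_per_mac


FP32_OPS = ops_per_cycle(element_bits=32, instructions_per_mac=1)  # vfmadd231ps
INT8_OPS = ops_per_cycle(element_bits=8, instructions_per_mac=3)   # 3-instruction sequence
# Assumption: a single fused INT8 instruction would remove the division by three.
INT8_FUSED_OPS = ops_per_cycle(element_bits=8, instructions_per_mac=1)

if __name__ == "__main__":
    print("FP32:", FP32_OPS)
    print("INT8 (three instructions):", INT8_OPS)
    print("INT8 (one fused instruction, assumed):", INT8_FUSED_OPS)
```

Under this model the INT8 gain over FP32 is modest with three instructions, which is the motivation the slides give for a dedicated INT8 instruction.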

Typical Intel® AVX-512 instructions to perform INT8 convolutions: vpmaddubsw, vpmaddwd, vpaddd

INPUT INT8, INPUT INT8 -> vpmaddubsw -> OUTPUT INT16

OUTPUT INT16, CONSTANT INT16 -> vpmaddwd -> OUTPUT INT32

OUTPUT INT32, CONSTANT INT32 -> vpaddd -> OUTPUT INT32
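The three-step INT8 pipeline can be illustrated with a scalar emulation in plain Python. This is a sketch of the instruction semantics on short lists, not vectorized code: the function names mirror the instruction mnemonics for readability, the input values are made up, and the 16-bit constant is assumed to be a vector of ones so that `vpmaddwd` simply sums adjacent pairs.

```python
def _saturate_i16(value):
    """Clamp to the signed 16-bit range, as vpmaddubsw does on overflow."""
    return max(-32768, min(32767, value))


def _wrap_i32(value):
    """Wrap to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def vpmaddubsw(a_u8, b_s8):
    """Multiply unsigned 8-bit by signed 8-bit, add adjacent pairs into saturated INT16."""
    return [_saturate_i16(a_u8[i] * b_s8[i] + a_u8[i + 1] * b_s8[i + 1])
            for i in range(0, len(a_u8), 2)]


def vpmaddwd(a_s16, b_s16):
    """Multiply signed 16-bit values, add adjacent pairs into INT32."""
    return [_wrap_i32(a_s16[i] * b_s16[i] + a_s16[i + 1] * b_s16[i + 1])
            for i in range(0, len(a_s16), 2)]


def vpaddd(a_s32, b_s32):
    """Element-wise 32-bit add with wraparound (used here to accumulate)."""
    return [_wrap_i32(x + y) for x, y in zip(a_s32, b_s32)]


def int8_dot_accumulate(activations_u8, weights_s8, accumulator_s32):
    """Chain the three steps: INT8 inputs -> INT16 -> INT32 -> accumulated INT32."""
    pairs_i16 = vpmaddubsw(activations_u8, weights_s8)
    ones = [1] * len(pairs_i16)              # assumed CONSTANT INT16 operand
    quads_i32 = vpmaddwd(pairs_i16, ones)
    return vpaddd(quads_i32, accumulator_s32)


if __name__ == "__main__":
    acts = [1, 2, 3, 4, 5, 6, 7, 8]          # unsigned 8-bit activations (made up)
    wts = [1, -1, 2, -2, 3, -3, 4, -4]       # signed 8-bit weights (made up)
    print(int8_dot_accumulate(acts, wts, [10, 20]))
```

Each 32-bit output lane ends up holding the dot product of four INT8 pairs plus the previous accumulator value. The intermediate INT16 step can saturate for large inputs, which is one reason quantized value ranges need care.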

Continued Innovation Driving Deep Learning Inference Performance on Intel® Xeon® Scalable Processors

Intel® Optimization for Caffe ResNet-50 [1] Inference Throughput

Relative Inference Throughput (images/sec) (higher is better):

1.0 (FP32): Intel® Optimized Caffe at launch, July 11th 2017. Intel® Xeon® Platinum 8180 Processor

2.8X [2] (FP32): with new library and framework optimizations, Jan 19th 2018. Intel® Xeon® Platinum 8180 Processor

5.4X [2] (INT8): enabling lower precision & system optimizations for higher throughput, August 2018. Intel® Xeon® Platinum 8180 Processor

11X [3]: Intel® Deep Learning Boost, introducing new INT8 VNNI instruction. 2nd Generation Intel® Xeon® Scalable Processor

1 Intel® Optimization for Caffe ResNet-50 performance does not necessarily represent other framework performance.

2 Based on Intel internal testing: 1X (7/11/2017), 2.8X (1/19/2018) and 5.4X (7/26/2018) performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processor. See Configuration Details 53

3 11X (7/25/2018): Results have been estimated using internal Intel analysis, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.

4 Inference projections assume 100% socket-to-socket scaling.


Training Performance: ResNet-50 / ChestXRay14

Intel® 2S Xeon® Gold 6148F processor based Dell EMC* PowerEdge C6420 Zenith* Cluster on OPA Fabric. TensorFlow* 1.6 + horovod*, IMPI

Relative Training Throughput (images/sec) (Higher is Better):

1x: Node=1, Workers=4, Global Batch Size=64

104x: Nodes=128, Workers=512, Global Batch Size=8192 (104x faster using 128 Intel® Xeon® nodes!)

120x: Nodes=200, Workers=800, Global Batch Size=8000 (120x faster using 200 Intel® Xeon® nodes!)

147x: Nodes=256, Workers=1024, Global Batch Size=8192 (147x faster using 256 Intel® Xeon® nodes!)
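The speedups on this slide can be turned into scaling efficiency (speedup divided by node count) and per-worker batch size with a few lines of arithmetic. The sketch below uses only the numbers shown above; `scaling_report` is a hypothetical helper name, and efficiency is defined here relative to the single-node run.

```python
# (nodes, workers, global batch size, relative training throughput), from the slide
RUNS = [
    (1, 4, 64, 1.0),
    (128, 512, 8192, 104.0),
    (200, 800, 8000, 120.0),
    (256, 1024, 8192, 147.0),
]


def scaling_report(runs=RUNS):
    """Compute scaling efficiency and per-worker batch size for each run."""
    report = []
    for nodes, workers, global_batch, speedup in runs:
        report.append({
            "nodes": nodes,
            "workers_per_node": workers // nodes,
            "batch_per_worker": global_batch / workers,
            "efficiency": speedup / nodes,   # 1.0 would be perfect linear scaling
        })
    return report


if __name__ == "__main__":
    for row in scaling_report():
        print(f"{row['nodes']:4d} nodes: efficiency {row['efficiency']:.1%}, "
              f"{row['batch_per_worker']:.0f} images per worker per step")
```

The per-worker batch shrinks from 16 to 10 and 8 in the larger runs, so the lower efficiency at 200 and 256 nodes reflects less work per worker per step as well as communication cost.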

Performance results are based on testing as of May 17, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.


AI Performance Growth on Intel® Xeon® Processors

Orders of magnitude improvement in deep learning performance

[Figure: performance relative to a baseline. 50x vs. baseline at the July 2017 Skylake launch, 2S Intel® Xeon® Scalable Processor (Skylake); 285x vs. baseline in February 2019. Chart annotations: "improved use of parallelization and vectorization"; "improvement with library and framework optimizations, enabling lower precision & system optimizations".]

Software Optimizations and Hardware features driving Deep Learning Performance on Intel® Xeon® Scalable Processors

15.7x inference throughput improvement with Intel® Optimizations for Caffe ResNet-50 on Intel® Xeon® Platinum 8180 Processor in Feb 2019 compared to performance at launch in July 2017. See configuration details on Config 1

Optimized Deep Learning Frameworks and Toolkits

Gen on Gen Performance gains for ResNet-50 with Intel® DL Boost

2S Intel® Xeon® Platin
