Huang Wenbo: Optimization Practices for Deep Learning on Mobile Devices


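Presenter: Huang Wenbo

Group profile

Meili United Group is a fashion consumption platform dedicated to serving women, founded in 2016. Its products and services include Mogujie (蘑菇街), Meilishuo (美麗說), uni, 銳鯊, and MOGU STATION. Together they cover every area of fashion consumption and meet the everyday fashion-information and shopping needs of women across age groups, spending levels, and aesthetic tastes.

Overall data
- Daily active users: 10,000+
- Transaction volume: ¥…,000+
- Fashion influencers: 120,000+
- Registered users: 200,000+
- Share of female users: 95%+
- Share of mobile users: 95%+

Main contents
- Background and current state
- Model compression and design
- Mobile practice
- Summary

01 Background and Current State

Deep learning: from the cloud to edge computing

Why does Mogujie optimize deep learning?
- Server side: reduce training and prediction time; save GPU resources and electricity.
- Mobile side: meet real-time response requirements; run locally to reduce server load; protect user privacy.

CNN basics

Challenge
- Models keep getting larger: more storage and computation consume more energy.
- Mobile devices: limited memory, limited compute performance, limited power budget.
- Deep learning: networks keep getting deeper and accuracy keeps rising.

02 Model Compression and Design

Model compression
- Pruning
- Quantization
- Huffman Encoding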

Pruning

Weight-level pruning for sparse connections
Han et al., “Learning both weights and connections for efficient neural networks”, NIPS 2015

Channel-level pruning and retraining iteratively
Li et al., “Pruning filters for efficient convnets”, ICLR 2017

Channel-level pruning with regularization
Liu et al., “Learning efficient convolutional networks through network slimming”, ICCV 2017

Quantization
Han et al., “Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding”

Huffman Encoding
Han et al., “Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding”

Summary of model compression
- Channel-level pruning and retraining iteratively
- Channel-level pruning with L1 regularization
- Pruning: fewer channels
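The two pruning granularities cited above can be illustrated with a minimal Python sketch. It is not the speaker's implementation: `prune_weights` and `select_channels` are hypothetical helper names, weights are plain nested lists, and the retraining step that the cited papers apply after pruning is left out.

```python
def prune_weights(weights, sparsity):
    """Weight-level pruning: zero the smallest-magnitude fraction of weights."""
    magnitudes = sorted(abs(w) for row in weights for w in row)
    k = int(len(magnitudes) * sparsity)  # number of weights to drop
    if k == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[k - 1]
    # Ties at the threshold are dropped too, so slightly more than k may go.
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


def select_channels(filters, keep):
    """Channel-level pruning: keep the `keep` filters with the largest L1 norm.

    `filters` holds one flat weight list per output channel. The result is
    the sorted list of channel indices that survive.
    """
    norms = [sum(abs(w) for w in f) for f in filters]
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:keep])


# Toy data, made up for illustration.
layer = [[0.1, -0.5], [0.05, 2.0]]
print(prune_weights(layer, 0.5))

filters = [[1.0, -1.0], [0.1, 0.1], [3.0, 0.0]]
print(select_channels(filters, 2))
```

Weight-level pruning leaves the tensor shape unchanged and only makes it sparse, while channel-level pruning removes whole filters, which is why the summary above describes its result as fewer channels.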

Smaller CNN architecture design
- SqueezeNet
- MobileNet
- ShuffleNet

SqueezeNet
Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size”, arXiv 2016

SqueezeNet block: Input (64) → 1x1 Conv Squeeze (16) → 1x1 Conv Expand (64) and 3x3 Conv Expand (64) → Output Concat/Eltwise (128)

MobileNets
Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications”, arXiv 2017

ShuffleNet
Zhang et al., “ShuffleNet: An extremely efficient convolutional neural network for mobile devices”, arXiv 2017
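To see why these architectures are smaller, it helps to count weights. The sketch below is an illustration rather than anything taken from the slides. It assumes the channel sizes read off the SqueezeNet block above (64 in, squeeze to 16, two expand branches of 64 each), ignores biases, and uses the depthwise-plus-pointwise factorization described in the MobileNets paper. All function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights in a squeeze layer followed by parallel 1x1 and 3x3 expand layers."""
    return (conv_params(c_in, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution plus a 1x1 pointwise convolution."""
    return c_in * k * k + conv_params(c_in, c_out, 1)


plain = conv_params(64, 128, 3)                     # ordinary 3x3 conv, 64 -> 128
squeezed = fire_params(64, 16, 64, 64)              # block sizes from the slide
separable = depthwise_separable_params(64, 128, 3)  # MobileNets-style block
print(plain, squeezed, separable)
```

Under these assumptions the plain 3x3 layer needs 73,728 weights, the squeeze/expand block 11,264, and the depthwise separable block 8,768.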

Overall performance

Pruning ResNet50 on ImageNet (our practice)

Model         Strategy                 Top-1    Top-5    Model size
Original      -                        75%      92.27%   98M
Pruned-50     Pruning                  72.5%    90.9%    49M
Pruned-Q-50   Pruning + Quantization   72.4%    90.6%    15M

Performance

Pruning ResNet-34 on our dataset (our practice; 2319 categories, 12 million samples)

Model       Top-1    Top-5    Inference time   Model size
Original    48.92%   82.2%    96ms             86M
Pruned-64   48.27%   81.5%    45ms             31M

ParseNet, 18 classes (base network: MobileNet; our practice)

Model      mIOU   Pixel-level accuracy   Model size
ParseNet   56%    93.5%                  13M

03 Mobile Engineering Practice

Division of labor between mobile and server
- Training
- Inference

DL frameworks
- Caffe, Caffe2, MXNet, Tensorflow, Torch, ….
- NCNN, MDL, CoreML, Tensorflow Lite

From training to inference
Convolution, BN, Relu → Convolution

Optimizing convolution computation
- Direct convolution
- im2col-based convolution (operands of size 25*9 and 9*1)
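The difference between the two approaches above can be sketched in a few lines of Python. The source does not state the input size. The 7x7 single-channel input used here is an assumption, chosen because a 3x3 kernel with stride 1 and no padding then yields exactly 25 patches of 9 values, matching the 25*9 and 9*1 operands on the slide. Function names are hypothetical.

```python
def conv2d_direct(image, kernel):
    """Direct convolution: slide the kernel over the image (stride 1, no padding)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(k) for dj in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]


def im2col(image, k):
    """Unroll every k x k patch into one row of a (patches x k*k) matrix."""
    h, w = len(image), len(image[0])
    return [[image[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1)
            for j in range(w - k + 1)]


def conv2d_im2col(image, kernel):
    """Same result as conv2d_direct, computed as a matrix-vector product."""
    k = len(kernel)
    flat_kernel = [v for row in kernel for v in row]  # the 9*1 operand
    cols = im2col(image, k)                           # the 25*9 operand
    flat_out = [sum(a * b for a, b in zip(row, flat_kernel)) for row in cols]
    out_w = len(image[0]) - k + 1
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]


image = [[r * 7 + c for c in range(7)] for r in range(7)]  # assumed 7x7 input
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
cols = im2col(image, 3)
print(len(cols), len(cols[0]))
print(conv2d_im2col(image, kernel) == conv2d_direct(image, kernel))
```

The im2col form duplicates overlapping pixels in memory in exchange for a single dense matrix product that optimized linear-algebra routines handle well.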

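Optimizing convolution computation
Cho et al., “MEC: Memory-efficient convolution for deep neural network”

Converting floating-point operations to fixed point
Input (float) → Quantize (Min, Max, 8 bit) → QuantizedRelu (Min, Max, 8 bit) → Dequantize (Min, Max) → Output (float)

How else can convolution computation evolve?
Even the most impressive optimization algorithm is less direct than a hardware implementation.
General-purpose convolution vs. specialized convolution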
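The float, 8-bit, float path of the fixed-point slide above can be sketched as follows. The slide does not specify the quantization scheme. This example assumes a linear min/max mapping onto unsigned 8-bit codes, which is one common choice. The function names and the range [-1.0, 1.55] are made up for illustration.

```python
def quantize(values, lo, hi):
    """Map floats in [lo, hi] linearly onto integer codes 0..255."""
    scale = (hi - lo) / 255.0
    return [min(255, max(0, round((v - lo) / scale))) for v in values]


def dequantize(codes, lo, hi):
    """Map 8-bit codes back to approximate float values."""
    scale = (hi - lo) / 255.0
    return [lo + c * scale for c in codes]


def quantized_relu(codes, lo, hi):
    """ReLU on 8-bit codes: clamp everything below the code that represents 0.0."""
    scale = (hi - lo) / 255.0
    zero_code = min(255, max(0, round((0.0 - lo) / scale)))
    return [max(c, zero_code) for c in codes]


lo, hi = -1.0, 1.55                      # assumed range, gives a step of about 0.01
activations = [-1.0, -0.5, 0.0, 0.5]
codes = quantize(activations, lo, hi)    # float -> 8 bit
codes = quantized_relu(codes, lo, hi)    # stays in 8 bit
print(dequantize(codes, lo, hi))         # 8 bit -> float
```

Only the min and max travel with the 8-bit tensor, so an operator such as ReLU can run on the integer codes and conversion back to float is needed only at the end of the quantized part of the network.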

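Deep learning frameworks on Android
- NCNN
- MDL
- Tensorflow Lite

MobileNet on Huawei P9

Framework   Single thread   Four threads   Memory
NCNN        370ms           200ms          25M
MDL         360ms           190ms          30M

Quantize MobileNet   Float Mobilenet
85ms                 400ms

DL on iOS

CoreML
- Limited extensibility; not suited to deploying new algorithms
- Requires iOS 11+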

MPSCNN
- Makes full use of GPU resources without competing for the CPU
- New layers are easy to develop with Metal
- Tips: compute in half precision; weights are stored in NHWC format

MPSCNN: MPSImage
The layout of a 9-channel CNN image with a width of … and a height of 2: Slice0, Slice1, Slice2

MPSCNN: Metal Performance Shader

    #include <metal_stdlib>
    using namespace metal;

    // Element-wise sum of two texture arrays, one thread per texel per slice.
    kernel void eltwiseSum_array(
        texture2d_array<half, access::sample> inTexture1 [[texture(0)]],
        texture2d_array<half, access::sample> inTexture2 [[texture(1)]],
        texture2d_array<half, access::write> outTexture [[texture(2)]],
        ushort3 gid [[thread_position_in_grid]])
    {
        // Skip threads that fall outside the output texture.
        if (gid.x >= outTexture.get_width() ||
            gid.y >= outTexture.get_height() ||
            gid.z >= outTexture.get_array_size()) return;

        constexpr sampler s(coord::pixel, filter::nearest, address::clamp_to_zero);
        const ushort2 pos = gid.xy;
        const ushort slice = gid.z;

        half4 in[2];
        in[0] = inTexture1.sample(s, float2(pos.x, pos.y), slice);
        in[1] = inTexture2.sample(s, float2(pos.x, pos.y), slice);

        // Accumulate in float, then write back in half precision.
        float4 out = float4(0.0f);
        out = float4(in[0] + in[1]);
        outTexture.write(half4(out), gid.xy, gid.z);
    }

MPSCNN vs. NCNN on iPhone
Device: iPhone 6s

Framework   Time
NCNN        110ms
MPSCNN      45ms

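How to create a new framework

Optimizing inference
- Network structure
- GPU acceleration
- Instruction-set acceleration
- Multithreading
- Memory layout optimization: NCHW -> NHWC
- Converting floating-point operations to fixed point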
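The memory layout item in the list above amounts to an index permutation over a flat buffer. A minimal sketch, assuming a dense row-major buffer and a hypothetical helper name (a real framework would do this with vectorized or in-place routines):

```python
def nchw_to_nhwc(data, n, c, h, w):
    """Reorder a flat NCHW buffer so that channels become the innermost axis."""
    out = [0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    dst = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    out[dst] = data[src]
    return out


# One image, two channels, one row, three columns.
nchw = [0, 1, 2,      # channel 0
        10, 11, 12]   # channel 1
print(nchw_to_nhwc(nchw, 1, 2, 1, 3))
```

In NHWC all channel values of a pixel are contiguous, which fits the four-channels-per-texel storage that the MPSCNN tips earlier in the deck refer to.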

MoguDLToolkit (Mogu Deep Learning Toolkit)
- For professionals
- Highly flexible
- Easy to use
- High cohesion & low coupling

Mogu Deep Learning Toolkit
- Create layer, Init layer, Inference
- Classification, Detection, Segmentation, ……

Mogu Toolkit example: MobileNet

    class MobileNet {
    public:
        Input input;
        Convolution fc7;
        int Init(const char* modelpath);
        int infer(Mat &input, Mat &output);
    private:
        Convolution …
