ResNets Ensemble via the Feynman-Kac Formalism to Improve Natural and Robust Accuracies

Bao Wang (Department of Mathematics, University of California, Los Angeles), Binjie Yuan (School of Aerospace, Tsinghua University), Zuoqiang Shi (Department of Mathematics, Tsinghua University), and Stanley J. Osher (Department of Mathematics, University of California, Los Angeles)

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Abstract

We unify the theory of optimal control of transport equations with the practice of training and testing of ResNets. Based on this unified viewpoint, we propose a simple yet effective ResNets ensemble algorithm to boost the accuracy of the robustly trained model on both clean and adversarial images. The proposed algorithm consists of two components: First, we modify the base ResNets by injecting a variance-specified Gaussian noise into the output of each residual mapping, which results in a special type of neural stochastic ordinary differential equation. Second, we average the outputs of multiple jointly trained modified ResNets to get the final prediction. These two steps give an approximation to the Feynman-Kac formula for representing the solution of a convection-diffusion equation. For the CIFAR10 benchmark, this simple algorithm leads to a robust model with a natural accuracy of 85.62% on clean images and a robust accuracy of 57.94% under 20 iterations of the IFGSM attack, which outperforms the current state-of-the-art in defending against the IFGSM attack on CIFAR10. The code is available online.

1 Introduction

Despite the extraordinary success of deep neural nets (DNNs) in image and speech recognition [23], their vulnerability to adversarial attacks raises concerns when applying them to security-critical tasks, e.g., autonomous cars [3,1], robotics [14], and DNN-based malware detection systems [31,13].

Since the seminal work of Szegedy et al. [38], recent research shows that DNNs are vulnerable to many kinds of adversarial attacks, including physical, poisoning, and inference attacks [9,7,30,12,17,5,4]. Empirical adversarial risk minimization (EARM) is one of the most successful frameworks for adversarial defense. Under the EARM framework, adversarial defense for ℓ∞-norm based inference attacks can be formulated as solving the following EARM [29,45]:

    \min_{f \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \max_{\|x_i' - x_i\|_\infty \le \epsilon} \mathcal{L}(f(x_i', w), y_i),    (1)

where f(·, w) is a function in the hypothesis class H, e.g., a ResNet [16], parameterized by w. Here, {(x_i, y_i)}_{i=1}^n are n i.i.d. data-label pairs drawn from some high-dimensional unknown distribution D, and L(f(x_i, w), y_i) is the loss associated with f on (x_i, y_i). For classification, L is typically selected to be the cross-entropy. Adversarial defense for other measure-based attacks can be formulated similarly.
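To make the min-max structure of Eq. (1) concrete, the following is a minimal PyTorch sketch of estimating the EARM objective on a data loader. The one-step sign-gradient inner maximizer, the default ε = 8/255, and all function names are illustrative assumptions, not the paper's implementation; the multi-step inner maximizer actually used for training appears in Section 2.3.

```python
import torch
import torch.nn.functional as nnF

def inner_max(model, x, y, eps=8 / 255):
    # One-step approximation of the inner max in Eq. (1): move x along the
    # sign of the loss gradient while staying inside the l_inf eps-ball.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nnF.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x + eps * grad.sign()).clamp(0.0, 1.0).detach()

def empirical_adversarial_risk(model, loader, eps=8 / 255):
    # The EARM objective of Eq. (1): average loss on worst-case perturbations.
    total, count = 0.0, 0
    for x, y in loader:
        x_adv = inner_max(model, x, y, eps)
        with torch.no_grad():
            total += nnF.cross_entropy(model(x_adv), y, reduction="sum").item()
        count += x.size(0)
    return total / count
```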

1.1 Our Contribution

In this work, we unify the training and testing of ResNets with the theory of transport equations (TEs). This unified viewpoint enables us to interpret the adversarial vulnerability of ResNets as the irregularity, which will be defined later, of the TE's solution. Based on this observation, we propose a new ResNets ensemble algorithm based on the Feynman-Kac formula. In a nutshell, the proposed algorithm consists of two essential components. First, for each l = 1, 2, ..., M, with M being the number of residual mappings in the ResNet, we modify the l-th residual mapping from x_{l+1} = x_l + F(x_l) (Fig. 1 (a)) to x_{l+1} = x_l + F(x_l) + N(0, σ²I) (Fig. 1 (b)), where x_l is the input, F is the residual mapping, and N(0, σ²I) is Gaussian noise with a specially designed variance σ². This step can be regarded as building a simple neural stochastic differential equation. Second, we average the outputs of multiple modified ResNets, trained jointly by solving the EARM in Eq. (1), to get the final prediction (Fig. 1 (c)). This ensemble algorithm improves the base model's accuracy on both clean and adversarial data.
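A minimal PyTorch sketch of the noise-injected residual mapping follows. The two-convolution body of F is a standard ResNet block layout assumed here for illustration, and the fixed noise scale sigma is a simplification; the adaptive variance actually used is specified in Section 2.2.

```python
import torch
import torch.nn as nn

class NoisyResBlock(nn.Module):
    # Residual mapping x_{l+1} = x_l + F(x_l) + N(0, sigma^2 I).
    def __init__(self, channels, sigma=0.1):
        super().__init__()
        self.body = nn.Sequential(  # F: an assumed two-conv residual body
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.sigma = sigma

    def forward(self, x):
        out = x + self.body(x)
        # Inject Gaussian noise into the output of the residual mapping.
        return out + self.sigma * torch.randn_like(out)
```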

The advantages of the proposed algorithm are summarized as follows:

- It outperforms the current state-of-the-art in defending against inference attacks.
- It improves the natural accuracy of adversarially trained models.
- Its defense capability can be improved dynamically as the base ResNet advances.
- It is motivated by partial differential equation (PDE) theory, which introduces a new way to defend against adversarial attacks, and it complements many other existing adversarial defenses.

Figure 1: (a) Original and (b) noise-injected residual mapping; (c) architecture of the EnResNet.

1.2 Related Work

There is a massive volume of research over the last several years on defending against adversarial attacks on DNNs. Randomized smoothing transforms an arbitrary classifier f into a smoothed surrogate classifier g and is certifiably robust against ℓ2-norm based adversarial attacks [25,24,10,43,6,27,34].

One of the ideas is to inject Gaussian noise into the input image and base the classification result on the probability of the noisy image lying in the decision region. Our adversarial defense algorithm injects noise into each residual mapping instead of only the input image. Robust optimization for solving EARM has achieved great success in defending against inference attacks [29,32,33,44,36,42], and regularization in EARM can further boost the robustness of adversarially trained models [45,21,35,47]. Adversarial defense algorithms should learn a classifier with high test accuracy on both clean and adversarial data. To achieve this goal, Zhang et al. [46] developed a new loss function, TRADES, that explicitly trades off between natural and robust generalization. To the best of our knowledge, TRADES is the current state-of-the-art in defending against inference attacks on CIFAR10; throughout this paper, we regard TRADES as the benchmark.

Modeling DNNs as ordinary differential equations (ODEs) has drawn much attention recently. Chen et al. proposed neural ODEs for deep learning [8]. E [11] modeled training ResNets as solving an ODE optimal control problem. Haber and Ruthotto [15] constructed stable DNN architectures based on the properties of ODEs. Lu, Zhu, et al. [28,48] constructed novel DNN architectures motivated by numerical discretization schemes for ODEs.

1.3 Organization

This paper is organized in the following way: In section 2, we establish the connection between the training/testing of ResNets and the theory of TEs. This connection gives us a way to decipher the adversarial vulnerability of ResNets, and we propose a simple ensemble algorithm based on the Feynman-Kac formula to enhance the guaranteed adversarial robustness of ResNets. In section 3, we test the efficiency of the proposed ResNets ensemble for adversarial defense on both CIFAR10 and CIFAR100. Section 4 contains some concluding remarks.

2 Algorithm and Theoretical Motivation

2.1 Transport Equation Modeling of ResNet

The connection between training ResNets and solving optimal control problems of the TE is investigated in [39,40,26,41]. In this section, we derive the TE model of ResNet and explain its adversarial vulnerability from a PDE perspective.

The TE model enables us to understand the flow of the entire training and testing data through the forward and backward propagation, whereas the ODE models focus on the dynamics of individual data points [8]. As shown in Fig. 1 (a), a residual mapping adds a skip connection between the input and output of the original mapping F; the l-th residual mapping can be written as x_{l+1} = F(x_l, w_l) + x_l, where x_0 = x̂ ∈ T ⊂ R^d is a data point in the set T, x_l and x_{l+1} are the input and output tensors of the residual mapping, and the parameters w_l can be learned by back-propagating the training error. For x̂ ∈ T with label y, the forward propagation of ResNet can be written as

    x_{l+1} = x_l + F(x_l, w_l), \quad l = 0, 1, \dots, L-1, \quad \text{with } x_0 = \hat{x}; \qquad \hat{y} = f(x_L),    (2)

27、with x0= x, y . = f(xL), (2) where yis the predicted label,Lis the number of layers, andf(x) = softmax(w0x)be the output activation with w0being the trainable parameters. Next, lettl= l/L, forl = 0,1, ,L, with intervalt = 1/L. Without considering dimensional consistency, we regard xlin Eq. (2) as th

28、e value of x(t) at tl, so Eq. (2) can be rewritten as ( x(tl+1) = x(tl) + t F(x(tl),w(tl),l = 0,1,.,L 1, with x(0) = x y . = f(x(1), (3) where F . = 1 tF. Eq. (3) is the forward Euler discretization of the following ODE dx(t) dt = F(x(t),w(t), x(0) = x.(4) Letu(x,t) be a function that is constant al
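This correspondence is easy to verify numerically: one forward Euler step of Eq. (4) with step size Δt = 1/L is exactly one residual mapping with residual Δt·F̃. A toy NumPy sketch, with an arbitrary smooth velocity field standing in for the trained F̃:

```python
import numpy as np

# Forward Euler for dx/dt = F(x, t): each update x <- x + dt * F(x, t_l)
# plays the role of one residual mapping x_{l+1} = x_l + F(x_l, w_l).
F = lambda x, t: np.tanh(x)       # toy stand-in for the velocity field
L = 10                            # number of residual mappings / Euler steps
dt = 1.0 / L
x = np.array([1.0, -0.5])         # x(0): the input data point
for l in range(L):
    x = x + dt * F(x, l * dt)     # one residual block = one Euler step
print(x)                          # x(1), which the output activation f maps to a label
```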

Let u(x, t) be a function that is constant along the trajectory defined by Eq. (4); then u(x, t) satisfies

    \frac{d}{dt} u(x(t), t) = \frac{\partial u}{\partial t}(x, t) + \tilde{F}(x, w(t)) \cdot \nabla u(x, t) = 0, \quad x \in \mathbb{R}^d.    (5)

If we enforce the terminal condition at t = 1 for Eq. (5) to be u(x, 1) = softmax(w_0 x) := f(x), then, since u(x, t) is constant along the curve defined by Eq. (4) (the characteristic curve of the TE defined in Eq. (5)), we have u(x̂, 0) = u(x(1), 1) = f(x(1)). Therefore, the forward propagation of ResNet for x̂ can be modeled as computing u(x̂, 0) along the characteristic curve of the following TE:

    \frac{\partial u}{\partial t}(x, t) + \tilde{F}(x, w(t)) \cdot \nabla u(x, t) = 0, \quad x \in \mathbb{R}^d,
    u(x, 1) = f(x).    (6)

Meanwhile, the backpropagation in training ResNets can be modeled as finding the velocity field \tilde{F}(x(t), w(t)) for the following control problem:

    \frac{\partial u}{\partial t}(x, t) + \tilde{F}(x, w(t)) \cdot \nabla u(x, t) = 0, \quad x \in \mathbb{R}^d,
    u(x, 1) = f(x), \quad x \in \mathbb{R}^d,
    u(x_i, 0) = y_i, \quad x_i \in T,    (7)

with T being the training set.

Note that in the above TE formulation of ResNet, u(x, 0) serves as the classifier and the velocity field F̃(x, w(t)) encodes the ResNet's architecture and weights. When F̃ is very complex, u(x, 0) might be highly irregular, i.e., a small change in the input x can lead to a massive change in the value of u(x, 0). Such an irregular function may have good generalizability, but it is not robust to adversarial attacks. Fig. 2 (a) shows a 2D illustration of u(x, 0) with the terminal condition u(x, 1) shown in Fig. 2 (d); we will discuss this in detail later in this section.

Figure 2: (d): terminal condition for Eq. (8); (a), (b), and (c): solutions of the convection-diffusion equation, Eq. (8), at t = 0 with diffusion coefficients σ = 0, σ = 0.01, and σ = 0.1, respectively.

2.2 Adversarial Defense by ResNets Ensemble via the Feynman-Kac Formalism

Using a specific level set of u(x, 0) in Fig. 2 (a) for classification suffers from adversarial vulnerability: a tiny perturbation of x can move the output across the level set and cause misclassification. To mitigate this issue, we add a diffusion term (1/2)σ²Δu to Eq. (6), where σ is the diffusion coefficient and Δ is the Laplace operator in R^d, to make the level sets of the TE more regular. This improves the robustness of the classifier.

Hence, we arrive at the following convection-diffusion equation:

    \frac{\partial u}{\partial t}(x, t) + \tilde{F}(x, w(t)) \cdot \nabla u(x, t) + \frac{1}{2}\sigma^2 \Delta u(x, t) = 0, \quad x \in \mathbb{R}^d, \ t \in [0, 1),
    u(x, 1) = f(x).    (8)

The solution to Eq. (8) is much more regular when σ ≠ 0 than when σ = 0. To illustrate this, we solve Eq. (8) in a 2D unit square with periodic boundary conditions, where at each grid point the velocity field F̃(x, w(t)) is a random number sampled uniformly from −1 to 1, and the terminal condition is also randomly generated, as shown in Fig. 2 (d). This 2D convection-diffusion equation is solved by the pseudo-spectral method with spatial and temporal step sizes of 1/128 and 1×10⁻³, respectively. Figure 2 (a), (b), and (c) illustrate the solutions for σ = 0, 0.01, and 0.1, respectively. They show that as σ increases, the solution becomes more regular, which makes the classifier more robust but possibly less accurate on clean data; σ should be selected to strike a good trade-off between accuracy and robustness.
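For reference, here is a NumPy sketch in the spirit of that experiment. The paper does not spell out its scheme beyond "pseudo-spectral", so the time reversal τ = 1 − t (turning the terminal value problem into an initial value problem), the integrating-factor treatment of the diffusion term, and the explicit Euler convection step are all our assumptions; for small σ the explicit convection step would need dealiasing or a smaller time step to stay stable.

```python
import numpy as np

N, dt, sigma = 128, 1e-3, 0.1                  # grid size, time step, diffusion coeff.
k = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)   # angular wavenumbers on [0, 1)
KX, KY = np.meshgrid(k, k, indexing="ij")
lap = -(KX**2 + KY**2)                         # Fourier symbol of the Laplacian

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, (N, N))             # random terminal condition f, Fig. 2 (d)
vx = rng.uniform(-1.0, 1.0, (N, N))            # random velocity field on the grid
vy = rng.uniform(-1.0, 1.0, (N, N))

# With tau = 1 - t, Eq. (8) becomes u_tau = F . grad(u) + (sigma^2 / 2) lap(u);
# marching tau from 0 to 1 yields u(x, 0), the classifier shown in Fig. 2.
for _ in range(round(1.0 / dt)):
    u_hat = np.fft.fft2(u)
    ux = np.fft.ifft2(1j * KX * u_hat).real    # spectral x-derivative
    uy = np.fft.ifft2(1j * KY * u_hat).real    # spectral y-derivative
    conv_hat = np.fft.fft2(vx * ux + vy * uy)  # convection term, pseudo-spectral
    # Explicit Euler for convection; exact integrating factor for diffusion.
    u_hat = (u_hat + dt * conv_hat) * np.exp(0.5 * sigma**2 * lap * dt)
    u = np.fft.ifft2(u_hat).real
```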

Moreover, we have the following theoretical guarantee for the robustness of the solution to the convection-diffusion equation.

Theorem 1 ([22]). Let F(x, t) be a Lipschitz function in both x and t, and let f(x) be a bounded function. Consider the following terminal value problem of the convection-diffusion equation (σ ≠ 0):

    \frac{\partial u}{\partial t}(x, t) + F(x, w(t)) \cdot \nabla u(x, t) + \frac{1}{2}\sigma^2 \Delta u(x, t) = 0, \quad x \in \mathbb{R}^d, \ t \in [0, 1),
    u(x, 1) = f(x).    (9)

Then, for any small perturbation δ, we have |u(x + δ, 0) − u(x, 0)| ≤ C (‖δ‖₂ / σ)^α for some constant α > 0, provided σ ≤ 1. Here, ‖δ‖₂ is the ℓ2 norm of δ, and C is a constant that depends on d, ‖f‖_∞, and ‖F‖_{L^∞_{x,t}}.

According to the above observation, instead of using u(x, 0) of the TE's solution for classification, we use that of the convection-diffusion equation.

The above convection-diffusion equation can be solved in high-dimensional space using the Feynman-Kac formula [18], which gives u(x̂, 0) as¹

    u(\hat{x}, 0) = \mathbb{E}\left[ f(x(1)) \mid x(0) = \hat{x} \right],    (10)

where x(t) is an Itô process,

    dx(t) = \tilde{F}(x(t), w(t)) \, dt + \sigma \, dB_t,

and u(x̂, 0) is the conditional expectation of f(x(1)).

¹A detailed derivation is available in the supplementary material.
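Eq. (10) can be evaluated by Monte Carlo: simulate many sample paths of the Itô process with the Euler-Maruyama scheme and average f over the endpoints x(1). A self-contained NumPy sketch with toy stand-ins for F̃ and f (neither comes from a trained network):

```python
import numpy as np

# Monte Carlo Feynman-Kac: u(x, 0) = E[ f(x(1)) | x(0) = x ] for the Ito
# process dx = F(x, t) dt + sigma dB_t, simulated with Euler-Maruyama.
rng = np.random.default_rng(0)
F = lambda x, t: -np.sin(x)              # toy velocity field
f = lambda x: np.tanh(x)                 # toy terminal condition
sigma, L, n_paths = 0.1, 100, 10_000     # noise level, time steps, sample paths
dt = 1.0 / L

x = np.zeros(n_paths)                    # all paths start at x(0) = 0
for l in range(L):
    dB = np.sqrt(dt) * rng.standard_normal(n_paths)   # Brownian increments
    x += F(x, l * dt) * dt + sigma * dB               # one Euler-Maruyama step
u0 = f(x).mean()                         # estimate of u(0, 0)
```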

We approximate the Feynman-Kac formula by an ensemble of modified ResNets in the following way. According to the Euler-Maruyama method [2], the term σ dB_t can be approximated by adding Gaussian noise N(0, σ²I), where σ = a√Var(x_l + F(x_l)) with a being a tunable parameter, to each residual mapping x_{l+1} = x_l + F(x_l). This gives the modified residual mapping x_{l+1} = x_l + F(x_l) + N(0, σ²I), as illustrated in Fig. 1 (b). Let ResNet′ denote the modified ResNet in which we inject noise into each residual mapping of the original ResNet. In a nutshell, the ResNet-based approximation to the Feynman-Kac formula is an ensemble of jointly trained ResNet′ models, as illustrated in Fig. 1 (c). We call this ensemble of ResNet′ models an EnResNet; for instance, an ensemble of n noise-injected ResNet20 models is denoted En_nResNet20.
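A minimal PyTorch sketch of the resulting ensemble is shown below; make_noisy_resnet is a hypothetical factory returning one noise-injected ResNet′ (for instance, a network built from noise-injected residual mappings like the block sketched in Section 1.1), and averaging the members' softmax outputs is one natural reading of the averaging step.

```python
import torch
import torch.nn as nn

class EnResNet(nn.Module):
    # Ensemble of n jointly trained noise-injected ResNets; the prediction is
    # the average of the members' softmax outputs, a Monte Carlo approximation
    # of the conditional expectation in the Feynman-Kac formula, Eq. (10).
    def __init__(self, make_noisy_resnet, n):
        super().__init__()
        self.members = nn.ModuleList(make_noisy_resnet() for _ in range(n))

    def forward(self, x):
        probs = [torch.softmax(m(x), dim=1) for m in self.members]
        return torch.stack(probs, dim=0).mean(dim=0)
```

Since every forward pass draws fresh noise, the ensemble average reduces the variance of the Feynman-Kac estimate both across members and across runs.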

2.3 Robust Training of the EnResNet

We use PGD adversarial training [29] to robustly train EnResNets with σ = 0.1 on both CIFAR10 and CIFAR100 [20], with standard data augmentation [16]. The attack used in PGD adversarial training is simply the iterative fast gradient sign method (IFGSM) with an initial random perturbation of the clean data. Other methods for solving the EARM can also be used to train EnResNets. All computations are carried out on a machine with a single Nvidia Titan Xp graphics card.
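A hedged sketch of one epoch of this training loop follows. The attack hyperparameters (ε = 8/255, step size 2/255, 10 iterations) are common CIFAR10 choices rather than values quoted from this excerpt, and for an ensemble that outputs averaged probabilities (as in the EnResNet sketch above) the cross-entropy on logits should be replaced by the negative log-likelihood of the averaged probabilities.

```python
import torch
import torch.nn.functional as nnF

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # IFGSM with a random start inside the eps-ball: the PGD attack.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = nnF.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv + alpha * grad.sign()
        # Project back onto the eps-ball around x and the valid pixel range.
        x_adv = torch.clamp(x + (x_adv - x).clamp(-eps, eps), 0.0, 1.0)
    return x_adv.detach()

def train_epoch(model, loader, optimizer):
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)            # inner maximization
        optimizer.zero_grad()
        loss = nnF.cross_entropy(model(x_adv), y)  # outer minimization step
        loss.backward()
        optimizer.step()
```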

2.4 Attack Methods

We attack the trained model f(x, w) with ℓ∞-norm based untargeted FGSM, IFGSM [12], and C&W attacks.

[Figure caption, partially recovered: columns 2-3 (4-5) show adversarial images crafted by IFGSM20 and C&W attacks on ResNet20 (En5ResNet20), together with the corresponding predicted labels.]

3.2 Integration of Separately Trained EnResNets

In the previous subsection, we verified the adversarial defense capability of EnResNet, which approximates the Feynman-Kac formula for solving the convection-diffusion equation. As we showed, when more ResNets and larger models are involved in the ensemble, both natural and robust accuracies improve. However, the EnResNet proposed above requires training the ensemble jointly, which poses memory challenges for training ultra-large ensembles. To overcome this issue, we consider training each component of the ensemble individually and integrating them for prediction. The major benefit of this strategy is that, with the same amount of GPU memory, we can train a much larger model for inference, since the batch size used in inference can be one. Table 5 lists the natural and robust accuracies of integrations of separately trained EnResNets on CIFAR10.

Each integration has better robust accuracy than its individual components. For instance, the integration of En2ResNet110 and En1WideResNet34-10 gives a robust accuracy of 57.94% under the IFGSM20 attack, which is remarkably better than both En2ResNet110 (53.05%) and En1WideResNet34-10 (56.60%). To the best of our knowledge, 57.94% outperforms the current state-of-the-art [46] by 1.33%. The effectiveness of integrating separately trained EnResNets sheds light on developing ultra-large models to improve efficiency for adversarial defense.

Table 5: Natural and robust accuracies of different integrations of robustly trained EnResNets on CIFAR10. Unit: %.

    Model                                Dataset    A_nat    A_rob (FGSM)    A_rob (IFGSM20)    A_rob (C&W)
    En2ResNet20 & En5ResNet20            CIFAR10    82.82    59.14           53.15              68.00
    En2ResNet44 & En5ResNet20            CIFAR10    82.99    59.64           53.86              69.36
    En2ResNet110 & En5ResNet20           CIFAR10    83.57    60.63           54.87              70.02
    En2ResNet110 & En1WideResNet34-10    CIFAR10    85.62    62.48           57.94              70.20
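At inference time, integration is just probability averaging across the separately trained models. A sketch under assumed details (the checkpoint paths, the use of torch.load on whole serialized models, and the assumption that each saved model outputs logits are all hypothetical; the two-model case mirrors the last row of Table 5):

```python
import torch

# Hypothetical checkpoints of two separately, adversarially trained models.
model_a = torch.load("en2resnet110.pt", map_location="cpu").eval()
model_b = torch.load("en1wideresnet34_10.pt", map_location="cpu").eval()

@torch.no_grad()
def integrated_predict(x):
    # Average the two models' class probabilities, then take the argmax.
    p = torch.softmax(model_a(x), dim=1) + torch.softmax(model_b(x), dim=1)
    return (p / 2).argmax(dim=1)
```

Because each component is trained on its own, the peak training memory is that of the largest single component, which is the memory advantage described above.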

3.3 Gradient Mask and Comparison with Simple Ensembles

Besides applying the EOT gradient, we further verify that our defense is not due to obfuscated gradients. We use IFGSM20 to attack naturally trained (using the same approach as in [16]) En1ResNet20, En2ResNet20, and En5ResNet20; the corresponding accuracies are 0%, 0.02%, and 0.03%, respectively. All naturally trained EnResNets are easily fooled by IFGSM20, so gradient masking does not play an important role in EnResNets' adversarial defense [4]. However, under the FGSM attack with ε = 8/255, the naturally trained En1ResNet20 and En2ResNet20 (with injected Gaussian noise of standard deviation 0.1) have robust accuracies of 27.93% and 28.75%, respectively, significantly higher than that of the naturally trained ResNet20. These results show that naturally trained EnResNets are also more resistant to adversarial attacks.

Ensembles of models for adversarial defense have been studied in [37]. Here, we show that ensembles of robustly trained ResNets without noise injection cannot boost natural and robust accuracy much. The natural accuracy of a jointly (separately) adversarially trained ensemble of two ResNet20s without noise injection is 75.75% (74.96%), which does not substantially outperform a single ResNet20 with a natural accuracy of 75.11%. The corresponding robust accuracies are 51.11% (51.68%), 47.28% (47.86%), and 59.73% (59.80%) under the FGSM, IFGSM20, and C&W attacks, respectively. These robust accuracies are much inferior to those of En2ResNet20. Furthermore, the ensemble of a separately trained robust ResNet20 and robust ResNet44 gives a natural accuracy of 77.92% and robust accuracies of 54.73%, 51.47%, and 61.77% under the above three attacks. These results reveal that ensembling adversarially trained ResNets via the Feynman-Kac formalism is much more accurate and robust than a standard ensemble of ResNets.
