A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks

Hadi Salman (Microsoft Research AI), Greg Yang (Microsoft Research AI), Huan Zhang (UCLA), Cho-Jui Hsieh (UCLA), Pengchuan Zhang (Microsoft Research AI)

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. (Work done as part of the Microsoft AI Residency Program. Please see arXiv:1902.08722 for the full and most recent version of this paper.)

Abstract

Verification of neural networks enables us to gauge their robustness against adversarial attacks. Verification algorithms fall into two categories: exact verifiers that run in exponential time and relaxed verifiers that are efficient but incomplete. In this paper, we unify all existing LP-relaxed verifiers, to the best of our knowledge, under a general convex relaxation framework. This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of neural network verification. Next, we perform large-scale experiments, amounting to more than 22 CPU-years, to obtain the exact solution to the convex-relaxed problem that is optimal within our framework for ReLU networks. We find the exact solution does not significantly improve upon the gap between PGD and existing relaxed verifiers for various networks trained normally or robustly on the MNIST and CIFAR datasets. Our results suggest there is an inherent barrier to tight verification for the large class of methods captured by our framework. We discuss possible causes of this barrier and potential future directions for bypassing it. Our code and trained models are publicly available.

1 Introduction

A classification neural network $f : \mathbb{R}^n \to \mathbb{R}^K$ (where $f_i(x)$ should be thought of as the $i$th logit) is considered adversarially robust with respect to an input $x$ and its neighborhood $S_{\mathrm{in}}(x)$ if

$$\min_{x' \in S_{\mathrm{in}}(x),\, i \neq i^*} f_{i^*}(x') - f_i(x') > 0, \quad \text{where } i^* = \operatorname*{argmax}_j f_j(x). \qquad (1)$$

Many recent works have proposed robustness verification methods by lower-bounding eq. (1); the positivity of this lower bound proves the robustness w.r.t. $S_{\mathrm{in}}(x)$. A dominant approach thus far has tried to relax eq. (1) into a convex optimization problem, from either the primal view [Zhang et al., 2018, Gehr et al., 2018, Singh et al., 2018, Weng et al., 2018] or the dual view [Wong and Kolter, 2018, Dvijotham et al., 2018b, Wang et al., 2018b]. In our first main contribution, we propose a layer-wise convex relaxation framework that unifies these works and reveals the relationships between them (Fig. 1). We further show that the performance of methods within this framework is subject to a theoretical limit: the performance of the optimal layer-wise convex relaxation. This then begs the question: is the road to fast and accurate robustness verification paved by just faster and more accurate layer-wise convex relaxation that approaches the theoretical limit?

In our second main contribution, we answer this question in the negative. We perform extensive experiments and find that (i) in terms of lower bounding the minimum adversarial distortion (the radius of the largest $\ell_\infty$ ball in which no adversarial examples can be found), the optimal layer-wise convex relaxation does not significantly narrow the gap between PGD and existing relaxed verifiers, and (ii) in terms of upper bounding the robust error, the optimal layer-wise convex relaxation does not significantly close the gap between the PGD lower bound (or MILP exact answer) and the upper bound from Wong and Kolter [2018]. Therefore, there seems to be an inherent barrier blocking our progress on this road of layer-wise convex relaxation, and we hope this work provokes much thought in the community on how to bypass it.

[Figure 1: relationships between existing relaxed verifiers unified by our framework, including DeepZ (Singh et al., 2018), DeepPoly (Singh et al., 2019a), Neurify (Wang et al., 2018b), and Fast-Lin (Weng et al., 2018).]

2 Preliminaries and Related Work

Exact verifiers and NP-completeness. For ReLU networks (piece-wise linear networks in general), exact verifiers solve the robustness verification problem (1) typically by employing MILP solvers [Cheng et al., 2017, Lomuscio and Maganti, 2017, Dutta et al., 2018, Fischetti and Jo, 2017, Tjeng et al., 2019, Xiao et al., 2019] or Satisfiability Modulo Theories (SMT) solvers [Scheibler et al., 2015, Katz et al., 2017, Carlini et al., 2017, Ehlers, 2017]. However, due to the NP-completeness of solving such a problem [Katz et al., 2017, Weng et al., 2018], it can be really challenging to scale these to large networks. It can take Reluplex [Katz et al., 2017] several hours to find the minimum distortion of an example for a ReLU network with 5 inputs, 5 outputs, and 300 neurons. A recent work by Tjeng et al. [2019] uses MILP to exactly verify medium-size networks, but the verification time is very sensitive to how a network is trained; for example, it is fast for networks trained using the LP-relaxed dual formulation of Wong and Kolter [2018], but much slower for normally trained networks. A concurrent work by Xiao et al. [2019] trains networks with the objective of speeding up the MILP verification problem, but this compromises on the performance of the network.
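To make the MILP approach concrete, below is a minimal sketch of exact verification for a toy one-hidden-layer ReLU network, using the standard big-M encoding of each ReLU with precomputed pre-activation bounds. The use of PuLP/CBC and the toy weights and bounds are assumptions made purely for illustration, not the exact formulation of Tjeng et al. [2019].

```python
# Minimal sketch: exact verification of a toy 1-hidden-layer ReLU network via a
# big-M MILP encoding. PuLP/CBC and the toy weights/bounds are illustrative
# assumptions, not the exact formulation used by Tjeng et al. (2019).
import numpy as np
import pulp

# Toy network: z = W0 x + b0, h = ReLU(z), margin = c^T h + c0 (cf. eq. (1)).
W0 = np.array([[1.0, -1.0], [0.5, 2.0]])
b0 = np.array([0.1, -0.2])
c, c0 = np.array([1.0, -1.0]), 0.0
x_nom, eps = np.array([0.5, 0.5]), 0.3           # l_inf ball around x_nom

# Any valid pre-activation bounds work; here, simple interval arithmetic.
z_lo = W0 @ x_nom - np.abs(W0) @ np.full(2, eps) + b0
z_hi = W0 @ x_nom + np.abs(W0) @ np.full(2, eps) + b0

prob = pulp.LpProblem("relu_verification", pulp.LpMinimize)
x = [pulp.LpVariable(f"x{i}", float(x_nom[i] - eps), float(x_nom[i] + eps)) for i in range(2)]
z = [pulp.LpVariable(f"z{j}", float(z_lo[j]), float(z_hi[j])) for j in range(2)]
h = [pulp.LpVariable(f"h{j}", lowBound=0.0) for j in range(2)]
d = [pulp.LpVariable(f"d{j}", cat="Binary") for j in range(2)]  # ReLU on/off phase

for j in range(2):
    prob += z[j] == pulp.lpSum(float(W0[j, i]) * x[i] for i in range(2)) + float(b0[j])
    # Big-M encoding of h_j = max(0, z_j), valid given z_lo[j] <= z_j <= z_hi[j]:
    prob += h[j] >= z[j]
    prob += h[j] <= z[j] - float(z_lo[j]) * (1 - d[j])
    prob += h[j] <= float(z_hi[j]) * d[j]

prob += pulp.lpSum(float(c[j]) * h[j] for j in range(2)) + c0   # minimize the margin
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("exact minimum margin:", pulp.value(prob.objective))      # >= 0 certifies (1)
```

The binary variable per neuron is exactly what makes the problem combinatorial; relaxed verifiers, discussed next, avoid it at the cost of completeness.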

Relaxed and efficient verifiers. These verifiers solve a relaxed, but more computationally efficient, version of (1), and have been proposed from different perspectives. From the primal view, one can relax the nonlinearity in (1) into linear inequality constraints. This perspective has been previously explored in the framework of "abstract transformers" [Singh et al., 2018, 2019a,b, Gehr et al., 2018, Mirman et al., 2018], via linear outer bounds of activation functions [Zhang et al., 2018, Weng et al., 2018, Wang et al., 2018a,b], or via interval bound propagation [Gowal et al., 2018, Mirman et al., 2018]. From the dual view, one can study the dual of the relaxed problem [Wong and Kolter, 2018, Wong et al., 2018] or study the dual of the original nonconvex verification problem [Dvijotham et al., 2018b,a, Qin et al., 2019]. In this paper, we unify both views in a common convex relaxation framework for NN verification, clarifying their relationships (as summarized in Fig. 1). Raghunathan et al. [2018b] formulates the verification of ReLU networks as a quadratic programming problem and then relaxes and solves this problem with a semidefinite programming (SDP) solver. While our framework does not cover this SDP relaxation, it is not clear to us how to extend the SDP relaxed verifier to general nonlinearities, for example max-pooling, which can be handled in our framework. Other verifiers have been proposed to certify via an intermediary step of bounding the local Lipschitz constant [Hein and Andriushchenko, 2017, Weng et al., 2018, Raghunathan et al., 2018a, Zhang et al., 2019]. These are outside the scope of our framework. Combining exact and relaxed verifiers, hybrid methods have shown some effectiveness [Bunel et al., 2018, Singh et al., 2019b]. In fact, many exact verifiers also use relaxation as a subroutine to speed things up, and hence can be viewed as hybrid methods as well.

In this paper, we are not concerned with such techniques but only focus on relaxed verifiers.

3 Convex Relaxation from the Primal View

Problem setting. In this paper, we assume that the neighborhood $S_{\mathrm{in}}(x_{\mathrm{nom}})$ is a convex set. An example of this is $S_{\mathrm{in}}(x_{\mathrm{nom}}) = \{x : \|x - x_{\mathrm{nom}}\|_\infty \le \epsilon\}$, which is the constraint on $x$ in the $\ell_\infty$ adversarial attack model. We also assume that $f(x)$ is an $L$-layer feedforward NN. For notational simplicity, we denote $\{0, 1, \dots, L-1\}$ by $[L]$ and $\{x^{(0)}, x^{(1)}, \dots, x^{(L-1)}\}$ by $x^{[L]}$. We define $f(x)$ as

$$x^{(l+1)} = \sigma^{(l)}\big(W^{(l)} x^{(l)} + b^{(l)}\big) \ \ \forall l \in [L], \quad \text{and} \quad f(x) := z^{(L)} = W^{(L)} x^{(L)} + b^{(L)}, \qquad (2)$$

where $x^{(l)} \in \mathbb{R}^{n^{(l)}}$, $z^{(l)} \in \mathbb{R}^{n_z^{(l)}}$, $x^{(0)} := x \in \mathbb{R}^{n^{(0)}}$ is the input, $W^{(l)} \in \mathbb{R}^{n_z^{(l)} \times n^{(l)}}$ and $b^{(l)} \in \mathbb{R}^{n_z^{(l)}}$ are the weight matrix and bias vector of the $l$th linear layer, and $\sigma^{(l)}: \mathbb{R}^{n_z^{(l)}} \to \mathbb{R}^{n^{(l+1)}}$ is a (nonlinear) activation function like (leaky-)ReLU, the sigmoid family (including sigmoid, arctan, hyperbolic tangent, etc.), and the pooling family (MaxPool, AvgPool, etc.).
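For concreteness, here is a minimal NumPy sketch of the network defined in (2), specialized to ReLU activations; the layer sizes and weights are illustrative placeholders.

```python
# Minimal sketch of the feedforward network f in eq. (2), specialized to ReLU;
# the layer sizes and weights below are illustrative placeholders.
import numpy as np

def forward(x, weights, biases):
    """x^(l+1) = ReLU(W^(l) x^(l) + b^(l)) for l in [L]; f(x) = W^(L) x^(L) + b^(L)."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(W @ x + b, 0.0)          # hidden (nonlinear) layers
    return weights[-1] @ x + biases[-1]          # final linear layer: the logits z^(L)

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((4, 4)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(4), rng.standard_normal(2)]
print(forward(np.array([0.1, -0.2, 0.3]), weights, biases))    # two logits
```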

Our results can be easily extended to networks with convolutional layers and skip connections as well, similar to what is done in Wong et al. [2018], as these can be seen as special forms of (2). Consider the following optimization problem $\mathcal{O}(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$:

$$\min_{(x^{[L+1]}, z^{[L]}) \in \mathcal{D}} \ c^\top x^{(L)} + c_0 \quad \text{s.t.} \quad z^{(l)} = W^{(l)} x^{(l)} + b^{(l)}, \ \ x^{(l+1)} = \sigma^{(l)}(z^{(l)}), \ \ \forall l \in [L], \qquad (\mathcal{O})$$

where the optimization domain $\mathcal{D}$ is the set of activations and pre-activations $\{x^{(0)}, \dots, x^{(L)}, z^{(0)}, \dots, z^{(L-1)}\}$ satisfying the bounds $\underline{z}^{(l)} \le z^{(l)} \le \bar{z}^{(l)}$, $\forall l \in [L]$, i.e.,

$$\mathcal{D} = \big\{(x^{[L+1]}, z^{[L]}) : x^{(0)} \in S_{\mathrm{in}}(x_{\mathrm{nom}}),\ \underline{z}^{(l)} \le z^{(l)} \le \bar{z}^{(l)},\ \forall l \in [L]\big\}. \qquad (3)$$

If $c = W^{(L)}_{i_{\mathrm{nom}},:} - W^{(L)}_{i,:}$, $c_0 = b^{(L)}_{i_{\mathrm{nom}}} - b^{(L)}_{i}$, $\underline{z}^{[L]} = -\infty$, and $\bar{z}^{[L]} = \infty$, then $(\mathcal{O})$ is equivalent to problem (1). However, when we have better information about valid bounds $\underline{z}^{(l)}$ and $\bar{z}^{(l)}$ of $z^{(l)}$, we can significantly narrow down the optimization domain and, as will be detailed shortly, achieve tighter solutions when we relax the nonlinearities. We denote the minimal value of $\mathcal{O}(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$ by $p^*(c, c_0, L, \underline{z}^{[L]}, \bar{z}^{[L]})$, or just $p^*_{\mathcal{O}}$ when no confusion arises.

Obtaining lower and upper bounds $(\underline{z}^{[L]}, \bar{z}^{[L]})$ by solving sub-problems. This can be done by recursively solving $(\mathcal{O})$ with specific choices of $c$ and $c_0$, which is a common technique used in many works [Wong and Kolter, 2018, Dvijotham et al., 2018b].

For example, one can obtain $\underline{z}^{(\ell)}_j$, a lower bound of $z^{(\ell)}_j$, by solving $\mathcal{O}(W^{(\ell)}_{j,:}, b^{(\ell)}_j, \ell, \underline{z}, \bar{z})$; this shows that one can estimate $\underline{z}^{(l)}$ and $\bar{z}^{(l)}$ inductively in $l$. However, we may have millions of sub-problems to solve because practical networks can have millions of neurons. Therefore, it is crucial to have efficient algorithms to solve $(\mathcal{O})$.

[Figure 2: Optimal convex relaxations for common nonlinearities. For tanh, the relaxation contains two linear segments and parts of the tanh function. For ReLU and the step function, the optimal relaxations are written as 3 and 4 linear constraints, respectively. For $z = \max(x, y)$, the light orange shadow indicates the pre-activation bounds for $x$ and $y$, and the optimal convex relaxation is lower bounded by the max function itself.]
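As a cheap (and generally loose) alternative to solving $(\mathcal{O})$ for every neuron, valid bounds $(\underline{z}^{[L]}, \bar{z}^{[L]})$ can be obtained by interval arithmetic, i.e., the interval bound propagation mentioned in Section 2. A minimal sketch, assuming an $\ell_\infty$ ball $S_{\mathrm{in}}(x_{\mathrm{nom}})$ and ReLU activations:

```python
# Minimal sketch: valid (generally loose) pre-activation bounds (z_lo, z_hi) via
# interval arithmetic, assuming S_in(x_nom) is an l_inf ball of radius eps and
# ReLU activations. Tighter bounds come from solving (O) or its relaxation (C).
import numpy as np

def interval_preactivation_bounds(x_nom, eps, weights, biases):
    x_lo, x_hi = x_nom - eps, x_nom + eps
    z_los, z_his = [], []
    for W, b in zip(weights, biases):
        W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
        z_los.append(W_pos @ x_lo + W_neg @ x_hi + b)   # smallest attainable z^(l)
        z_his.append(W_pos @ x_hi + W_neg @ x_lo + b)   # largest attainable z^(l)
        x_lo, x_hi = np.maximum(z_los[-1], 0.0), np.maximum(z_his[-1], 0.0)  # ReLU is monotone
    return z_los, z_his

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((4, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(4)]
z_los, z_his = interval_preactivation_bounds(np.array([0.1, -0.2, 0.3]), 0.05, weights, biases)
print(z_los[-1], z_his[-1])
```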

Convex relaxation in the primal space. Due to the nonlinear activation functions $\sigma^{(l)}$, the feasible set of $(\mathcal{O})$ is nonconvex, which leads to the NP-completeness of the neural network verification problem [Katz et al., 2017, Weng et al., 2018]. One natural idea is to do convex relaxation of its feasible set. Specifically, one can relax the nonconvex equality constraint $x^{(l+1)} = \sigma^{(l)}(z^{(l)})$ to convex inequality constraints, i.e.,

$$\min_{(x^{[L+1]}, z^{[L]}) \in \mathcal{D}} \ c^\top x^{(L)} + c_0 \quad \text{s.t.} \quad z^{(l)} = W^{(l)} x^{(l)} + b^{(l)}, \ \ \underline{\sigma}^{(l)}(z^{(l)}) \le x^{(l+1)} \le \bar{\sigma}^{(l)}(z^{(l)}), \ \ \forall l \in [L], \qquad (\mathcal{C})$$

where $\underline{\sigma}^{(l)}(z)$ ($\bar{\sigma}^{(l)}(z)$) is convex (concave) and satisfies $\underline{\sigma}^{(l)}(z) \le \sigma^{(l)}(z) \le \bar{\sigma}^{(l)}(z)$ for $\underline{z}^{(l)} \le z \le \bar{z}^{(l)}$. We denote the feasible set of $(\mathcal{C})$ by $S_{\mathcal{C}}$ and its minimum by $p^*_{\mathcal{C}}$. Naturally, we have that $S_{\mathcal{C}}$ is convex and $p^*_{\mathcal{C}} \le p^*_{\mathcal{O}}$. For example, Ehlers [2017] proposed the following relaxations for the ReLU function $\mathrm{ReLU}(z) = \max(0, z)$ and MaxPool $\mathrm{MP}(z) = \max_k z_k$:

$$\underline{\mathrm{ReLU}}(z) = \max(0, z), \qquad \overline{\mathrm{ReLU}}(z) = \frac{\bar{z}}{\bar{z} - \underline{z}}\,(z - \underline{z}), \qquad (4)$$

$$\underline{\mathrm{MP}}(z) = \max_k z_k, \qquad \overline{\mathrm{MP}}(z) = \sum_k (z_k - \underline{z}_k) + \max_k \underline{z}_k. \qquad (5)$$

The optimal layer-wise convex relaxation. As a special case, we consider the optimal layer-wise convex relaxation, where

$$\underline{\sigma}_{\mathrm{opt}}(z) \text{ is the greatest convex function majorized by } \sigma, \quad \bar{\sigma}_{\mathrm{opt}}(z) \text{ is the smallest concave function majorizing } \sigma. \qquad (6)$$

A precise definition can be found in (12) in Appendix B. In Fig. 2, we show the optimal convex relaxation for several common activation functions. It is easy to see that (4) is the optimal convex relaxation for ReLU, but (5) is not optimal for the MaxPool function. Under mild assumptions (non-interactivity as defined in Definition B.2), the optimal convex relaxation of a nonlinear layer $x = \sigma(z)$, i.e., its convex hull, is simply $\underline{\sigma}_{\mathrm{opt}}(z) \le x \le \bar{\sigma}_{\mathrm{opt}}(z)$ (see Proposition B.3). We denote the corresponding optimal relaxed problem as $\mathcal{C}_{\mathrm{opt}}$, with its objective $p^*_{\mathcal{C}_{\mathrm{opt}}}$. We emphasize that by optimal, we mean the optimal convex relaxation of the single nonlinear constraint $x^{(l+1)} = \sigma^{(l)}(z^{(l)})$ (see Proposition B.3) instead of the optimal convex relaxation of the nonconvex feasible set of the original problem $(\mathcal{O})$. As such, techniques as in [Anderson et al., 2018, Raghunathan et al., 2018b] are outside our framework; see Appendix C for more discussions.
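To illustrate the relaxed problem $(\mathcal{C})$ concretely, the sketch below solves it for a toy one-hidden-layer ReLU network; the upper relaxation is written as the chord of ReLU over $[\underline{z}, \bar{z}]$, which reduces to (4) in the unstable case $\underline{z} < 0 < \bar{z}$. The use of CVXPY and the toy weights and bounds are assumptions for illustration.

```python
# Minimal sketch: the LP relaxation (C) for a toy 1-hidden-layer ReLU network,
# using the chord upper bound (which reduces to eq. (4) when z_lo < 0 < z_hi).
# CVXPY and the toy weights/bounds are illustrative assumptions.
import cvxpy as cp
import numpy as np

W0 = np.array([[1.0, -1.0], [0.5, 2.0]])
b0 = np.array([0.1, -0.2])
c, c0 = np.array([1.0, -1.0]), 0.0
x_nom, eps = np.array([0.5, 0.5]), 0.3

# Any valid pre-activation bounds work; here, simple interval arithmetic.
z_lo = W0 @ x_nom - np.abs(W0) @ np.full(2, eps) + b0
z_hi = W0 @ x_nom + np.abs(W0) @ np.full(2, eps) + b0
relu = lambda v: np.maximum(v, 0.0)
slope = (relu(z_hi) - relu(z_lo)) / (z_hi - z_lo)      # chord slope over [z_lo, z_hi]

x, z, h = cp.Variable(2), cp.Variable(2), cp.Variable(2)   # h plays the role of x^(1)
constraints = [
    cp.abs(x - x_nom) <= eps,                       # x^(0) in S_in(x_nom)
    z == W0 @ x + b0,                               # linear layer
    h >= 0, h >= z,                                 # lower relaxation: max(0, z)
    h <= cp.multiply(slope, z - z_lo) + relu(z_lo), # upper relaxation: chord of ReLU
]
prob = cp.Problem(cp.Minimize(c @ h + c0), constraints)
prob.solve()
print("relaxed lower bound p*_C on the margin:", prob.value)   # p*_C <= p*_O
```

Because the LP has no binary variables, it solves in polynomial time, but its optimal value only lower-bounds the exact margin computed by the MILP sketch above.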

Greedily solving the primal with linear bounds. As another special case, when there are exactly one linear upper bound and one linear lower bound for each nonlinear layer in $(\mathcal{C})$, as follows:

$$\underline{\sigma}^{(l)}(z^{(l)}) := \underline{a}^{(l)} \odot z^{(l)} + \underline{b}^{(l)}, \qquad \bar{\sigma}^{(l)}(z^{(l)}) := \bar{a}^{(l)} \odot z^{(l)} + \bar{b}^{(l)}, \qquad (7)$$

the objective $p^*_{\mathcal{C}}$ can be greedily bounded in a layer-by-layer manner. We can derive one linear upper and one linear lower bound of $z^{(L)} := c^\top x^{(L)} + c_0$ with respect to $z^{(L-1)}$, using the fact that $z^{(L)} = c^\top \sigma^{(L-1)}(z^{(L-1)}) + c_0$ and that $\sigma^{(L-1)}(z^{(L-1)})$ is linearly upper and lower bounded by $\bar{\sigma}^{(L-1)}(z^{(L-1)})$ and $\underline{\sigma}^{(L-1)}(z^{(L-1)})$. Because a linear combination of linear bounds (whose coefficients are related to the entries in $c$) can be relaxed to a single linear bound, we can apply this technique again and replace $z^{(L-1)}$ with its upper and lower bounds with respect to $z^{(L-2)}$, obtaining the bound for $z^{(L)}$ with respect to $z^{(L-2)}$. Applying this repeatedly eventually leads to linear lower and upper bounds of $z^{(L)}$ with respect to the input $x^{(0)} \in S_{\mathrm{in}}(x_{\mathrm{nom}})$. This perspective covers Fast-Lin [Weng et al., 2018], DeepZ [Singh et al., 2018] and Neurify [Wang et al., 2018b], where the proposed linear lower bound has the same slope as the upper bound, i.e., $\underline{a}^{(l)} = \bar{a}^{(l)}$. The resulting shape is referred to as a zonotope in Gehr et al. [2018] and Singh et al. [2018]. In CROWN [Zhang et al., 2018] and DeepPoly [Singh et al., 2019a], this restriction is lifted and they can achieve better verification results than Fast-Lin and DeepZ. Fig. 1 summarizes the relationships between these algorithms. Importantly, each of these works has its own merits on solving the verification problem; our focus here is to give a unified view on how they perform convex relaxation of the original verification problem $(\mathcal{O})$ in our framework. See Appendix D for more discussions and other related algorithms.
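As an illustration of the key step above (relaxing $c^\top \sigma(z) + c_0$ into a single linear function of $z$ by taking the lower bound where $c_j \ge 0$ and the upper bound where $c_j < 0$), here is a minimal sketch; the bound vectors are placeholders.

```python
# Minimal sketch of the key greedy step: given elementwise linear bounds
#   a_lo*z + b_lo <= sigma(z) <= a_hi*z + b_hi   (as in eq. (7)),
# the linear function c^T sigma(z) + c0 is lower-bounded by a single linear
# function of z, picking the lower bound where c_j >= 0 and the upper bound
# where c_j < 0. Repeating this layer by layer yields Fast-Lin/CROWN-style
# bounds; the numbers below are placeholders.
import numpy as np

def one_backward_step(c, c0, a_lo, b_lo, a_hi, b_hi):
    c_pos, c_neg = np.maximum(c, 0.0), np.minimum(c, 0.0)
    new_a = c_pos * a_lo + c_neg * a_hi           # coefficient on z
    new_c0 = c0 + c_pos @ b_lo + c_neg @ b_hi     # constant offset
    return new_a, new_c0                          # c^T sigma(z) + c0 >= new_a^T z + new_c0

# Placeholder elementwise bounds for one nonlinear layer with 3 neurons.
a_lo, b_lo = np.array([0.0, 0.5, 1.0]), np.zeros(3)
a_hi, b_hi = np.array([0.5, 0.5, 1.0]), np.array([0.2, 0.1, 0.0])
print(one_backward_step(np.array([1.0, -2.0, 0.5]), 0.0, a_lo, b_lo, a_hi, b_hi))
```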

4 Convex Relaxation from the Dual View

We now tackle the verification problem from the dual view and connect it to the primal view.

Strong duality for the convex relaxed problem. As in Wong and Kolter [2018], we introduce the dual variables for $(\mathcal{C})$ and write its Lagrangian dual as

$$g_{\mathcal{C}}(\lambda^{[L]}, \underline{\mu}^{[L]}, \bar{\mu}^{[L]}) := \min_{(x^{[L+1]}, z^{[L]}) \in \mathcal{D}} \ c^\top x^{(L)} + c_0 + \sum_{l=0}^{L-1} \lambda^{(l)\top}\big(z^{(l)} - W^{(l)} x^{(l)} - b^{(l)}\big) - \sum_{l=0}^{L-1} \underline{\mu}^{(l)\top}\big(x^{(l+1)} - \underline{\sigma}^{(l)}(z^{(l)})\big) + \sum_{l=0}^{L-1} \bar{\mu}^{(l)\top}\big(x^{(l+1)} - \bar{\sigma}^{(l)}(z^{(l)})\big). \qquad (8)$$

By weak duality [Boyd and Vandenberghe, 2004],

$$d^*_{\mathcal{C}} := \max_{\lambda^{[L]},\ \underline{\mu}^{[L]} \ge 0,\ \bar{\mu}^{[L]} \ge 0} \ g_{\mathcal{C}}(\lambda^{[L]}, \underline{\mu}^{[L]}, \bar{\mu}^{[L]}) \ \le\ p^*_{\mathcal{C}}, \qquad (9)$$

but in fact we can show strong duality under mild conditions as well (note that the following result cannot be obtained by trivially applying Slater's condition; see Appendix E and Fig. 4).

Theorem 4.1 ($p^*_{\mathcal{C}} = d^*_{\mathcal{C}}$). Assume that both $\underline{\sigma}^{(l)}$ and $\bar{\sigma}^{(l)}$ have a finite Lipschitz constant in the domain $[\underline{z}^{(l)}, \bar{z}^{(l)}]$ for each $l \in [L]$. Then strong duality holds between $(\mathcal{C})$ and (9).

The optimal layer-wise dual relaxation. Theorem 4.1 shows that taking the dual of the layer-wise convex relaxed problem $(\mathcal{C})$ cannot do better than the original relaxation. To obtain a tighter dual problem, one could directly study the Lagrangian dual of the original $(\mathcal{O})$,

$$g_{\mathcal{O}}(\lambda^{[L]}, \mu^{[L]}) := \min_{\mathcal{D}} \ c^\top x^{(L)} + c_0 + \sum_{l=0}^{L-1} \lambda^{(l)\top}\big(z^{(l)} - W^{(l)} x^{(l)} - b^{(l)}\big) + \sum_{l=0}^{L-1} \mu^{(l)\top}\big(x^{(l+1)} - \sigma^{(l)}(z^{(l)})\big), \qquad (10)$$

where the min is taken over $(x^{[L+1]}, z^{[L]}) \in \mathcal{D}$. This was first proposed in Dvijotham et al. [2018b]. Note, again, by weak duality,

$$d^*_{\mathcal{O}} := \max_{\lambda^{[L]},\ \mu^{[L]}} \ g_{\mathcal{O}}(\lambda^{[L]}, \mu^{[L]}) \ \le\ p^*_{\mathcal{O}}, \qquad (11)$$

and $d^*_{\mathcal{O}}$ would seem to be strictly better than $d^*_{\mathcal{C}}$. Unfortunately, they turn out to be equivalent:

Theorem 4.2 ($d^*_{\mathcal{O}} = d^*_{\mathcal{C}_{\mathrm{opt}}}$). Assume that the nonlinear layer $\sigma^{(l)}$ is non-interactive (Definition B.2) and the optimal layer-wise relaxations $\underline{\sigma}^{(l)}_{\mathrm{opt}}$ and $\bar{\sigma}^{(l)}_{\mathrm{opt}}$ are defined in (6). Then the lower bound $d^*_{\mathcal{C}_{\mathrm{opt}}}$ provided by the dual of the optimal layer-wise convex-relaxed problem (9) and $d^*_{\mathcal{O}}$ provided by the dual of the original problem (11) are the same.

The complete proof is in Appendix F. Theorem 4.2 combined with the strong duality result of Theorem 4.1 implies that the primal relaxation $(\mathcal{C})$ and the two kinds of dual relaxations, (9) and (11), are all blocked by the same barrier. As concrete examples:

Corollary 4.3 ($p^*_{\mathcal{C}_{\mathrm{opt}}} = d^*_{\mathcal{O}}$). Suppose that the nonlinear activation functions $\sigma^{(l)}$ for all $l \in [L]$ are (for example) among the following: ReLU, step, ELU, sigmoid, tanh, polynomials and max pooling with disjoint windows. Assume that $\underline{\sigma}^{(l)}_{\mathrm{opt}}$ and $\bar{\sigma}^{(l)}_{\mathrm{opt}}$ are defined in (6), respectively. Then we have that the lower bound $p^*_{\mathcal{C}_{\mathrm{opt}}}$ provided by the primal optimal layer-wise relaxation $(\mathcal{C})$ and $d^*_{\mathcal{O}}$ provided by the dual relaxation (11) are the same.

Greedily solving the dual with linear bounds. When the relaxed bounds $\underline{\sigma}$ and $\bar{\sigma}$ are linear as defined in (7), the dual objective (9) can be lower bounded as below:

$$p^*_{\mathcal{C}} = d^*_{\mathcal{C}} \ \ge\ \sum_{l=0}^{L-1}\Big[\underline{b}^{(l)\top}\big[\nu^{(l)}\big]_+ + \bar{b}^{(l)\top}\big[\nu^{(l)}\big]_- + b^{(l)\top}\lambda^{(l)}\Big] + c_0 \ -\ \sup_{x \in S_{\mathrm{in}}(x_{\mathrm{nom}})}\big({-W^{(0)\top}\lambda^{(0)}}\big)^\top x,$$

where $[\cdot]_+ := \max(\cdot, 0)$ and $[\cdot]_- := \min(\cdot, 0)$ elementwise, and the dual variables $(\lambda^{[L]}, \nu^{[L]})$ are determined by a backward propagation:

$$\nu^{(L-1)} = c, \qquad \lambda^{(l)} = \underline{a}^{(l)} \odot \big[\nu^{(l)}\big]_+ + \bar{a}^{(l)} \odot \big[\nu^{(l)}\big]_-, \qquad \nu^{(l-1)} = W^{(l)\top}\lambda^{(l)}, \quad l = L-1, \dots, 1.$$

We provide the derivation of this algorithm in Appendix G.
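A minimal sketch of this backward propagation for a ReLU network, with the Fast-Lin choice of equal slopes $\underline{a}^{(l)} = \bar{a}^{(l)}$ and an $\ell_\infty$ ball $S_{\mathrm{in}}(x_{\mathrm{nom}})$, for which the sup term has the closed form $\sup_{\|x - x_{\mathrm{nom}}\|_\infty \le \epsilon} v^\top x = v^\top x_{\mathrm{nom}} + \epsilon\|v\|_1$. The helper names and toy data are illustrative, not the authors' implementation.

```python
# Minimal sketch: the greedy backward (dual) bound for a ReLU network with
# elementwise linear relaxations as in eq. (7), the Fast-Lin slope choice
# a_lo = a_hi, and an l_inf ball S_in(x_nom). Weights, bounds, and helper
# names are illustrative placeholders, not the authors' implementation.
import numpy as np

def relu_linear_bounds(z_lo, z_hi):
    """Elementwise a_lo*z + b_lo <= ReLU(z) <= a_hi*z + b_hi on [z_lo, z_hi] (z_lo < z_hi)."""
    a_hi = np.where(z_hi <= 0, 0.0, np.where(z_lo >= 0, 1.0, z_hi / (z_hi - z_lo)))
    b_hi = np.where((z_lo < 0) & (z_hi > 0), -a_hi * z_lo, 0.0)
    return a_hi.copy(), np.zeros_like(z_lo), a_hi, b_hi   # Fast-Lin: a_lo = a_hi, b_lo = 0

def greedy_dual_bound(c, c0, weights, biases, z_los, z_his, x_nom, eps):
    """nu^(L-1) = c; lam^(l) = a_lo*[nu]_+ + a_hi*[nu]_-; nu^(l-1) = W^(l)^T lam^(l)."""
    nu, const = c.copy(), c0
    for l in reversed(range(len(z_los))):
        a_lo, b_lo, a_hi, b_hi = relu_linear_bounds(z_los[l], z_his[l])
        nu_pos, nu_neg = np.maximum(nu, 0.0), np.minimum(nu, 0.0)
        const += b_lo @ nu_pos + b_hi @ nu_neg          # offsets of the relaxation
        lam = a_lo * nu_pos + a_hi * nu_neg
        const += biases[l] @ lam                        # bias term of the affine layer
        nu = weights[l].T @ lam                         # nu^(l-1)
    # closed-form inf over the l_inf ball: min_x <nu, x> = <nu, x_nom> - eps*||nu||_1
    return const + nu @ x_nom - eps * np.abs(nu).sum()

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((4, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(4)]
x_nom, eps = np.array([0.1, -0.2, 0.3]), 0.05
# Any valid bounds (z_lo, z_hi) work; interval arithmetic is used here for brevity.
z_los, z_his, x_lo, x_hi = [], [], x_nom - eps, x_nom + eps
for W, b in zip(weights, biases):
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    z_los.append(Wp @ x_lo + Wn @ x_hi + b)
    z_his.append(Wp @ x_hi + Wn @ x_lo + b)
    x_lo, x_hi = np.maximum(z_los[-1], 0.0), np.maximum(z_his[-1], 0.0)
c = np.array([1.0, -1.0, 0.5, -0.5])          # margin coefficients c (with c0 = 0)
print("certified lower bound:", greedy_dual_bound(c, 0.0, weights, biases, z_los, z_his, x_nom, eps))
```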

It turns out that this algorithm can exactly recover the algorithm proposed in Wong and Kolter [2018], where

$$\underline{\sigma}^{(l)}(z^{(l)}) := \alpha^{(l)} \odot z^{(l)}, \qquad \bar{\sigma}^{(l)}(z^{(l)}) := \frac{\bar{z}^{(l)}}{\bar{z}^{(l)} - \underline{z}^{(l)}} \odot \big(z^{(l)} - \underline{z}^{(l)}\big),$$

and $0 \le \alpha^{(l)} \le 1$ represents the slope of the lower bound. When $\alpha^{(l)} = \frac{\bar{z}^{(l)}}{\bar{z}^{(l)} - \underline{z}^{(l)}}$, the greedy algorithm also recovers Fast-Lin [Weng et al., 2018], which explains the arrow from Wong and Kolter [2018] to Weng et al. [2018] in Fig. 1. When $\alpha^{(l)}$ is chosen adaptively as in CROWN [Zhang et al., 2018], the greedy algorithm then recovers CROWN, which explains the arrow from Wong and Kolter [2018] to Zhang et al. [2018] in Fig. 1. See Appendix D for more discussions on the relationship between the primal and dual greedy solvers.

5 Optimal LP-relaxed Verification

In the previous sections, we presented a framework that …
