Partitioning Structure Learning for Segmented Linear Regression Trees

Xiangyu Zheng, Peking University
Song Xi Chen, Peking University

Abstract

This paper proposes a partitioning structure learning method for segmented linear regression trees (SLRT), which assigns linear predictors over the terminal nodes. The recursive partitioning process is driven by an adaptive split selection algorithm that maximizes, at each node, a criterion function based on a conditional Kendall's $\tau$ statistic that measures the rank dependence between the regressors and the fitted linear residuals. Theoretical analysis shows that the split selection algorithm permits consistent identification and estimation of the unknown segments. A sufficiently large tree is induced by applying the split selection algorithm recursively; the minimal cost-complexity tree pruning procedure is then applied to attain the right-sized tree, which ensures (i) the nested structure of pruned subtrees and (ii) consistent estimation of the number of segments. Implanting the SLRT as the built-in base predictor, we obtain ensemble predictors by random forests (RF) and the proposed weighted random forests (WRF). The practical performance of the SLRT and its ensemble versions is evaluated via numerical simulations and empirical studies. The latter show their advantageous predictive performance over a set of state-of-the-art tree-based models on well-studied public datasets.

1 Introduction

Data partitioning is a fundamental pre-processing method that explores the partitioning structure of the feature space such that the subspaces are more compliant with a simple model [1]. We consider segmented linear regression (SLR) models, which prescribe linear predictors over the partitions. Partitioning structure learning is the core of SLR: it selects the split variables and levels and determines the number of segments. SLR has been studied in statistics and econometrics [2, 3, 4, 5], but the existing methods tend to assume that the split variable is known and univariate, with segments estimated by a costly simultaneous optimization.

We propose a tree-based approach for SLR called segmented linear regression trees (SLRT), which does not require pre-specified information about the split variables. SLRT is completely data-driven and facilitates more efficient computation via recursive partitioning, which is fundamentally based on a split selection algorithm and a tree pruning algorithm.

Split Selection Algorithm. At each internal node of the tree, the optimal split variable and level pair is selected to partition the feature space into two halves. Let $\hat{e}$ be the fitted residuals of the ordinary least squares regression. Any non-linearity in the underlying regression function is reflected in the dependence between $\hat{e}$ and the regressors. Based on the conditional Kendall's $\tau$ rank correlation [6], we propose the following criterion function at a candidate split variable index $j$ and a split level $a$:
$$\widehat{C}(j,a) = \sum_{k=1}^{p} \big|\hat{\tau}(X_k, \hat{e} \mid X_j \le a)\big| + \big|\hat{\tau}(X_k, \hat{e} \mid X_j > a)\big|,$$
where $\hat{\tau}$ is the sample version of Kendall's $\tau$ and $X$ is a $p$-dimensional regressor vector with $X_k$ being its $k$-th component. The optimal split is selected by maximizing $\widehat{C}(j,a)$ over the candidate split variables $X_j$ and levels $a$ in the observed sample of $X_j$. Theoretical analysis shows that this leads to consistent identification and estimation of the most prevailing split variable and level that attains the maximum of $C(j,a)$.
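As a concrete illustration, the following Python sketch evaluates the criterion $\widehat{C}(j,a)$ for one candidate split by computing conditional Kendall's $\tau$ on each half of the node data. It is a minimal sketch, assuming numpy and scipy are available and that the node-level OLS residuals have already been computed; the function name compute_criterion is our own, not the authors'.

import numpy as np
from scipy.stats import kendalltau

def compute_criterion(X, resid, j, a, min_size=2):
    """Sample criterion C_hat(j, a): sum over regressors of |tau| on each half.

    X     : (n, p) array of regressors at the current node
    resid : (n,) array of OLS residuals fitted at the node
    j, a  : candidate split variable index and split level
    """
    left = X[:, j] <= a
    right = ~left
    # Require enough points on both sides for a meaningful rank correlation.
    if left.sum() < min_size or right.sum() < min_size:
        return -np.inf
    total = 0.0
    for k in range(X.shape[1]):
        tau_l, _ = kendalltau(X[left, k], resid[left])
        tau_r, _ = kendalltau(X[right, k], resid[right])
        # kendalltau returns NaN for constant inputs; treat that as zero dependence.
        total += abs(np.nan_to_num(tau_l)) + abs(np.nan_to_num(tau_r))
    return total

At a node, the selected split would then be the $(j,a)$ pair maximizing this value over all candidates, as in Algorithm 1 below.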

Tree Pruning Algorithm. We define an adaptive cost-complexity measure that combines the accuracy of the linear regression fit at each node with a penalty for a large tree size. The optimally pruned tree is selected from a nested sequence of pruned subtrees by minimizing the cost-complexity measure. Theoretical analysis shows that the pruning method leads to consistent estimation of the underlying number of segments, which promotes a parsimonious partitioning structure.
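The paper's adaptive cost-complexity measure is not spelled out in this excerpt; as a rough, hedged stand-in, the sketch below scores a subtree in the familiar CART style, replacing the per-leaf constant fit with the sum of squared residuals of a per-leaf linear fit plus a penalty proportional to the number of leaves. The helper names and the penalty form are our assumptions, not the authors' definition.

import numpy as np

def leaf_sse(X_leaf, y_leaf):
    """Sum of squared residuals of an OLS linear fit within one leaf."""
    design = np.column_stack([np.ones(len(y_leaf)), X_leaf])  # add intercept
    coef, _, _, _ = np.linalg.lstsq(design, y_leaf, rcond=None)
    resid = y_leaf - design @ coef
    return float(resid @ resid)

def cost_complexity(leaves, alpha):
    """CART-style stand-in: total within-leaf SSE plus alpha times the number of leaves.

    leaves : list of (X_leaf, y_leaf) pairs for the terminal nodes of a subtree
    alpha  : complexity penalty per leaf (tuning parameter)
    """
    leaves = list(leaves)
    sse = sum(leaf_sse(Xl, yl) for Xl, yl in leaves)
    return sse + alpha * len(leaves)

Minimizing such a measure over the nested sequence of pruned subtrees yields the selected tree for a given penalty level.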

Leaf Modeling and Ensemble Methods. For the predictors within segments, we employ the LASSO procedure [7] to select the most influential variables and estimate the linear parameters.
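For instance, the per-leaf estimation step could look like the following sketch, which fits an L1-penalized linear model to the observations routed to one terminal node. scikit-learn's LassoCV is used purely for illustration, and cross-validated penalty selection is our assumption rather than a detail given in this excerpt.

from sklearn.linear_model import LassoCV

def fit_leaf_model(X_leaf, y_leaf):
    """Fit a sparse linear predictor on the data falling into one terminal node.

    The fitted model's coef_ attribute reveals which regressors were selected.
    """
    return LassoCV(cv=5).fit(X_leaf, y_leaf)

# Prediction for a new point x: route x down the tree to its leaf, then call
# the corresponding leaf model's .predict(x.reshape(1, -1)).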

Furthermore, by implanting SLRT as the base predictor in the random forests (RF) formulation, we obtain an ensemble predictor that improves model stability and predictive accuracy. A weighted version of the RF (WRF) is also proposed, which improves on the RF by reducing the weight given to under-performing trees.
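The precise weighting scheme of the WRF is not detailed in this excerpt; as a hedged sketch of the general idea, the snippet below averages tree predictions with weights that shrink as a tree's held-out error grows. The inverse-error weights and the availability of per-tree validation errors are our assumptions for illustration only.

import numpy as np

def weighted_forest_predict(trees, tree_errors, X_new):
    """Weighted ensemble prediction: trees with larger validation error get less say.

    trees       : list of fitted base predictors, each with a .predict(X) method
    tree_errors : list of held-out mean squared errors, one per tree (assumed available)
    X_new       : (m, p) array of query points
    """
    errors = np.asarray(tree_errors, dtype=float)
    weights = 1.0 / (errors + 1e-12)           # inverse-error weighting (illustrative choice)
    weights = weights / weights.sum()          # normalize to sum to one
    preds = np.stack([t.predict(X_new) for t in trees], axis=0)  # (n_trees, m)
    return weights @ preds                     # weighted average per query point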

As a novel tree-based learning method for segmented linear models, SLRT possesses attractive theoretical properties of consistent identification and estimation of the partitioning structures, which are confirmed favorably in numerical simulations. Applied to nine benchmark datasets, SLRT had advantageous predictive performance over several state-of-the-art tree-based methods, with further improvement offered by the RF and WRF with SLRT as the base predictor. The source code of the algorithm is available in the supplementary material.

1.1 Related Work

The proposed segmented linear regression tree is a tree-based approach to segmented linear regression (SLR) models, where the partitions of the feature space are axis-aligned. The existing methods for SLR tend to assume a known split variable, so that the partitioning structure learning is reduced to change-point detection with respect to a given variable. For instance, [2, 3] considered the case where both the univariate partitioning variable and the number of segments are pre-specified. [4, 5] estimated the number of change-points by minimizing the Bayesian information criterion (BIC). [8] selected the change-points via the sum of squared residuals in conjunction with a permutation test, which also assumed a known split variable. Our approach does not require pre-specified information about the segments, and learns the partitioning structure via a tree induction process.

SLR also belongs to the class of region-specific linear models. [1] proposed a partition-wise linear model, where axis-aligned partitions are pre-specified and a 0-1 activeness function is assigned to each region. With each region-specific linear model estimated first, the activeness functions are optimized through a global convex loss function. [9] proposed local supervised learning through space partitioning for classification, which allows arbitrary partitions and considers linear classifiers, while [10] employed a Bayesian updating process to partition the feature space into rectangular bounding boxes and assigned a constant estimate over each partition, as in CART.

Our approach is closely related to the regression tree (the regression part of CART, [11]), a well-known region-specific approach that uses a constant-valued predictor within each terminal node. There have been tree-based algorithms that assign linear predictors to terminal nodes, but they tend to be heuristic and lack theoretical analysis. One group of methods [12, 13, 14] adopted splitting algorithms similar to that of CART, which tend to ignore the correspondence between the evaluation criteria for splits and the models in the terminal nodes. Another group [15, 16, 17, 18, 19] employed heuristic criteria designed to make the subsets more compliant with linear models in one step, without considering the properties of the estimated boundaries. Our split selection algorithm is closely related to GUIDE [17, 18], as both utilize the estimated residuals $\hat{e}$ at the node level. However, GUIDE used the signs of $\hat{e}$, which is less informative than using $\hat{e}$ via the Kendall's $\tau$. Another difference is that GUIDE considered the marginal association between the signs of $\hat{e}$ and the regressors instead of conditioning on a split variable and level, which can lead to mis-identification of the split variable.

2 Segmented Linear Regression Models

This section presents the framework of SLRT, and provides the motivation and the theoretical properties for the computational algorithms in Section 3.

2.1 Framework

Consider the relationship between a univariate response $Y$ and a multivariate explanatory covariate $X = (X_1, \dots, X_p)^T$. Assume that the mean regression function $m(X) = E(Y \mid X)$ is partition-wise linear over $L_0$ unknown partitions $\{D_l\}_{l=1}^{L_0}$ in the domain of $X$, so that
$$Y = \sum_{l=1}^{L_0} (\alpha_l + X'\beta_l)\,\mathbb{1}(X \in D_l) + \varepsilon, \qquad (1)$$
where $(\alpha_l, \beta_l)$ are the regression coefficients over domain $D_l$ and $\varepsilon$ is the random error satisfying $E(\varepsilon \mid X) = 0$. In this paper, we consider the case of axis-aligned partitions $D_l$, which are determined by a collection of split levels $\{X_{j_q} = a_q\}_{q=1}^{Q_0}$. The model may be extended to more general shapes of $D_l$ via pre-transformations, which is a topic for future study.

The determination of $D_l$ is equivalent to selecting the split variable and level pairs, namely $S = \{(j_q, a_q)\}_{q=1}^{Q_0}$, where $Q_0$ is determined by $L_0$ and the geometry of the $D_l$. The task of partitioning structure learning is to identify the underlying split variables and estimate the split levels consistently. We adopt the computationally efficient regression tree approach by applying the split variable and level selection algorithm recursively, ending with terminal nodes for the desired partitions.
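To make Model (1) concrete, the following sketch simulates data from a segmented linear model with two axis-aligned partitions split at a level on the first covariate. The dimensions, coefficients, and split level are arbitrary choices for illustration, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
X = rng.uniform(-2.0, 2.0, size=(n, p))

# Two axis-aligned segments: D_1 = {X_1 <= 0}, D_2 = {X_1 > 0} (illustrative choice).
split_level = 0.0
in_d1 = X[:, 0] <= split_level

alpha = np.array([1.0, -1.0])                 # intercepts (alpha_1, alpha_2)
beta = np.array([[2.0, 0.0, 1.0],             # slopes over D_1
                 [-1.0, 3.0, 0.0]])           # slopes over D_2
eps = rng.normal(scale=0.5, size=n)

# Y = sum_l (alpha_l + X' beta_l) 1(X in D_l) + eps, as in Model (1).
Y = np.where(in_d1, alpha[0] + X @ beta[0], alpha[1] + X @ beta[1]) + eps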

2.2 Statistical Analysis of Criterion Functions

To select the optimal split variable and level at a node, we fit the least squares regression over the node and select the optimal split by studying the rank correlation between the estimated residuals and the regressors given a candidate split variable and split level. This is computationally more efficient than the commonly used cost minimization procedure [11, 17, 20], which would require a repeated least squares fit for each candidate split variable and level.

For ease of presentation, we consider the one-time split selection over the root node $t_0$, which contains data $\mathcal{D}_{t_0}$ of $n$ independent observations $\{X^{(i)}, Y^{(i)}\}_{i=1}^n$ generated from Model (1). We are to partition $\mathcal{D}_{t_0}$ into two subsets so as to make the data on each subset more compliant with a linear model. To attain this, let $\hat{Y} = \hat{\alpha}_{t_0} + X'\hat{\beta}_{t_0}$ be the fitted ordinary least squares (OLS) regression over $\mathcal{D}_{t_0}$, and $\hat{e} = Y - \hat{Y}$ be the estimated residuals. If the underlying regression function $m(X)$ is nonlinear, the non-linearity will be reflected in the residuals $\hat{e}$ and their dependence with the potential split variables. Indeed, if $m(X)$ is piecewise linear, the estimated residual $\hat{e}$ is also piecewise linear in $X$, since $\hat{e} = \sum_{l=1}^{L} \{(\alpha_l - \hat{\alpha}_{t_0}) + X'(\beta_l - \hat{\beta}_{t_0})\}\,\mathbb{1}(X \in D_l) + \varepsilon$. A regressor $X_k$ and $\hat{e}$ tend to be concordant (discordant) for a positive (negative) coefficient within each partition. To capture the dependence, we employ the Kendall's $\tau$ coefficient [6] to define the following criterion function:
$$\widehat{C}(j,a) = \sum_{k=1}^{p} \big|\hat{\tau}(X_k, \hat{e} \mid X_j \le a)\big| + \big|\hat{\tau}(X_k, \hat{e} \mid X_j > a)\big|, \qquad (2)$$
for $1 \le j \le p$ and $a \in \{X_j^{(i)}\}_{i=1}^n$, where the conditional sample Kendall's $\tau$ on the left half is
$$\hat{\tau}(X_k, \hat{e} \mid X_j \le a) = \binom{N_{t_L}(j,a)}{2}^{-1} \sum_{i < i';\, i, i' \in I_{t_L}(j,a)} \operatorname{sign}\big(X_k^{(i)} - X_k^{(i')}\big)\operatorname{sign}\big(\hat{e}^{(i)} - \hat{e}^{(i')}\big), \qquad (3)$$
with $I_{t_L}(j,a) = \{i : X_j^{(i)} \le a\}$ and $N_{t_L}(j,a) = |I_{t_L}(j,a)|$; $I_{t_R}(j,a)$ and $N_{t_R}(j,a)$ are defined analogously. Based on $\widehat{C}(j,a)$, we propose the split selection Algorithm 1 in Section 3.1, which is essentially motivated by the following Lemma 2.1.

Lemma 2.1 Suppose the regressors are uncorrelated conditional on each partition of $\{D_l\}_{l=1}^{L_0}$. Assume the following technical conditions: (1) $E(\varepsilon \mid X) = \operatorname{median}(\varepsilon \mid X) = \operatorname{median}(X_j - E(X_j) \mid X_k) = 0$ for $k \ne j$; (2) for $(j_k, a_k) \in S$, $0 < P\big(X^{(s)}_{j_k} \le a_k \mid X^{(s)}_{j'} \le a'\big) < 1$ for any other candidate split $(j', a')$. Then the maximum of $C(j,a)$, the probability limit of $\widehat{C}(j,a)$, is attained within the genuine set of split variable and level pairs $S$.

Theorem 2.1 Let $(\hat{j}, \hat{a}) = \operatorname{argmax}_{(j,a)} \widehat{C}(j,a)$. Then the distance from $(\hat{j}, \hat{a})$ to the set of maximizers of $C(j,a)$ converges to $0$ as $n \to \infty$ under the assumptions of Lemma 2.1. In particular, when $C(j,a)$ has a unique maximum $(j^{*}, a^{*})$, we have $(j^{*}, a^{*}) \in S$ and $(\hat{j}, \hat{a}) \xrightarrow{P} (j^{*}, a^{*})$ as $n \to \infty$.

When the regressors are not conditionally uncorrelated as required by Lemma 2.1, we conduct a linear transformation when calculating the conditional Kendall's $\tau$ coefficients with $\hat{e}$. Specifically, let $\mathbb{X} = (X^{(1)}, \dots, X^{(n)})'$ be the $n \times p$ data matrix of the observations of $X$. Given a split variable and level $X_{\tilde{j}} = \tilde{a}$, let $\mathbb{X}_L$ and $\mathbb{X}_R$ be the matrices obtained from $\mathbb{X}$ by zeroing out, via the corresponding diagonal indicator matrices $\operatorname{diag}\{\mathbb{1}(X_{\tilde{j}}^{(1)} \le \tilde{a}), \dots, \mathbb{1}(X_{\tilde{j}}^{(n)} \le \tilde{a})\}$ and $\operatorname{diag}\{\mathbb{1}(X_{\tilde{j}}^{(1)} > \tilde{a}), \dots, \mathbb{1}(X_{\tilde{j}}^{(n)} > \tilde{a})\}$, the observations falling on the other side of the split. Then there exists a non-singular matrix $P_{(\tilde{j},\tilde{a})}$ such that $Z'Z$ is diagonal for each of $Z = \mathbb{X}P_{(\tilde{j},\tilde{a})}$, $\mathbb{X}_L P_{(\tilde{j},\tilde{a})}$ and $\mathbb{X}_R P_{(\tilde{j},\tilde{a})}$, which is facilitated by the simultaneous diagonalization of positive definite matrices (see the supplementary material for the detailed calculation of $P_{(\tilde{j},\tilde{a})}$, which is based on the spectral decomposition). Let $Z^{(\tilde{j},\tilde{a})} = \mathbb{X}P_{(\tilde{j},\tilde{a})}$ be the transformed regressors, with $Z^{(\tilde{j},\tilde{a})}_k$ being the $k$-th component. Define the modified criterion function with index $(\tilde{j},\tilde{a})$:
$$\widehat{C}^{(\tilde{j},\tilde{a})}(j,a) = \sum_{k=1}^{p} \Big|\hat{\tau}\big(Z^{(\tilde{j},\tilde{a})}_k, \hat{e} \mid X_j \le a\big)\Big| + \Big|\hat{\tau}\big(Z^{(\tilde{j},\tilde{a})}_k, \hat{e} \mid X_j > a\big)\Big|, \qquad (4)$$
where $Z^{(\tilde{j},\tilde{a})}$ replaces $X$ in (2) and $\hat{e}$ is still the vector of residuals from the OLS regression at node $t_0$ without transformation.

The following Lemma 2.2 shows that $\widehat{C}^{(\tilde{j},\tilde{a})}$ possesses properties similar to those of $\widehat{C}$ introduced in Lemma 2.1, while Lemma 2.2 does not require $X$ to be conditionally uncorrelated. This motivates Algorithm 2 in Section 3.1 and leads to the convergence result in Theorem 2.2.

Lemma 2.2 Let $C^{(\tilde{j},\tilde{a})}(j,a)$ be the probability limit of $\widehat{C}^{(\tilde{j},\tilde{a})}(j,a)$. Then, under the same technical conditions (1) and (2) as required in Lemma 2.1, $\operatorname{argmax}_{(j,a)} C^{(\tilde{j},\tilde{a})}(j,a) = (\tilde{j},\tilde{a})$ whenever $(\tilde{j},\tilde{a}) \in S$, with $S = \{(j_q, a_q)\}_{q=1}^{Q_0}$ being the genuine set of split variable and level pairs.

Theorem 2.2 Let $(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})} = \operatorname{argmax}_{(j,a)} \widehat{C}^{(\tilde{j},\tilde{a})}(j,a)$. Under the technical conditions of Lemma 2.2, if $(\tilde{j},\tilde{a}) \in S$, then $(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})} \xrightarrow{P} (\tilde{j},\tilde{a})$ as $n \to \infty$.

Theorem 2.2 implies that the convergence of $(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})}$ to $(\tilde{j},\tilde{a})$ is a necessary condition for $(\tilde{j},\tilde{a}) \in S$. This motivates the distance minimization procedure in Line 7 of Algorithm 2.

3 Partitioning Structure Learning

We first use the recursive partitioning procedure to generate the initial partitions. Then, we employ the cost-complexity tree pruning procedure to obtain a parsimonious partitioning structure.

3.1 Initial Partitions by Recursive Partitioning

The recursive partitioning needs a split selection algorithm at the node level and a stopping rule for terminating the partitioning process; the latter is based on two tuning parameters: $N_{\min}$, which controls the sample size in any leaf node, and $\mathrm{Dep}_{\max}$, which limits the depth of the tree.

The split selection is the core of the recursive partitioning. In the following Algorithm 1 of recursive partitioning, the split is selected by maximizing $\widehat{C}(j,a)$. This is directly motivated by Lemma 2.1, which shows that the maximum of $C(j,a)$ is attained within the genuine set of split variable and level pairs $S$. Besides, according to Theorem 2.1, the selected split level $X_{\hat{j}} = \hat{a}$ determined by the maximum of the criterion function $\widehat{C}(j,a)$ is a consistent estimate of one of the underlying genuine partitioning boundaries, provided the regressors are uncorrelated conditional on each partition.

Algorithm 1 Recursive Partitioning for Conditionally Uncorrelated Regressors
Input: Training data $\mathcal{D}_{t_0} = \{(X^{(i)}, Y^{(i)})\}_{i=1}^n$.
Output: Data partitions $\mathcal{D} = \{D_i\}_{i=1}^{L}$.
1: Initialize: no pre-specified partitions, $\mathcal{D} = \emptyset$; the depth of the root node $\mathrm{Dep}(t_0) = 0$.
2: if $|\mathcal{D}_{t_0}| < 2N_{\min}$ or $\mathrm{Dep}(t_0) \ge \mathrm{Dep}_{\max}$ then
3: return $\mathcal{D} = \mathcal{D} \cup \{\mathcal{D}_{t_0}\}$
4: else
5: Fit a least squares linear regression of $Y$ on $X$ over $\mathcal{D}_{t_0}$ and get the estimated residuals $\hat{e}$.
6: Calculate the criterion function $\widehat{C}(j,a)$ for every $(j,a)$ in the set of candidate split pairs $\mathcal{C}_{t_0} = \{(j,a) \mid j = 1, \dots, p,\ a \in \{X_j^{(i)} \mid (X^{(i)}, Y^{(i)}) \in \mathcal{D}_{t_0}\},\ N_{\min} \le |\{i : X_j^{(i)} \le a\}|\ \text{and}\ N_{\min} \le |\{i : X_j^{(i)} > a\}|\}$.
7: Select the optimal split $(\hat{j}, \hat{a}) = \operatorname{argmax}_{(j,a) \in \mathcal{C}_{t_0}} \widehat{C}(j,a)$.
8: Partition $\mathcal{D}_{t_0}$ into $t_L = \{X_{\hat{j}} \le \hat{a}\} \cap \mathcal{D}_{t_0}$ and $t_R = \{X_{\hat{j}} > \hat{a}\} \cap \mathcal{D}_{t_0}$.
9: $t_0 \leftarrow t_L$ and execute Steps 2-11.
10: $t_0 \leftarrow t_R$ and execute Steps 2-11.
11: end if
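Translating Algorithm 1 into code, a minimal recursive driver might look like the sketch below. It reuses the compute_criterion helper sketched in the Introduction, and the names recursive_partition, candidate_splits, n_min, and dep_max are our own choices rather than part of the paper.

import numpy as np

def candidate_splits(X, n_min):
    """Candidate (j, a) pairs leaving at least n_min points on each side (Line 6)."""
    pairs = []
    for j in range(X.shape[1]):
        for a in np.unique(X[:, j]):
            left = X[:, j] <= a
            if left.sum() >= n_min and (~left).sum() >= n_min:
                pairs.append((j, a))
    return pairs

def recursive_partition(X, y, n_min=30, dep_max=5, depth=0):
    """Return a list of (X_subset, y_subset) terminal partitions (Algorithm 1, Lines 2-11)."""
    if len(y) < 2 * n_min or depth >= dep_max:
        return [(X, y)]
    # Line 5: OLS fit at the node to obtain residuals.
    design = np.column_stack([np.ones(len(y)), X])
    coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    # Lines 6-7: maximize the Kendall's-tau criterion over candidate splits.
    pairs = candidate_splits(X, n_min)
    if not pairs:
        return [(X, y)]
    j_best, a_best = max(pairs, key=lambda ja: compute_criterion(X, resid, *ja))
    # Line 8: split, then recurse on both children (Lines 9-10).
    left = X[:, j_best] <= a_best
    return (recursive_partition(X[left], y[left], n_min, dep_max, depth + 1)
            + recursive_partition(X[~left], y[~left], n_min, dep_max, depth + 1))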

When taking the correlation between regressors into consideration, we apply Algorithm 2 to select the optimal split over the original untransformed variables, which retains easy interpretability of the partitions but requires a higher computational cost, as analyzed below. Since Algorithm 1 has outlined the recursive partitioning process, Algorithm 2 concentrates on the split selection at the node level, which corresponds to Lines 5-7 of Algorithm 1. Enlightened by Theorem 2.2, we select the optimal split by minimizing the distance between $(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})}$ and $(\tilde{j},\tilde{a})$ in Algorithm 2, where the standardized distance $d(\hat{a}^{(\tilde{j},\tilde{a})}, \tilde{a}) = |\hat{a}^{(\tilde{j},\tilde{a})} - \tilde{a}| / \hat{\sigma}(X_{\tilde{j}})$ is used for $\tilde{j} = \hat{j}^{(\tilde{j},\tilde{a})}$, with $\hat{\sigma}(X_{\tilde{j}})$ being the sample standard deviation of $X_{\tilde{j}}$.

Algorithm 2 Split Selection for Correlated Regressors
Input: Training data $\mathcal{D}_{t_0} = \{(X^{(i)}, Y^{(i)})\}_{i=1}^n$, $|\mathcal{D}_{t_0}| \ge 2N_{\min}$.
Output: The optimal split variable and level pair $(\hat{j}, \hat{a})$; or no split, and $t_0$ is a terminal node.
1: Fit a least squares linear regression of $Y$ on $X$ over $\mathcal{D}_{t_0}$ and get the estimated residuals $\hat{e}$.
2: for each $(\tilde{j}, \tilde{a}) \in \mathcal{C}_{t_0}$ do
3: calculate the criterion function $\widehat{C}^{(\tilde{j},\tilde{a})}(j,a)$ for each $(j,a) \in \mathcal{C}_{t_0}$;
4: $(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})} = \operatorname{argmax}_{(j,a)} \widehat{C}^{(\tilde{j},\tilde{a})}(j,a)$, the local optimal split under $(\tilde{j},\tilde{a})$;
5: end for
6: if $\{(\hat{j},\hat{a})^{(\tilde{j},\tilde{a})} \mid \tilde{j} = \hat{j}^{(\tilde{j},\tilde{a})}\} \ne \emptyset$ then
7: return the optimal split $(\hat{j}, \hat{a}) = \operatorname{argmin}_{(\tilde{j},\tilde{a})} \{ d(\hat{a}^{(\tilde{j},\tilde{a})}, \tilde{a}) \mid (\tilde{j},\tilde{a}) \in \mathcal{C}_{t_0},\ \tilde{j} = \hat{j}^{(\tilde{j},\tilde{a})} \}$.
8: else
9: return no suitable split; $t_0$ is a terminal node.
10: end if

As for the computational complexity of Algorithm 2, suppose there are $M_t$ candidate splits in a node $t$; then Algorithm 2 involves $M_t^2$ evaluations of the criterion function. Since the calculation of Kendall's $\tau$ in $\widehat{C}^{(\tilde{j},\tilde{a})}(\cdot,\cdot)$ is of complexity $O(N_t \log N_t)$, with $N_t$ being the sample size of node $t$, the complexity of Algorithm 2 is $O(M_t^2 N_t \log N_t)$, which is costly compared to Algorithm 1, whose complexity is only $O(M_t N_t \log N_t)$. Therefore, we may adopt a stopping strategy that terminates the split process when the minimum of $d(\hat{a}^{(\tilde{j},\tilde{a})}, \tilde{a})$ in Line 7 of Algorithm 2 is larger than a given threshold. Applying either Algorithm 1 or Algorithm 2 recursively for split selection ...
