
Hammersley-Clifford theorem

The Hammersley-Clifford theorem is a result in probability theory, mathematical statistics and statistical mechanics that gives necessary and sufficient conditions under which a positive probability distribution can be represented as a Markov network (also known as a Markov random field). It states that a probability distribution that has a positive mass or density satisfies one of the Markov properties with respect to an undirected graph G if and only if it is a Gibbs random field, that is, its density can be factorized over the cliques (or complete subgraphs) of the graph.
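As a concrete reading of the factorization claim, the Gibbs form is usually written as follows (a standard rendering, not a quotation from the references below):

$$ P(X = x) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}(G)} \phi_C(x_C), \qquad Z \;=\; \sum_{x} \prod_{C \in \mathcal{C}(G)} \phi_C(x_C), $$

where $\mathcal{C}(G)$ is the set of cliques of $G$, each factor $\phi_C > 0$ depends only on the coordinates $x_C$ indexed by the clique $C$, and $Z$ is the normalizing constant; the positivity assumption on the distribution is what makes this factorization equivalent to the Markov properties.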

The relationship between Markov and Gibbs random fields was initiated by Roland Dobrushin[1] and Frank Spitzer[2] in the context of statistical mechanics. The theorem is named after John Hammersley and Peter Clifford, who proved the equivalence in an unpublished paper in 1971.[3][4] Simpler proofs using the inclusion-exclusion principle were given independently by Geoffrey Grimmett,[5] Preston[6] and Sherman[7] in 1973, with a further proof by Julian Besag in 1974.[8]

Notes

[1] Dobrushin, P. L. (1968), "The Description of a Random Field by Means of Conditional Probabilities and Conditions of Its Regularity", Theory of Probability and its Applications 13 (2): 197-224, doi:10.1137/1113026.

[2] Spitzer, Frank (1971), "Markov Random Fields and Gibbs Ensembles", The American Mathematical Monthly 78 (2): 142-154, doi:10.2307/2317621, JSTOR 2317621.
[3] Hammersley, J. M.; Clifford, P. (1971), Markov fields on finite graphs and lattices.
[4] Clifford, P. (1990), "Markov random fields in statistics", in Grimmett, G. R.; Welsh, D. J. A., Disorder in Physical Systems: A Volume in Honour of John M. Hammersley, Oxford University Press, pp. 19-32, ISBN 0198532156, MR 1064553, retrieved 2009-05-04.
[5] Grimmett, G. R. (1973), "A theorem about random fields", Bulletin of the London Mathematical Society 5 (1): 81-84, doi:10.1112/blms/5.1.81, MR 0329039.
[6] Preston, C. J. (1973), "Generalized Gibbs states and Markov random fields", Advances in Applied Probability 5 (2): 242-261, doi:10.2307/1426035, JSTOR 1426035, MR 0405645.
[7] Sherman, S. (1973), "Markov random fields and Gibbs random fields", Israel Journal of Mathematics 14 (1): 92-103, doi:10.1007/BF02761538, MR 0321185.
[8] Besag, J. (1974), "Spatial interaction and the statistical analysis of lattice systems", Journal of the Royal Statistical Society, Series B (Methodological) 36 (2): 192-236, JSTOR 2984812, MR 0373208.

Further reading

Bilmes, Jeff (Spring 2006), Handout 2: Hammersley-Clifford, course notes from a University of Washington course.
Grimmett, Geoffrey, Probability on Graphs, Chapter 7.
Helge, The Hammersley-Clifford Theorem and its Impact on Modern Statistics.

The first afternoon of the memorial session for Julian Besag in Bristol was an intense and at times emotional moment, where friends and colleagues of Julian shared memories and stories. This collection of tributes showed how much of a larger-than-life character he was, from his long-term and wide-ranging impact on statistics, to his very high expectations, both for himself and for others, leading to a total and uncompromising research ethic, to his passion for extreme sports and the outdoors. (The stories during and after dinner were of a more personal nature, but at least as enjoyable!) The talks on the second day showed how much and how deeply Julian had contributed to spatial statistics and agricultural experiments, to pseudo-likelihood, to Markov random fields and image analysis, and to MCMC methodology and practice. I hope I did not botch my presentation on the history of MCMC too much, while I found reading through the 1974, 1986 and 1993 Read Papers and their discussions an immensely rewarding experiment (one I wish I had done prior to completing our Statistical Science paper, but that paper was bound to be incomplete by nature!). Some interesting links made by the audience were the prior publication of proofs of the Hammersley-Clifford theorem in 1973 (by Grimmett, Preston, and Sherman, respectively), as well as the proposal of a Gibbs sampler by Brian Ripley as early as 1977 (even though Hastings did use Gibbs steps in one of his examples). Christophe Andrieu also pointed out to me a very early Monte Carlo review by John Halton in the 1970 SIAM Review, a review that I will read (and comment on) as soon as possible. Overall, I am quite glad I could take part in this memorial and I am grateful to both Peters for organising it as a fitting tribute to Julian.

Markov Chain Monte Carlo (MCMC) methods are currently a very active field of research. MCMC methods are sampling methods based on Markov chains which are ergodic with respect to the target probability measure. The principle of adaptive methods is to optimize on the fly some design parameters of the algorithm with respect to a given criterion reflecting the sampler's performance (optimize the acceptance rate, optimize an importance sampling function, etc.). A postdoctoral position is open to work on the numerical analysis of adaptive MCMC methods: convergence, numerical efficiency, development and analysis of new algorithms. A particular emphasis will be given to applications in statistics and molecular dynamics. (Detailed description.) The position is funded by the French National Research Agency (ANR) through the 2009-2012 project ANR-08-BLAN-0218. The position will benefit from an interdisciplinary environment involving numerical analysts, statisticians and probabilists, and from strong interactions between the partners of the project ANR-08-BLAN-021.
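As a concrete illustration of the adaptation principle mentioned above (tuning a design parameter on the fly against a criterion such as the acceptance rate), here is a minimal sketch of an adaptive random-walk Metropolis sampler; the Gaussian proposal, the 0.234 target acceptance rate and the diminishing step-size schedule are illustrative choices, not part of the project description:

```python
import numpy as np

def adaptive_rw_metropolis(log_target, x0, n_iter=10_000, target_acc=0.234, seed=0):
    """Random-walk Metropolis whose proposal scale is adapted on the fly so that
    the empirical acceptance rate drifts towards `target_acc`."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_p = log_target(x)
    log_scale = 0.0
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + np.exp(log_scale) * rng.standard_normal(x.size)
        log_p_prop = log_target(prop)
        accepted = np.log(rng.uniform()) < log_p_prop - log_p
        if accepted:
            x, log_p = prop, log_p_prop
        # Robbins-Monro update with a diminishing step size, so that the
        # adaptation vanishes over time and ergodicity is preserved.
        log_scale += (t + 1) ** -0.6 * (float(accepted) - target_acc)
        chain[t] = x
    return chain

# Usage sketch: sample from a standard bivariate Gaussian target.
samples = adaptive_rw_metropolis(lambda z: -0.5 * float(np.dot(z, z)), x0=[3.0, -3.0])
```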

In the most recent issue of Statistical Science, the special topic is Celebrating the EM Algorithm's Quandunciacentennial. It contains an historical survey by Martin Tanner and Wing Wong on the emergence of MCMC Bayesian computation in the 1980s. This survey is more focused and more informative than our global history (also to appear in Statistical Science). In particular, it provides the authors' analysis as to why MCMC was delayed by ten years or so (or even more when considering that a Gibbs sampler as a simulation tool appears in both Hastings (1970) and Besag's (1974) papers). They dismiss our concerns about computing power (I was running Monte Carlo simulations on my Apple IIe by 1986, and a single mean square error curve evaluation for a James-Stein type estimator would then take close to a weekend!) and Markov innumeracy, attributing the reluctance instead to a lack of confidence in the method. This perspective remains debatable as, apart from Tony O'Hagan, who was then fighting against Monte Carlo methods as being un-Bayesian (1987, JRSS D), I do not remember any negative attitude at the time about simulation, and the immediate spread of MCMC methods from Alan Gelfand's and Adrian Smith's presentations of their 1990 paper shows, on the contrary, that the Bayesian community was ready for the move. Another interesting point made in this historical survey is that the Metropolis and other Markov chain methods were first presented outside the simulation sections of books like Hammersley and Handscomb (1964), Rubinstein (1981) and Ripley (1987), perpetuating the impression that such methods were mostly optimisation or niche-specific methods. This is also why Besag's earlier works (not mentioned in this survey) did not get wider recognition until later. Something I was not aware of is the appearance of iterative adaptive importance sampling (i.e. population Monte Carlo) in the Bayesian literature of the 1980s, with proposals from Herman van Dijk, Adrian Smith, and others. The appendix about Smith et al. (1985), the 1987 special issue of JRSS D, and the computational contents of Valencia 3 (which I sadly missed for being in the Army!) is also quite informative about the perception of computational Bayesian statistics at this time.

A missing connection in this survey is Gilles Celeux and Jean Diebolt's stochastic EM (or SEM). As early as 1981, with Michel Broniatowski, they proposed a simulated version of EM for mixtures where the latent variable z was simulated from its conditional distribution rather than replaced with its expectation. So this was the first half of the Gibbs sampler for mixtures we completed with Jean Diebolt about ten years later. (Also found in Gelman and King, 1990.) These authors did not get much recognition from the community, though, as they focused almost exclusively on mixtures, used simulation to produce a randomness that would escape the attraction of the local mode rather than targeting the posterior distribution, and did not analyse the Markovian nature of their algorithm until later, with the simulated annealing EM algorithm.
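To illustrate the mechanism just described (drawing the latent allocation z from its conditional distribution instead of replacing it by its expectation), here is a minimal sketch of stochastic EM for a two-component Gaussian mixture with unit variances; the model, starting values and update details are illustrative assumptions, not Celeux and Diebolt's original implementation:

```python
import numpy as np

def sem_two_gaussians(y, n_iter=200, seed=0):
    """Stochastic EM for the mixture p N(mu1, 1) + (1 - p) N(mu2, 1):
    the E-step simulates the allocations z instead of taking expectations."""
    rng = np.random.default_rng(seed)
    p, mu1, mu2 = 0.5, y.min(), y.max()              # crude starting values
    for _ in range(n_iter):
        # stochastic E-step: draw z_i ~ P(z_i = 1 | y_i, current parameters)
        w1 = p * np.exp(-0.5 * (y - mu1) ** 2)
        w2 = (1 - p) * np.exp(-0.5 * (y - mu2) ** 2)
        z = rng.uniform(size=y.size) < w1 / (w1 + w2)
        # M-step: complete-data maximum likelihood given the simulated z
        if z.any() and (~z).any():                   # keep both components non-empty
            p, mu1, mu2 = z.mean(), y[z].mean(), y[~z].mean()
    return p, mu1, mu2

# Usage sketch on simulated data with well-separated components.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])
print(sem_two_gaussians(data))
```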

Probabilistic graphical models are divided into directed and undirected models. The directed graphical models mainly include Bayesian networks and hidden Markov models, while the undirected graphical models mainly include Markov random field (MRF) models and conditional random field (CRF) models. In 2001, Lafferty and colleagues at Carnegie Mellon University (John Lafferty, Andrew McCallum, Fernando Pereira) proposed the CRF model for sequence data processing (Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data). This model directly models the posterior probability, which nicely resolves the MRF model's need for complicated likelihood modelling when multiple features are used and its inability to exploit contextual information in the observed image. In 2003, Kumar extended the CRF model to a 2-D lattice structure and began introducing it into image analysis, attracting considerable attention in the research community.

Given an observed image y, the task is to estimate the corresponding unknown label image x (a candidate form of the posterior decomposition referred to below is written out after this list).

1. If the posterior probability is modelled directly (i.e. considering the first term of the formula), a discriminative probabilistic framework is obtained. In particular, if the posterior is modelled directly by a Gibbs distribution, (x, y) is called a CRF and the resulting model is a discriminative CRF model.
2. By modelling (x, y) jointly (i.e. considering the second term of the formula), a joint probabilistic framework is obtained. In particular, if the pair of random fields (x, y) is Markovian, i.e. the second term is a Gibbs distribution, then (x, y) is called a pairwise MRF (PMRF) [9].
3. If the posterior is modelled through p(x) and p(y|x) as in the formula, where p(y|x) is the model generating the observed image, the framework is called a generative probabilistic framework. In particular, if the prior p(x) follows a Gibbs distribution, x is called an MRF [12] and the resulting model is a generative MRF model.

- from 【面向圖像標(biāo)記的隨機(jī)場(chǎng)模型研究】 (research on random field models for image labeling)
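A plausible form of the decomposition that the three items above refer to, consistent with the conventions used here (y the observed image, x the unknown label image), is

$$ p(x \mid y) \;=\; \frac{p(x, y)}{p(y)} \;\propto\; p(x, y) \;=\; p(x)\, p(y \mid x), $$

with the posterior p(x|y) modelled directly in the discriminative route, the joint p(x, y) modelled in the second route, and the factorization p(x) p(y|x) used in the generative route.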

Applying the Hammersley-Clifford theorem, the posterior probability of the label field follows a Gibbs distribution, in which Z(y, θ) is the normalizing function and ψ_c is the potential function, with parameters θ, defined on the clique c.
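A plausible rendering of this Gibbs form, following the usual CRF convention (the symbols match the definitions just given; treating the potentials as energies with a negative exponent is an assumption):

$$ p(x \mid y, \theta) \;=\; \frac{1}{Z(y, \theta)} \exp\Big( -\sum_{c \in C} \psi_c(x_c, y; \theta) \Big), \qquad Z(y, \theta) \;=\; \sum_{x} \exp\Big( -\sum_{c \in C} \psi_c(x_c, y; \theta) \Big). $$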

A key problem in CRF models is the definition of suitable potential functions, so developing different forms of extended CRF models is a main research direction for current CRF work. The specific technical routes include: first, extending the potential functions, where introducing more complex potentials allows more features and more contextual information to be used; and second, extending the model structure, where more complex structures allow higher-level and more varied forms of context to be exploited.

Extended potential functions: (1) logistic regression (LR); (2) support vector machines (SVM); (3) kernel functions; (4) boosting; (5) probit.

Extended model structures: (1) Dynamic CRF models: a dynamic CRF (DCRF) carries out several labeling tasks simultaneously on the given observations, so as to fully exploit the correlations between different types of labels. (2) Hidden CRF models: another class of structural extensions inserts an intermediate layer of hidden variables h between the observed image and the label image, giving the hidden CRF (Hidden Conditional Random Field, HCRF); the hidden layer gives the CRF richer expressive power and allows certain sub-structures to be modelled, and the hidden variables may be abstract or may have a clear physical meaning. (3) Tree-structured CRF models: in the standard CRF graph structure, the correlations between labels are represented by the edges of a lattice. (4) Hybrid CRF models.

1. Markov assumptions: limited history and stationarity. Limited history means that the present depends only on a finite amount of history; stationarity means that the relation between two states does not depend on time.

2. HMM: given an observation sequence O1, O2, O3, ..., each observation Oi corresponds to a hidden state in the sequence S1, S2, ..., Sn. An HMM addresses three problems (a decoding sketch for problem (2) is given after the formulas below):
(1) computing the probability of the observation sequence: the forward algorithm suffices;
(2) given the observation sequence, computing the most probable hidden state sequence: the Viterbi algorithm, with O(N*N*T) complexity;
(3) given observation sequences and the state set, estimating the parameters A (state transition matrix) and B (emission probabilities): the EM / forward-backward algorithm.

Problem (2) is essentially a sequence labeling problem:

$$ P(O \mid S) = \prod_{i=1}^{n} p(O_i \mid S_i), \qquad P(S) = p(S_1 \mid \text{start}) \prod_{i=2}^{n} p(S_i \mid S_{i-1}), $$

$$ S^{*} = \arg\max_{S} P(O \mid S)\, P(S) = \arg\max_{S} \prod_{i} p(O_i \mid S_i)\, p(S_i \mid S_{i-1}). $$
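As an illustration of problem (2), here is a minimal Viterbi decoder in log space; the array layout (pi, A, B as NumPy arrays) is an assumption of the sketch, not tied to any particular toolkit:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden state path for an HMM.

    obs : observation indices, length T
    pi  : initial state distribution, shape (N,)
    A   : transitions, A[i, j] = p(s_t = j | s_{t-1} = i), shape (N, N)
    B   : emissions,   B[j, k] = p(o_t = k | s_t = j),     shape (N, M)
    """
    N, T = len(pi), len(obs)
    with np.errstate(divide="ignore"):
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.zeros((T, N))               # best log-score ending in each state
    back = np.zeros((T, N), dtype=int)     # argmax pointers for backtracking
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):                  # the O(N*N*T) dynamic programme
        scores = delta[t - 1][:, None] + log_A
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```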

ME (maximum entropy): a classifier that assigns a class to a given observation O. ME requires extracting the relevant features from O and computing the corresponding weights w. Note that it mainly solves the classification of an observation O, as in text classification, i.e. P(C = c | O).

MEMM: a sequence labeling model that combines ME and HMM. It allows more features than an HMM and takes into account the influence of the observations and states around time t:

$$ S^{*} = \arg\max_{S} P(S \mid O) = \arg\max_{S} \prod_{i} p(S_i \mid O, S_{i-1}), $$

where O can be O_i, O_{i-1}, or other observations.

Maximum Entropy (ME) model

Start from a simple example: Washington and Virginia can each be either a person name or a place name, and from the corpus we only know that p(person name) = 0.6, so what is a good value for p(Washington = person name)? The intuitive answer is p(Washington = person name) = 0.3. Why? Because we satisfy the evidence we have while making no other assumptions, spreading the probability mass 0.6 evenly over the two candidates; making no extra assumptions is exactly what maximizes the entropy, and this is the principle of the maximum entropy model.

Now the definition of the model. First, the goal: given a context x, estimate p(y|x). From the training data we obtain a sequence of labeled samples (x_i, y_i), where x_i is a context and y_i ∈ Y is a class. We then construct feature functions such as

$$ f(x, y) = \begin{cases} 1 & \text{if } x, y \text{ satisfy some condition, e.g. } x = \text{記者*},\ y = \text{person name} \\ 0 & \text{otherwise} \end{cases} $$

Note that x is a context, a vector, not a single word. (The notion of a feature in maximum entropy models differs from the one in pattern recognition: here a feature is a feature function, usually binary-valued, although some work defines real-valued features directly, e.g. jeon-sigir06 defines f as a KDE distance; the benefit of that definition is not obvious to me.)

The model constraints are that, for every feature function, the model expectation equals the empirical expectation, i.e. $E_p(f) = E_{\tilde p}(f)$, where

$$ E_p(f) = \sum_{x,y} p(x, y) f(x, y) = \sum_{x,y} p(x)\, p(y \mid x) f(x, y) \approx \sum_{x,y} \tilde p(x)\, p(y \mid x) f(x, y), \qquad E_{\tilde p}(f) = \sum_{x,y} \tilde p(x, y) f(x, y), $$

and, in addition, for every x, $\sum_y p(y \mid x) = 1$. The entropy of the model is

$$ H(p) = -\sum_{x,y} \tilde p(x)\, p(y \mid x) \log p(y \mid x). $$

Maximizing the entropy subject to the constraints, the problem becomes

$$ p^{*} = \arg\max_{p \in \mathcal{P}} \; -\sum_{x,y} \tilde p(x)\, p(y \mid x) \log p(y \mid x), \qquad \mathcal{P} = \Big\{ p(y \mid x) \;\Big|\; \forall f_i: \sum_{x,y} \tilde p(x)\, p(y \mid x) f_i(x, y) = \sum_{x,y} \tilde p(x, y) f_i(x, y); \ \forall x: \sum_y p(y \mid x) = 1 \Big\}. $$

It can be shown that the optimal solution has the form

$$ p(y \mid x) = \frac{\exp\big( \sum_i \lambda_i f_i(x, y) \big)}{Z_x}, \qquad Z_x = \sum_y \exp\Big( \sum_i \lambda_i f_i(x, y) \Big). $$

For the detailed proof, see qxred's write-up (hat tip to qxred).
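To make the solution form concrete, here is a tiny evaluation of p(y|x) from binary feature functions and weights; the label set, the feature and the weight are made-up values for illustration only:

```python
import math

def maxent_p(y, x, features, weights, labels=("person", "place")):
    """p(y|x) = exp(sum_i lambda_i f_i(x, y)) / Z_x over a small label set."""
    def score(label):
        return math.exp(sum(w * f(x, label) for f, w in zip(features, weights)))
    z_x = sum(score(label) for label in labels)   # Z_x normalizes over y
    return score(y) / z_x

# Hypothetical binary feature: fires when the context contains "記者" and y == "person".
f0 = lambda x, y: 1.0 if ("記者" in x and y == "person") else 0.0
print(maxent_p("person", ["記者", "華盛頓"], [f0], [1.2]))   # > 0.5, since the feature fires
```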

Hidden Markov Model (HMM)

A Markov model is in effect a finite state machine with transition probabilities between pairs of states. In a hidden Markov model the states are not visible: we only see the output sequence, that is, each state transition emits an observation, and once the observation sequence is observed we want to find the best state sequence. Let O denote the observations and x the hidden variables; the model looks for the hidden states that maximize P(O), where

$$ P(O) = \sum_{x} P(O \mid X)\, P(X). $$

For the state sequence,

$$ P(X) = p(x_1)\, p(x_{2:n} \mid x_1) = p(x_1)\, p(x_2 \mid x_1)\, p(x_{3:n} \mid x_1, x_2) = \cdots $$

and, under the assumption that x_i depends only on x_{i-1},

$$ P(X) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_2) \cdots $$

Similarly,

$$ P(O \mid X) = p(o_1 \mid x_{1:n})\, p(o_{2:n} \mid o_1, x_{1:n}) = p(o_1 \mid x_{1:n})\, p(o_2 \mid o_1, x_{1:n})\, p(o_{3:n} \mid o_{1:2}, x_{1:n}) = \cdots $$

and, under the assumption that o_i depends only on x_i,

$$ P(O \mid X) = p(o_1 \mid x_1)\, p(o_2 \mid x_2) \cdots $$

Putting the two together,

$$ P(O) = \sum_{x} p(x_1)\, p(o_1 \mid x_1)\, p(x_2 \mid x_1)\, p(o_2 \mid x_2)\, p(x_3 \mid x_2) \cdots $$

Define the forward variable α_i(t) as the total probability of the observations up to time t with state S_i at time t, computed by the recursion

$$ \alpha_j(t) = \Big( \sum_{i=1}^{N} \alpha_i(t-1)\, p(x_t = j \mid x_{t-1} = i) \Big)\, p(o_t \mid x_t = j), $$

and the backward variable β_i(t) as the probability of the remaining part of the observation sequence given state S_i at time t, computed by

$$ \beta_i(t) = \sum_{j=1}^{N} p(x_{t+1} = j \mid x_t = i)\, p(o_{t+1} \mid x_{t+1} = j)\, \beta_j(t+1). $$

The probability of the observation sequence with state i at time t is then

$$ P(O, X_t = i) = \alpha_i(t)\, \beta_i(t). $$

The best state sequence can be obtained by dynamic programming, and the model parameters can be estimated by the EM algorithm; for the details of EM, see again qxred's write-up.
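A minimal numeric version of the α/β recursions above, reusing the pi, A, B array conventions from the Viterbi sketch (an illustrative layout, not a fixed API):

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Forward and backward variables of an HMM; returns (alpha, beta)."""
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                        # alpha_j(1) = pi_j p(o_1 | x_1 = j)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]    # sum_i alpha_i(t-1) a_ij, times b_j(o_t)
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(o_{t+1}) beta_j(t+1)
    # P(O, X_t = i) = alpha[t, i] * beta[t, i]; summing over i gives P(O) for any t.
    return alpha, beta
```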

Maximum Entropy Markov Model (MEMM)

The drawbacks of an HMM are that it decides the state sequence from the observation sequence by solving a conditional problem with a joint model, and that it is practically impossible to enumerate all possible observation sequences. The MEMM addresses these problems. First, the MEMM is fundamentally different from an MM or an HMM: the MEMM estimates P(S|O), whereas an MM estimates P(S) and an HMM estimates P(O). By the chain rule,

$$ P(S \mid O) = P(s_1 \mid O)\, P(s_{2:n} \mid s_1, O) = P(s_1 \mid O)\, P(s_2 \mid s_1, O)\, P(s_{3:n} \mid s_1, s_2, O) = \cdots $$

and under the model's assumptions this becomes

$$ P(S \mid O) = P(s_1 \mid o_1)\, P(s_2 \mid s_1, o_2)\, P(s_3 \mid s_2, o_3) \cdots $$

The feature functions are redefined as pairs a = (b, r), where b is an indicator function on the current observation and r is a state value:

$$ f_a(o_t, s_t) = 1 \ \text{if } b(o_t) \text{ is true and } s_t = r, \ \text{and } 0 \text{ otherwise.} $$

The constraints become

$$ E_a = \frac{1}{m_{s'}} \sum_{k=1}^{m_{s'}} \sum_{s \in S} P(s \mid s', o_k)\, f_a(o_k, s) = \frac{1}{m_{s'}} \sum_{k=1}^{m_{s'}} f_a(o_k, s_k) = F_a. $$

This objective is essentially the same as the ME objective, so the solution has the form

$$ P(s \mid s', o) = \frac{\exp\big( \sum_a \lambda_a f_a(o, s) \big)}{Z(o, s')}. $$

The forward and backward variables from the HMM are then still used to search for the best sequence, and in practice the resulting sequence is obtained by computing

$$ P(s \mid o) = P(s_0)\, P(s_1 \mid s_0, o_0)\, P(s_2 \mid s_1, o_1) \cdots $$

Conditional Random Fields (CRF): the MEMM is in fact using ...
