版權說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權,請進行舉報或認領
文檔簡介
1、Poisson-Randomized Gamma Dynamical Systems Aaron Schein Data Science Institute Columbia University Scott W. Linderman Department of Statistics Stanford University Mingyuan Zhou McCombs School of Business University of Texas at Austin David M. Blei Department of Statistics Columbia University Hanna W
2、allach Microsoft Research New York, NY Abstract This paper presents the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that encodes a strong inductive bias toward sparsity and burstiness. The PRGDS is based on a new motif in Bayesian latent variabl
3、e modeling, an alternating chain of discrete Poisson and continuous gamma latent states that is analytically convenient and computationally tractable. This motif yields closed-form complete conditionals for all variables by way of the Bessel distribution and a novel discrete distribution that we cal
4、l the shifted confl uent hypergeometric distribution. We draw connections to closely related models and compare the PRGDS to these models in studies of real-world count data sets of text, international events, and neural spike trains. We fi nd that a sparse variant of the PRGDS, which allows the con
5、tinuous gamma latent states to take values of exactly zero, often obtains better predictive performance than other models and is uniquely capable of inferring latent structures that are highly localized in time. 1Introduction Political scientists routinely analyze event counts of the number of times
6、 countryitook actiona toward countryjduring time stept1. Such data can be represented as a sequence of count tensors Y (1),.,Y(T) each of which contains theVVAevent counts for that time step for every combina- tion ofVsender countries,Vreceivers, andAaction types. International event data sets exhib
7、it “com- plex dependence structures” 2 like coalitions of countries and bursty temporal dynamics. These de- pendence structures violate the independence assumptions of the regression-based methods that politi- cal scientists have traditionally used to test theories of international relations 35. Pol
8、itical scientists have therefore advocated for using latent variable models to infer unobserved structures as a way of controlling for them 6. This approach motivates interpretable yet expressive models that are capable of capturing a variety of complex dependence structures. Recent work has applied
9、 tensor factorization methods to international event data sets 711 to infer coalition structures among countries and topic structures among actions; however, these methods assume that the sequentially observed count tensors are exchangeable, thereby failing to capture the bursty temporal dynamics in
10、herent to such data sets. Sequentiallyobservedcounttensorspresentuniquestatisticalchallengesbecausetheytendtobebursty 12, high-dimensional, and sparse 13,14. There are few models that are tailored to the challenging properties of both time series and count tensors. In recent years, Poisson factoriza
11、tion has emerged as a framework for modeling count matrices 1520 and tensors 13,21,9. Although factorization methods generally scale with the size of the matrix or tensor, many Poisson factorization models yield inference algorithms that scale linearly with the number of non-zero entries. This prope
12、rty allows researchers to effi ciently infer latent structures from massive tensors, provided these tensors are sparse; however, this property is unique to a subset of Poisson factorization models that only posit 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Can
13、ada. (3) Y (2) Y (3) Y (1) ?(1)(2) ? discrete continuous intractable tractable (a) Poissongamma dynamical systems 22 (3) h(1) Y (2) Y (3) h(2) Y (1) g h(3) ?(1) (2) ? (b) Poisson-randomized gamma dynamical systems Figure 1:Left: The PGDS imposes dependencies directly between the gamma latent states,
14、 preventing closed-form complete conditionals. Right: The PRGDS (this paper) breaks these dependencies with discrete Poisson latent statesdoing so yields closed-form conditionals for all variables without data augmentation. non-negative prior distributions, which are diffi cult to chain in state-spa
15、ce models for time series. Hier- archical compositions of non-negative priorsnotably, gamma and Dirichlet distributionstypically introduce non-conjugate dependencies that require innovative approaches to posterior inference. This paper fi lls a gap in the literature between Poisson factorization mod
16、els that are tractablei.e., yielding closed-form complete conditionals that make inference algorithms easy to deriveand those that are expressivei.e., capable of capturing a variety of complex dependence structures. To do so, we introduce an alternating chain of discrete Poisson and continuous gamma
17、 latent states, a new modeling motif that is analytically convenient and computationally tractable. We rely on this motif to construct the Poisson-randomized gamma dynamical system (PRGDS), a model for sequentially observed count tensors that is tractable, expressive, and effi cient. The PRGDS is cl
18、osely related to the Poissongamma dynamical system (PGDS) 22, a recently introduced model for dynamic count ma- trices, that is based on non-conjugate chains of gamma states. These chains are intractable; thus, poste- rior inference in the PGDS relies on sophisticated data augmentation schemes that
19、are cumbersome to derive and impose unnatural restrictions on the priors over other variables. In contrast, the PRGDS in- troduces intermediate Poisson states that break the intractable dependencies between the gamma states (see Fig. 1). Although this motif is only semi-conjugate, it is tractable, y
20、ielding closed-form complete conditionals for the Poisson states by way of the little-known Bessel distribution 23 and a novel discrete distribution that we derive and call the shifted confl uent hypergeometric (SCH) distribution. We study the inductive bias of the PRGDS by comparing its smoothing a
21、nd forecasting performance to that of the PGDS and two other baselines on a range of real-world count data sets of text, interna- tional events, and neural spike data. For smoothing, we fi nd that the PRGDS performs better than or similarly to the PGDS; for forecasting, we fi nd the converse relatio
22、nship. Both models outperform the other baselines. Using a specifi c hyperparameter setting, the PRGDS permits the continuous gamma latent states to take values of exactly zero, thereby encoding a unique inductive bias tailored to sparsity and burstiness. We fi nd that this sparse variant always obt
23、ains better smoothing and forecasting perfor- mance than the non-sparse variant. We also fi nd that this sparse variant yields a qualitatively broader range of latent structuresspecifi cally, bursty latent structures that are highly localized in time. 2Poisson-randomized gamma dynamical systems (PRG
24、DS) Notation.Consider a data set of sequentially observed count tensorsY (1),.,Y(T), each of which hasMmodes. An entryy (t) i 0,1,2,.in thetthtensor is subscripted by a multi-index i (i1,.,iM)that indexes into theMmodes of the tensor. As an example, the event count of the number of times countryitoo
25、k actionatoward countryjduring time steptcan be written as y (t) i where the multi-index corresponds to the sender, receiver, and action typei.e., i = (i,j,a). Generative process. The PRGDS is a form of canonical polyadic decomposition 24 that assumes y (t) i Pois ? (t) K X k=1 k (t) k M Y m=1 (m) k
26、im ? ,(1) 2 where (t) k represents the activation of thekthcomponent at time stept. Each component represents a dependence structure in the data set by way of a factor vector (m) k for each modem. For international events, the fi rst factor vector (1) k = ( (1) k1,., (1) kV) represents the rate at w
27、hich each of theV countries acts as a sender in thekthcomponent while the second factor vector (2) k represents the rate at which each country acts as a receiver. The weightskand(t)represent the scales of component k and time step t. The PRGDS is stationary if (t)=. We posit the following conjugate
28、priors: (t) Gam(a0,b0)and (m) k Dir(a0,.,a0).(2) The PRGDS is characterized by an alternating chain of discrete and continuous latent states. The continuous states (1) k ,., (T) k evolve via the intermediate discrete states h (1) k ,.,h (T) k as follows: (t) k Gam ?() 0 +h (t) k , ? and h (t) k Pois
29、 ? K X k2=1 kk2 (t 1) k2 ? ,(3) where we defi ne (0) k = kto be the per-component weight from Eq. (1). In other words, the PRGDS assumes that (t) k is conditionally gamma distributed with rateand shape equal toh (t) k plus hyperparameter? () 0 0. We adopt the convention that a gamma random variable
30、will be zero, almost surely, if its shape is zero. Therefore, setting? () 0 =0 defi nes a sparse variant of the PRGDS, where the gamma latent state (t) k takes the value of exactly zero providedh (t) k =0i.e., (t) k a.s. = 0ifh (t) k =0. The transition weightkk2in Eq. (3) represents how strongly com
31、ponentk2excites componentk at the next time step. We view these weights collectively as aKKtransition matrixand impose Dirichlet priors over the columns of this matrix. We also place a gamma prior over concentration parameter . This prior is conjugate to the gamma and Poisson distributions in which
32、it appears: Gam(0,0) and k Dir(a0,.,a0) such that PK k1k1k = 1.(4) Fortheper-componentweights1,.,K , weuseahierarchicalpriorwithasimilarfl avortoEq.(3): k Gam ? ?() 0 K + gk, ? and gk Pois ? K ? ,(5) where? () 0 is analogous to? () 0 . Finally, we use the following gamma priors, which are both conju
33、gate: Gam(a0,b0)and Gam(0,0).(6) The PRGDS has fi ve fi xed hyperparameters:? () 0 ,? () 0 ,0,a0, andb0. For the empirical studies in 5, we seta0=b0=0.01 to defi ne weakly informative gamma and Dirichlet priors and set0=10 to defi ne a gamma prior that promotes values close to 1; we consider ? () 0
34、0,1 and set ? () 0 =1. Properties.In Eq. (5), both? () 0 andare divided by the number of componentsK. This means that asthenumberofcomponentsgrowsK , theexpectedsumoftheweightsremainsfi niteandfi xed: X k=1 Ek = X k=1 ?() 0 K + Egk ?1 = X k=1 ?() 0 K + K ?1 = ?() 0 + ?1.(7) This prior encodes an ind
35、uctive bias toward small values ofk and may be interpreted as the fi nite truncation of a novel Bayesian nonparametric process. A small value ofkshrinks the Poisson rates of bothy (t) i and the fi rst discrete latent stateh (0) k . As a result, this prior encourages the PRGDS to only infer component
36、s that are both predictive of the data and useful for capturing the temporal dynamics. The marginal expectation of (t)=( (t) 1 ,., (t) K) takes the form of a linear dynamical system: E ?(t) |(t 1)?= E ?E?(t) |h(t 1)?= ? () 0 1+ (t 1).(8) This is becauseE ?(t) k ? = ?() 0 +E ?h(t) k ?1 = ?() 0 + PK k
37、2=1kk2 (t 1) k2 ?1 by iterated expec- tation. Concentration parameterappears in both the Poisson and gamma distributions in Eq. (3). It contributes to the variance of the PRGDS, while simultaneously canceling out of the expectation in Eq. (8), except for its role in the additive term ? () 0 1, which
38、 itself disappears when ? () 0 =0. Finally, we can analytically marginalize out all of the discrete Poisson latent states to obtain a purely continuous dynamical system. When ? () 0 0, this dynamical system can be written as follows: (t) k RG1 ? ? () 0 , K X k2=1 kk2 (t 1) k2 , ? ,(9) where RG1 deno
39、tes the randomized gamma distribution of the fi rst type 23,25. When? () 0 =0, the dynamical system can be written in terms of a limiting form of the RG1. We describe the RG1 in Fig. 2. 3 051015 0.0 0.2 0.4 0.6 0.8 1.0 1.2 P(|,?,?=1) = 0.5 051015 = 1 051015 = 4 ?=4 ?=2 ?=0.5 RG1(;,?,?) = ? r ? ? !?1
40、 e?I?1 2 p ? Figure 2: The randomized gamma distribution of the fi rst type (RG1) 23,25 has support0 and is defi ned by three parameters:?,0 . Its PDF is displayed in the fi gure;I?1() is the modifi ed Bessel function of the fi rst kind 26. When? 0: m Pois(c3), Gam ?() 0 +h,c2?, and h Pois(c1).(12)
41、This model is semi-conjugate. The gamma prior overis conjugate to the Poisson and its posterior is ?| ? Gam ?() 0 +h + m, c2+ c3?.(13) The Poisson prior overhis not conjugate to the gamma; however, despite this, the posterior ofh is still available in closed form by way of the Bessel distribution 23
42、, which we defi ne in Fig. 3(a): ?h| ? Bes? () 0 1, 2 p c2c1?.(14) The Bessel distribution can be sampled effi ciently 53; our Cython implementation is available online.1Provided that? () 0 0, samplingandhiteratively from Eqs. (13) and (14) constitutes a valid Markov chain for posterior inference. W
43、hen? () 0 =0, though, a.s. = 0ifh=0, and vice versa. As a result, this Markov chain has an absorbing condition ath=0and violates detailed balance. In this case, we must therefore sample h with marginalized out. Toward that end, we prove Theorem 1. 1 5 020406080100 h 0.00 0.05 0.10 0.15 0.20 0.25 0.3
44、0 P(h|v,a) v=-0.5, a=30 v=-0.8, a=10 v=1, a=50 v=2, a=7 v=10, a=150 v=400, a=300 Bes(h;v,a) = (a 2) 2h+v h!?(v+h+1)Iv(a) (a) Bessel distribution 23 120406080100 h 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 P(h|m,) m=1, =1 m=1, =10 m=1, =50 m=10, =1 m=1000, =1 m=45, =45 SCH(h;m,) = (h+m+1)! (h+1)!h!m! h
45、?1 1F1(m+1;2;) (b) Shifted confl uent hypergeometric (SCH) distribution Figure 3:Two discrete distributions that arise as posteriors in PoissongammaPoisson chains. Theorem 1: The incomplete conditional P(h|? () 0 =0,) , R P(h,|? () 0 =0,)d is (h|) ( Pois? c1c2 c3+c2 ? if m = 0 SCH?m, c1c2 c3+c2 ? ot
46、herwise, (15) where SCH denotes the shifted confl uent hypergeometric distribution. We describe the SCH in Fig. 3(b) and provide further information in the supplementary material, including the derivation of its PMF, PGF, and mode, along with details of how we sample from it and the proof for Theore
47、m 1. 4.2Closed-form complete conditionals for the PRGDS The PRGDS admits a latent source representation, so the fi rst step of posterior inference is therefore ?(y(t) ik) K k=1| ? Multinom ? y (t) i , ? k (t) k QM m=1 (m) kim ?K k=1 ? .(16) We may similarly representh (t) k under its latent source r
48、epresentationi.e.,h (t) k h (t) k= PK k2=1h (t) kk2, whereh (t) kk2Pois ? kk2 (t 1) k2 ?. When notationally convenient, we use dot-notation (“”) to denote summing over a mode. In this case,h (t) k denotes the sum of thekthrow of theKKmatrix of latent countsh (t) kk2. The complete conditional of thek
49、 th row of counts, when conditioned on their sumh (t) k, is ?(h(t) kk2) K k2=1| ? Multinom ?h(t) k, (kk2 (t 1) k2 )K k2=1 ? .(17) Toderivetheconditionalfor (t) k weaggregatethePoissonvariablesthatdependonit. ByPoissonaddi- tivity, thecolumnsumh (t + 1) k =PK k1=1h (t + 1) k1k isdistributedash (t + 1
50、) k Pois ?(t) k k?andsimilarlyy (t) k is distributed asy (t) k Pois? (t) k (t)k QM m=1 (m) k ?. The count m (t) k , h (t + 1) k +y (t) k isolates all depen- dence on (t) k and is also Poisson distributed. By gammaPoisson conjugacy, the conditional of (t) k is ?(t) k | ? Gam? () 0 +h (t) k + m (t) k
51、, + k+ (t)k QM m=1 (m) k ?. (18) When ? () 0 0, we apply the identity in Eq. (14) and sample h (t) k from its complete conditional: ?h(t) k | ? Bessel ? ? () 0 1, 2 q (t) k 2PK k2=1kk2 (t 1) k2 ? .(19) When? () 0 =0, we instead apply Theorem 1 to sampleh (t) k, wherem (t) k is analogous tomin Eq. (1
52、5): ?h(t) k | (t) k ? ?Pois(t) k )if m (t) k =0 SCH(m (t) k , (t) k )otherwise where (t) k , 2 PK k2=1kk2 (t 1) k2 + k+(t)k QM m=1 (m) k . (20) The complete conditionals forkandgkfollow from applying the same PoissongammaPoisson identities, while the complete conditionals for , , (m) k , k, and all
53、follow from conjugacy. 6 0.0 0.5 1.0 1.5 2.0 SMOOTHING Information gain over BPTF (nats) GDELT 0.0 0.5 1.0 ICEWS 0 1 2 NeurIPS 0.0 0.2 0.4 DBLP 0.00 0.05 0.10 0.15 SOTU 0.00 0.25 0.50 0.75 1.00 FORECASTING Information gain over BPTF (nats) 0.0 0.2 0.4 0 2 5 7 10 0.0 0.1 0.2 0.3 0.0 0.1 0.2 GP-DPFA P
54、GDS PRGDS () 0 =0 PRGDS () 0 =1 (a) Matrix empirical studies (originally described by Schein et al. 22) 0.00 0.02 0.04 0.06 GDELT 0.000 0.002 0.004 0.006 0.008 0.010 ICEWS 0.00 0.01 0.02 0.03 0.04 Macaques 0.000 0.025 0.050 0.075 0.100 0.125 0.000 0.001 0.002 0.003 0.004 0.00 0.02 0.04 0.06 (b) Tens
55、or empirical studies Figure 4:The smoothing performance (top row) or forecasting performance (bottom row) of each model is quantifi ed by its information gain over a non-dynamic baseline (BPTF 9), where higher values are better. 5Empirical studies As explained in the previous section, the Poissongam
56、maPoisson motif of the PRGDS (see 4.1) yields a more tractable (see Fig. 1) and fl exible (see 3) model than previous models. This motif also encodes a unique inductive bias tailored to sparsity and burstiness that we test by comparing the PRGDS to the PGDS (described in 3). As we can see by compari
57、ng Eqs. (9) and (10), comparing these models isolates the impact of the PoissongammaPoisson motif. Because the PGDS was pre- viously introduced to model aTVmatrixYof sequentially observedV-dimensional count vectors y(1),.,y(T), we generalize the PGDS toM-mode tensors and provide derivations of its c
58、omplete conditionals in the supplementary material. Our Cython implementation of this generalized PGDS (and the PRGDS) is available online. We also compare the variant of the PRGDS with? () 0 =1to the variant with? () 0 =0, which allows the continuous gamma latent states to take values of exactly zero. Setup.Our empirical studies all have the following setup. For each data setY (1),.
溫馨提示
- 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
- 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權益歸上傳用戶所有。
- 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預覽,若沒有圖紙預覽就沒有圖紙。
- 4. 未經(jīng)權益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
- 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負責。
- 6. 下載文件中如有侵權或不適當內(nèi)容,請與我們聯(lián)系,我們立即糾正。
- 7. 本站不保證下載資源的準確性、安全性和完整性, 同時也不承擔用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。
最新文檔
- 2026年哈爾濱市道里區(qū)愛建社區(qū)衛(wèi)生服務中心招聘備考題庫及完整答案詳解一套
- 2026年中國能源建設集團山西電力建設有限公司招聘備考題庫及1套完整答案詳解
- 餐飲安全意識培訓課件
- 2026年中原細胞和免疫治療實驗室公開招聘勞務派遣人員備考題庫及參考答案詳解1套
- 2026年中交物業(yè)服務集團有限公司招聘備考題庫及完整答案詳解1套
- 2026年四川啟賽微電子有限公司招聘研發(fā)部部長崗位的備考題庫含答案詳解
- 2026年中國機械總院集團海西(福建)分院有限公司招聘備考題庫及1套完整答案詳解
- 2026年南寧市百花嶺路幼兒園招聘備考題庫參考答案詳解
- 7.3.1世界上最大的黃土堆積區(qū)-黃土高原 課件 2025-2026學年人教版地理八年級下冊
- 2026年南昌大學附屬眼科醫(yī)院高層次人才招聘9人備考題庫完整答案詳解
- 羅茨鼓風機行業(yè)發(fā)展趨勢報告
- 慢性阻塞性肺疾病患者非肺部手術麻醉及圍術期管理的專家共識
- 燈謎大全及答案1000個
- 中建辦公商業(yè)樓有限空間作業(yè)專項施工方案
- 急性胰腺炎護理查房課件ppt
- 初三數(shù)學期末試卷分析及中考復習建議課件
- GB/T 4074.8-2009繞組線試驗方法第8部分:測定漆包繞組線溫度指數(shù)的試驗方法快速法
- 第十章-孤獨癥及其遺傳學研究課件
- 人教版四年級上冊語文期末試卷(完美版)
- 防空警報系統(tǒng)設計方案
- 酒店管理用水 酒店廚房定額用水及排水量計算表分析
評論
0/150
提交評論