Principal Components Analysis
Jianwei Gou

Slide 1: Principal Components Analysis
Objectives:
- Understand the principles of principal components analysis (PCA)
- Recognize conditions under which PCA may be useful
- Use the R function princomp to perform a principal components analysis
- Interpret princomp output

Slide 2: Typical Form of Data
A data set in an 8x3 matrix; the rows could be species and the columns sampling sites:

X = 100  97  99
     96  90  90
     80  75  60
     75  85  95
     62  40  28
     77  80  78
     92  91  80
     75  85 100

A matrix is often referred to as an nxp matrix (n for the number of rows and p for the number of columns). Our matrix has 8 rows and 3 columns, so it is an 8x3 matrix.

Slide 3: What are Principal Components?
Principal components are linear combinations of the observed variables:

Y = b1*X1 + b2*X2 + ... + bp*Xp

The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

Slide 4: What are Principal Components?
The three criteria:
- There are exactly p principal components (PCs), each being a linear combination of the observed variables.
- The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated).
- The components are extracted in order of decreasing variance.

Slide 5: A Simple Data Set

Correlation matrix:        Covariance matrix:
     X  Y                       X      Y
X    1  1                  X    1      1.414
Y    1  1                  Y    1.414  2

Slide 6: General Patterns
- The total variance is 3 (= 1 + 2).
- The two variables, X and Y, are perfectly correlated, with all points falling on the regression line.
- The spatial relationship among the 5 points can therefore be represented by a single dimension.
- PCA is a dimension-reduction technique. What would happen if we applied PCA to the data?

Slide 7: Graphic PCA
[Figure: the five (x, y) points with the first principal axis drawn through them.]

Slide 8: R Program

# Principal Components Analysis
# entering raw data and extracting PCs
# from the correlation matrix
x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
mydata <- cbind(x, y)
fit <- princomp(mydata, cor=TRUE)
summary(fit)              # print variance accounted for
loadings(fit)             # PC loadings
plot(fit, type="lines")   # scree plot
fit$scores                # the principal component scores
biplot(fit)
Slide 9: Steps in a PCA
- Have at least two variables
- Generate a correlation or variance-covariance matrix
- Obtain eigenvalues and eigenvectors (this is called an eigenvalue problem, and will be illustrated with a simple numerical example)
- Generate principal component (PC) scores
- Plot the PC scores in the space with reduced dimensions
All of these steps can be automated using R.

Slides 10-12: Covariance or Correlation Matrix? (Xuhua Xia)
[Figures: abundances of two species, Sp1 and Sp2, plotted on an "Abundance" axis from 0 to 40.]
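The steps above can be sketched end-to-end in code. The lecture automates them with R's princomp; the following is a minimal parallel sketch in Python with NumPy (variable names are illustrative), applied to the five standardized (x, y) points from the R program:

```python
import numpy as np

# The five standardized (x, y) points from the R example (Slide 8)
x = np.array([-1.264911064, -0.632455532, 0.0, 0.632455532, 1.264911064])
y = np.array([-1.788854382, -0.894427191, 0.0, 0.894427191, 1.788854382])
data = np.column_stack([x, y])

# Steps 1-2: generate the covariance matrix
# (use np.corrcoef for the correlation matrix instead)
cov = np.cov(data, rowvar=False)

# Step 3: obtain eigenvalues and eigenvectors, sorted by decreasing variance
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: generate PC scores by projecting the (already zero-mean) data
scores = data @ eigvecs

print(eigvals)        # total variance 3 is concentrated in the first PC
print(scores[:, 0])   # the first PC carries all of the spread
```

Because the two variables are perfectly correlated, the first eigenvalue equals the total variance (3) and the second is zero, so the five points are fully described by their first PC scores.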
Slide 13: The Eigenvalue Problem
Given the covariance matrix A, the eigenvalues are the set of values λ that satisfy the condition

det(A - λI) = 0

For our 2x2 covariance matrix the resulting eigenvalues are λ1 = 3 and λ2 = 0 (there are n eigenvalues for n variables). The sum of the eigenvalues is equal to the sum of the variances in the covariance matrix. Finding the eigenvalues and eigenvectors is called an eigenvalue problem (or a characteristic value problem).

Slide 14: Get the Eigenvectors
An eigenvector is a vector x that satisfies the following condition:

A x = λ x

In our case A is a variance-covariance matrix of order 2, and x is a vector specified by x1 and x2.

Slide 15: Get the Eigenvectors
From the previous slide, A x = λx gives equations in x1 and x2. We want to find an eigenvector of unit length, i.e., x1^2 + x2^2 = 1, so we solve for x1 and x2 under this constraint. The first eigenvector is the one associated with the largest eigenvalue.

Slide 16: Get the PC Scores
Multiplying the original data (x and y) by the first and second eigenvectors gives the first and second PC scores, respectively. The original data in a two-dimensional space are thereby reduced to one dimension.
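The 2x2 eigenvalue problem on Slides 13-15 can be worked through numerically. This is a minimal sketch in Python (variable names are illustrative) using the covariance matrix from Slide 5, solving the characteristic equation with the quadratic formula and scaling the eigenvector to unit length:

```python
import math

# Covariance matrix from Slide 5: var(X) = 1, var(Y) = 2, cov(X, Y) = sqrt(2)
a11, a12, a22 = 1.0, math.sqrt(2.0), 2.0

# det(A - lambda*I) = 0 expands to
# lambda^2 - (a11 + a22)*lambda + (a11*a22 - a12^2) = 0
tr, det = a11 + a22, a11 * a22 - a12 ** 2
lam1 = (tr + math.sqrt(tr ** 2 - 4 * det)) / 2   # largest eigenvalue
lam2 = (tr - math.sqrt(tr ** 2 - 4 * det)) / 2

# Eigenvector for lam1: (a11 - lam1)*x1 + a12*x2 = 0, scaled so x1^2 + x2^2 = 1
x1, x2 = a12, lam1 - a11
norm = math.hypot(x1, x2)
x1, x2 = x1 / norm, x2 / norm

print(lam1, lam2)   # 3.0 and 0.0: their sum equals the total variance 1 + 2
print(x1, x2)       # first eigenvector, unit length
```

The eigenvalues sum to the total variance, as stated on Slide 13, and the unit eigenvector for the largest eigenvalue is the direction onto which the data are projected to get the first PC scores.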
Slide 17: What Are Principal Components?
Principal components are a new set of variables, which are linear combinations of the observed ones, with these properties:
- Because of the decreasing-variance property, much of the variance (information in the original set of p variables) tends to be concentrated in the first few PCs. This implies that we can drop the last few PCs without losing much information. PCA is therefore considered a dimension-reduction technique.
- Because PCs are orthogonal, they can be used instead of the original variables in situations where having orthogonal variables is desirable (e.g., regression).

Slide 18: Index of hidden variables
The ranking of Asian universities by the Asian Week: HKU is ranked second in financial resources, but seventh in academic research. How did HKU get ranked third? Is there a more objective way of ranking? An illustrative example:

Slide 19: A Simple Data Set
School 5 is clearly the best school; School 1 is clearly the worst school.

Slide 20: Graphic PCA
[Figure: PC scores of the five schools: -1.7889, -0.8944, 0, 0.8944, 1.7889.]
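The ranking idea above, using the first PC as a single composite index, can be sketched as follows. The five-school scores below are invented for illustration (they are not the Asian Week figures), and all names are hypothetical:

```python
import numpy as np

# Hypothetical data: five schools scored on two criteria
# (columns: financial resources, academic research); invented for illustration
raw = np.array([
    [55.0, 40.0],   # School 1 (worst)
    [65.0, 55.0],
    [70.0, 60.0],
    [80.0, 75.0],
    [90.0, 85.0],   # School 5 (best)
])

# Standardize each criterion, then project onto the first eigenvector
z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
pc1 = z @ eigvecs[:, np.argmax(eigvals)]

# PC1 compresses both criteria into one number per school; the best and
# worst schools land at the two extremes (the sign of PC1 is arbitrary)
print(pc1)
```

Because the two criteria are highly correlated, PC1 captures almost all of the variation, giving a single-number ranking of the schools.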
Slide 21: Crime Data in 50 States

STATE          MURDER  RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY   AUTO
ALABAMA          14.2  25.2     96.8    278.3    1135.5   1881.9  280.7
ALASKA           10.8  51.6     96.8    284.0    1331.7   3369.8  753.3
ARIZONA           9.5  34.2    138.2    312.3    2346.1   4467.4  439.5
ARKANSAS          8.8  27.6     83.2    203.4     972.6   1862.1  183.4
CALIFORNIA       11.5  49.4    287.0    358.0    2139.4   3499.8  663.5
COLORADO          6.3  42.0    170.7    292.9    1935.2   3903.2  477.1
CONNECTICUT       4.2  16.8    129.5    131.8    1346.0   2620.7  593.2
DELAWARE          6.0  24.9    157.0    194.2    1682.6   3678.4  467.0
FLORIDA          10.2  39.6    187.9    449.1    1859.9   3840.5  351.4
GEORGIA          11.7  31.1    140.5    256.5    1351.1   2170.2  297.9
HAWAII            7.2  25.5    128.0     64.1    1911.5   3920.4  489.4
IDAHO             5.5  19.4     39.6    172.5    1050.8   2599.6  237.6
ILLINOIS          9.9  21.8    211.3    209.0    1085.0   2828.5  528.6
. . .

Slides 22-23: SAS Program

DATA CRIME;
TITLE 'CRIME RATES PER 100,000 POP BY STATE';
INPUT STATENAME $ 1-15 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
CARDS;
Alabama        14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
Alaska         10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Arizona        9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Arkansas       8.8 27.6 83.2 203.4 972.6 1862.1 183.4
California     11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Colorado       6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Connecticut    4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Delaware       6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Florida        10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Georgia        11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Hawaii         7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Idaho          5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Illinois       9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Indiana        7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Iowa           2.3 10.6 41.2 89.8 812.5 2685.1 219.9
Kansas         6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Kentucky       10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Louisiana      15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
Maine          2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Maryland       8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
Massachusetts  3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan       9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
Minnesota      2.7 19.5 85.9 85.8 1134.7 2559.3 343.1
Mississippi    14.3 19.6 65.7 189.1 915.6 1239.9 144.4
Missouri       9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
Montana        5.4 16.7 39.2 156.8 804.9 2773.2 309.2
Nebraska       3.9 18.1 64.7 112.7 760.0 2316.1 249.1
Nevada         15.8 49.1 323.1 355.0 2453.1 4212.6 559.2
New Hampshire  3.2 10.7 23.2 76.0 1041.7 2343.9 293.4
New Jersey     5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
New Mexico     8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
New York       10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
North Carolina 10.6 17.0 61.3 318.3 1154.1 2037.8 192.1
North Dakota   0.9 9.0 13.3 43.8 446.1 1843.0 144.7
Ohio           7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
Oklahoma       8.6 29.2 73.8 205.0 1288.2 2228.1 326.8
Oregon         4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
Pennsylvania   5.6 19.0 130.3 128.0 877.5 1624.1 333.2
Rhode Island   3.6 10.5 86.5 201.0 1489.5 2844.1 791.4
South Carolina 11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
South Dakota   2.0 13.5 17.9 155.7 570.5 1704.4 147.5
Tennessee      10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
Texas          13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
Utah           3.5 20.3 68.8 147.3 1171.6 3004.6 334.5
Vermont        1.4 15.9 30.8 101.2 1348.2 2201.0 265.2
Virginia       9.0 23.3 92.1 165.7 986.2 2521.2 226.7
Washington     4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
West Virginia  6.0 13.2 42.2 90.9 597.4 1341.7 163.3
Wisconsin      2.8 12.9 52.2 63.7 846.9 2614.2 220.7
Wyoming        5.4 21.9 39.7 173.9 811.6 2772.2 282.0
;
PROC PRINCOMP OUT=crimcomp; run;
PROC PRINT; ID STATENAME;
VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
PROC GPLOT;
PLOT PRIN2*PRIN1=STATENAME;
TITLE2 'PLOT OF THE FIRST TWO PRINCIPAL COMPONENTS';
run;
PROC PRINCOMP data=CRIME COV OUT=crimcomp; run;
PROC PRINT; ID STATENAME;
VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;

/* Add to have a map view */
proc sort data=crimcomp out=crimcomp; by STATENAME; run;
proc sort data=maps.us2 out=mymap; by STATENAME; run;
data both; merge mymap crimcomp; by STATENAME; run;
proc gmap data=both;
id _map_geometry_;
choro PRIN1 PRIN2 / levels=15;
/* choro PRIN1/discrete; */
run;
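The SAS program runs PROC PRINCOMP twice: once on the correlation matrix (the default) and once on the covariance matrix (the COV option). The choice matters when variables are on very different scales, as the crime variables are (larceny rates are in the thousands, murder rates in the tens). A simulated sketch in Python with NumPy, with invented data of deliberately unequal scales, shows the effect:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two uncorrelated variables on very different scales, loosely mimicking a
# LARCENY-like column (thousands) and a MURDER-like column (tens);
# the data are simulated for illustration
big = rng.normal(2500.0, 700.0, size=200)
small = rng.normal(8.0, 4.0, size=200)
data = np.column_stack([big, small])

# PCA on the covariance matrix: the large-variance column dominates PC1
cov_vals = np.sort(np.linalg.eigvalsh(np.cov(data, rowvar=False)))[::-1]
print(cov_vals[0] / cov_vals.sum())   # close to 1

# PCA on the correlation matrix: both variables are weighted equally
cor_vals = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
print(cor_vals[0] / cor_vals.sum())   # close to 0.5
```

With the covariance matrix, PC1 is essentially the high-variance variable alone; with the correlation matrix, the uncorrelated variables split the variance evenly. This is why correlation-based PCA is the usual choice when variables are measured on different scales.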
Slide 24: Correlation Matrix

          MURDER    RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY    AUTO
MURDER    1.0000  0.6012   0.4837   0.6486    0.3858   0.1019  0.0688
RAPE      0.6012  1.0000   0.5919   0.7403    0.7121   0.6140  0.3489
ROBBERY   0.4837  0.5919   1.0000   0.5571    0.6372   0.4467  0.5907
ASSAULT   0.6486  0.7403   0.5571   1.0000    0.6229   0.4044  0.2758
BURGLARY  0.3858  0.7121   0.6372   0.6229    1.0000   0.7921  0.5580
LARCENY   0.1019  0.6140   0.4467   0.4044    0.7921   1.0000  0.4442
AUTO      0.0688  0.3489   0.5907   0.2758    0.5580   0.4442  1.0000

If the variables were not correlated, there would be no point in doing PCA. The correlation matrix is symmetric, so we only need to inspect either the upper or the lower triangular matrix.

Slide 25

        Eigenvalue  Difference  Proportion  Cumulative
PRIN1      4.11496     2.87624    0.587851     0.58785
PRIN2      1.23872     0.51291    0.176960     0.76481
PRIN3      0.72582     0.40938    0.103688     0.86850
PRIN4      0.31643     0.05846    0.045205     0.91370
PRIN5      0.25797     0.03593    0.036853     0.95056
PRIN6      0.22204     0.09798    0.031720     0.98228
PRIN7      0.12406                0.017723     1.00000
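The eigenvalue table can be reproduced directly from the correlation matrix of Slide 24. The lecture obtains these numbers from SAS PROC PRINCOMP; the following is a cross-check sketch in Python with NumPy (small discrepancies arise because the printed correlations are rounded to four decimals):

```python
import numpy as np

# Correlation matrix of the seven crime variables (Slide 24):
# MURDER, RAPE, ROBBERY, ASSAULT, BURGLARY, LARCENY, AUTO
R = np.array([
    [1.0000, 0.6012, 0.4837, 0.6486, 0.3858, 0.1019, 0.0688],
    [0.6012, 1.0000, 0.5919, 0.7403, 0.7121, 0.6140, 0.3489],
    [0.4837, 0.5919, 1.0000, 0.5571, 0.6372, 0.4467, 0.5907],
    [0.6486, 0.7403, 0.5571, 1.0000, 0.6229, 0.4044, 0.2758],
    [0.3858, 0.7121, 0.6372, 0.6229, 1.0000, 0.7921, 0.5580],
    [0.1019, 0.6140, 0.4467, 0.4044, 0.7921, 1.0000, 0.4442],
    [0.0688, 0.3489, 0.5907, 0.2758, 0.5580, 0.4442, 1.0000],
])

# Eigenvalues in decreasing order; for a correlation matrix they sum to p = 7
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
proportion = eigvals / eigvals.sum()
cumulative = np.cumsum(proportion)

# Columns: Eigenvalue, Proportion, Cumulative (as in the SAS output)
for ev, pr, cu in zip(eigvals, proportion, cumulative):
    print(f"{ev:8.5f}  {pr:9.6f}  {cu:8.5f}")
```

The first two components account for about 76% of the total variance, which is why the states can be usefully summarized and plotted with PRIN1 and PRIN2 alone.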