Slide 1: Principal Components Analysis (Jianwei Gou)

Objectives:
- Understand the principles of principal components analysis (PCA)
- Recognize conditions under which PCA may be useful
- Use the R function princomp() to perform a principal components analysis
- Interpret princomp() output

Slide 2: Typical Form of Data

A data set in an 8x3 matrix; the rows could be species and the columns sampling sites:

X =   100  97  99
       96  90  90
       80  75  60
       75  85  95
       62  40  28
       77  80  78
       92  91  80
       75  85 100

A matrix is often referred to as an nxp matrix (n for the number of rows and p for the number of columns). Our matrix has 8 rows and 3 columns, and is

an 8x3 matrix.

Slide 3: What are Principal Components?

Principal components are linear combinations of the observed variables:

    Y = b1*X1 + b2*X2 + ... + bp*Xp

The coefficients of these principal components are chosen to meet three criteria. What are the three criteria?

Slide 4: What are Principal Components?

The three criteria:
1. There are exactly p principal components (PCs), each being a linear combination of the observed variables.
2. The PCs are mutually orthogonal (i.e., perpendicular and uncorrelated).
3. The components are extracted in order of decreasing variance.
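The three criteria can be checked numerically on the 8x3 data matrix from Slide 2. The slides use R; the following is only an illustrative Python sketch (numpy assumed) of the same eigendecomposition:

```python
import numpy as np

# The 8x3 data matrix from Slide 2 (rows = species, columns = sites).
X = np.array([
    [100, 97,  99], [96, 90, 90], [80, 75, 60], [75, 85,  95],
    [ 62, 40,  28], [77, 80, 78], [92, 91, 80], [75, 85, 100],
], dtype=float)

S = np.cov(X, rowvar=False)          # 3x3 covariance matrix
evals, evecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
order = np.argsort(evals)[::-1]      # criterion 3: decreasing variance
evals, evecs = evals[order], evecs[:, order]

# Criterion 1: each PC is a linear combination of the observed variables.
scores = (X - X.mean(axis=0)) @ evecs

# Criterion 2: the PCs are mutually orthogonal (and the scores uncorrelated).
print(np.allclose(evecs.T @ evecs, np.eye(3)))                  # -> True
print(np.allclose(np.cov(scores, rowvar=False), np.diag(evals)))  # -> True

# The eigenvalues sum to the total variance in S.
print(np.isclose(evals.sum(), np.trace(S)))                     # -> True
```

Exactly p (= 3) components come out, their score variances equal the eigenvalues, and the cross-covariances between components vanish.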

Slide 5: A Simple Data Set

Correlation matrix:
        X  Y
    X   1  1
    Y   1  1

Covariance matrix:
        X      Y
    X   1      1.414
    Y   1.414  2

Slide 6: General Patterns

- The total variance is 3 (= 1 + 2).
- The two variables, X and Y, are perfectly correlated, with all points falling on the regression line.
- The spatial relationship among the 5 points can therefore be represented by a single dimension; PCA is a dimension-reduction technique.
- What would happen if we applied PCA to the data?

Slide 7: Graphic PCA (figure)

Slide 8: R Program

    # Principal Components Analysis
    # entering raw data and extracting PCs
    # from the correlation matrix
    x <- c(-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064)
    y <- c(-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382)
    mydata <- cbind(x, y)
    fit <- princomp(mydata, cor = TRUE)
    summary(fit)               # print variance accounted for
    loadings(fit)              # PC loadings
    plot(fit, type = "lines")  # scree plot
    fit$scores                 # the principal component scores
    biplot(fit)

Slide 9: Steps in a PCA

1. Have at least two variables.
2. Generate a correlation or variance-covariance matrix.
3. Obtain eigenvalues and eigenvectors (this is called an eigenvalue problem, and will be illustrated with a simple numerical example).
4. Generate principal component (PC)
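The variance summary that summary(fit) reports in R can be reproduced outside R as well; a minimal Python sketch (numpy assumed) of the same correlation-based computation on the five-point data set:

```python
import numpy as np

x = np.array([-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064])
y = np.array([-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382])

R = np.corrcoef(x, y)                 # 2x2 correlation matrix
evals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
prop = evals / evals.sum()            # variance accounted for by each PC

print(R[0, 1])   # ~1: x and y are perfectly correlated
print(prop[0])   # ~1: the first PC accounts for essentially all the variance
```

Because the five points lie exactly on a line, the second eigenvalue is (numerically) zero: one dimension suffices.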

scores.
5. Plot the PC scores in the space with reduced dimensions.

All these steps can be automated by using R.

Slide 10: Covariance or Correlation Matrix? (figure: abundances of Sp1 and Sp2, y-axis 0 to 40)

Slide 11: Covariance or Correlation Matrix? (Xuhua Xia)

Slide 12: Covariance or Correlation Matrix?

Slide 13: The Eigenvalue Problem

For a covariance matrix A, the eigenvalues are the set of values lambda that satisfy the condition |A - lambda*I| = 0. There are n eigenvalues for n variables, and the sum of the eigenvalues is equal to the sum of the variances in the covariance matrix. Finding the eigenvalues and eigenvectors is called an eigenvalue problem (or a characteristic value problem).

Slide 14: Get the Eigenvectors

An eigenvector is a vector x that satisfies the condition A*x = lambda*x. In our case A is a variance-covariance matrix of order 2, and x is a vector specified by x1 and x2.

Slide 15: Get the Eigenvectors

We want to find an eigenvector of unit length, i.e., x1^2 + x2^2 = 1. The first eigenvector is the one associated with the largest eigenvalue; solve for x1.

Slide 16: Get the PC Scores
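For the covariance matrix of the simple data set (variances 1 and 2, covariance sqrt(2) ~ 1.414), the characteristic equation |A - lambda*I| = (1 - lambda)(2 - lambda) - 2 = lambda^2 - 3*lambda = 0 gives lambda1 = 3 and lambda2 = 0, consistent with the total variance of 3. A Python check (numpy assumed; the slides do this by hand):

```python
import numpy as np

# Covariance matrix from Slide 5, taking the covariance as sqrt(2) ~ 1.414.
A = np.array([[1.0,        np.sqrt(2)],
              [np.sqrt(2), 2.0       ]])

evals, evecs = np.linalg.eigh(A)      # ascending order: [0, 3]
lam1, v1 = evals[-1], evecs[:, -1]    # largest eigenvalue and its eigenvector
if v1[0] < 0:
    v1 = -v1                          # fix the arbitrary sign

print(lam1)                            # ~3: sum of eigenvalues = total variance
print(np.isclose(v1 @ v1, 1.0))        # unit length: x1^2 + x2^2 = 1 -> True
print(np.allclose(A @ v1, lam1 * v1))  # A x = lambda x                -> True
```

Solving (A - 3I)x = 0 by hand gives x2 = sqrt(2)*x1, so the unit eigenvector is (1/sqrt(3), sqrt(2/3)) ~ (0.577, 0.816).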

First PC score, second PC score: from the original data (x and y) and the eigenvectors, the original data in a two-dimensional space are reduced to one dimension.

Slide 17: What Are Principal Components?

Principal components are a new set of variables, which are linear combinations of the observed ones, with these properties:
- Because of the decreasing-variance property, much of the variance (information in the original set of p variables) tends to be concentrated in the first few PCs. This implies that we can drop the last few PCs without losing much information; PCA is therefore considered a dimension-reduction technique.
- Because PCs are orthogonal, they can be used instead of the original variables in situations where having orthogonal variables is desirable (e.g., regression).

Slide 18: Index of hidden variables

The ranking of Asian universities by the Asian Week: HKU is ranked second in financial resources, but seventh in academic research. How did HKU get ranked third? Is there a more objective way of ranking? An illustrative example:

Slide 19: A Simple Data Set

School 5 is clearly the best school; School 1 is clearly the worst school.

Slide 20: Graphic PCA

First PC scores: -1.7889, -0.8944, 0, 0.8944, 1.7889
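The first-PC scores reported on Slide 20 can be reproduced by standardizing the two variables and projecting onto the first eigenvector of the correlation matrix. A Python sketch (numpy assumed; standardizing with the n-1 divisor, which is the convention that matches the slide's numbers):

```python
import numpy as np

x = np.array([-1.264911064, -0.632455532, 0, 0.632455532, 1.264911064])
y = np.array([-1.788854382, -0.894427191, 0, 0.894427191, 1.788854382])

# Standardize each variable (n-1 divisor), then stack as columns.
Z = np.column_stack([x / x.std(ddof=1), y / y.std(ddof=1)])

R = np.corrcoef(x, y)               # correlation matrix
evals, evecs = np.linalg.eigh(R)
v1 = evecs[:, np.argmax(evals)]     # eigenvector of the largest eigenvalue
if v1[0] < 0:
    v1 = -v1                        # fix the arbitrary sign

pc1 = Z @ v1                        # first PC scores
# pc1 ~ -1.7889, -0.8944, 0, 0.8944, 1.7889, as on Slide 20
```

The two standardized columns are identical here, so the first PC is just sqrt(2) times either of them: the two-dimensional data collapse onto one axis.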

Slide 21: Crime Data in 50 States

The slide lists the raw data (STATE, MURDER, RAPE, ROBBERY, ASSAULT, BURGLARY, LARCENY, AUTO); the same values appear in the CARDS section of the SAS program below.

SAS program (Slides 21-22):

DATA CRIME;
TITLE 'CRIME RATES PER 100,000 POP BY STATE';
INPUT STATENAME $1-15 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
CARDS;
Alabama        14.2 25.2  96.8 278.3 1135.5 1881.9  280.7
Alaska         10.8 51.6  96.8 284.0 1331.7 3369.8  753.3
Arizona         9.5 34.2 138.2 312.3 2346.1 4467.4  439.5
Arkansas        8.8 27.6  83.2 203.4  972.6 1862.1  183.4
California     11.5 49.4 287.0 358.0 2139.4 3499.8  663.5
Colorado        6.3 42.0 170.7 292.9 1935.2 3903.2  477.1
Connecticut     4.2 16.8 129.5 131.8 1346.0 2620.7  593.2
Delaware        6.0 24.9 157.0 194.2 1682.6 3678.4  467.0
Florida        10.2 39.6 187.9 449.1 1859.9 3840.5  351.4
Georgia        11.7 31.1 140.5 256.5 1351.1 2170.2  297.9
Hawaii          7.2 25.5 128.0  64.1 1911.5 3920.4  489.4
Idaho           5.5 19.4  39.6 172.5 1050.8 2599.6  237.6
Illinois        9.9 21.8 211.3 209.0 1085.0 2828.5  528.6
Indiana         7.4 26.5 123.2 153.5 1086.2 2498.7  377.4
Iowa            2.3 10.6  41.2  89.8  812.5 2685.1  219.9
Kansas          6.6 22.0 100.7 180.5 1270.4 2739.3  244.3
Kentucky       10.1 19.1  81.1 123.3  872.2 1662.1  245.4
Louisiana      15.5 30.9 142.9 335.5 1165.5 2469.9  337.7
Maine           2.4 13.5  38.7 170.0 1253.1 2350.7  246.9
Maryland        8.0 34.8 292.1 358.9 1400.0 3177.7  428.5
Massachusetts   3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan        9.3 38.9 261.9 274.6 1522.7 3159.0  545.5
Minnesota       2.7 19.5  85.9  85.8 1134.7 2559.3  343.1
Mississippi    14.3 19.6  65.7 189.1  915.6 1239.9  144.4
Missouri        9.6 28.3 189.0 233.5 1318.3 2424.2  378.4
Montana         5.4 16.7  39.2 156.8  804.9 2773.2  309.2
Nebraska        3.9 18.1  64.7 112.7  760.0 2316.1  249.1
Nevada         15.8 49.1 323.1 355.0 2453.1 4212.6  559.2
New Hampshire   3.2 10.7  23.2  76.0 1041.7 2343.9  293.4
New Jersey      5.6 21.0 180.4 185.1 1435.8 2774.5  511.5
New Mexico      8.8 39.1 109.6 343.4 1418.7 3008.6  259.5
New York       10.7 29.4 472.6 319.1 1728.0 2782.0  745.8
North Carolina 10.6 17.0  61.3 318.3 1154.1 2037.8  192.1
North Dakota    0.9  9.0  13.3  43.8  446.1 1843.0  144.7
Ohio            7.8 27.3 190.5 181.1 1216.0 2696.8  400.4
Oklahoma        8.6 29.2  73.8 205.0 1288.2 2228.1  326.8
Oregon          4.9 39.9 124.1 286.9 1636.4 3506.1  388.9
Pennsylvania    5.6 19.0 130.3 128.0  877.5 1624.1  333.2
Rhode Island    3.6 10.5  86.5 201.0 1489.5 2844.1  791.4
South Carolina 11.9 33.0 105.9 485.3 1613.6 2342.4  245.1
South Dakota    2.0 13.5  17.9 155.7  570.5 1704.4  147.5
Tennessee      10.1 29.7 145.8 203.9 1259.7 1776.5  314.0
Texas          13.3 33.8 152.4 208.2 1603.1 2988.7  397.6
Utah            3.5 20.3  68.8 147.3 1171.6 3004.6  334.5
Vermont         1.4 15.9  30.8 101.2 1348.2 2201.0  265.2
Virginia        9.0 23.3  92.1 165.7  986.2 2521.2  226.7
Washington      4.3 39.6 106.2 224.8 1605.6 3386.9  360.3
West Virginia   6.0 13.2  42.2  90.9  597.4 1341.7  163.3
Wisconsin       2.8 12.9  52.2  63.7  846.9 2614.2  220.7
Wyoming         5.4 21.9  39.7 173.9  811.6 2772.2  282.0
;
PROC PRINCOMP out=crimcomp; run;
PROC PRINT; ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;
PROC GPLOT;
  PLOT PRIN2*PRIN1=STATENAME;
  TITLE2 'PLOT OF THE FIRST TWO PRINCIPAL COMPONENTS';
run;
PROC PRINCOMP data=CRIME COV OUT=crimcomp; run;
PROC PRINT; ID STATENAME;
  VAR PRIN1 PRIN2 MURDER RAPE ROBBERY ASSAULT BURGLARY LARCENY AUTO;
run;

/* Add to have a map view */
proc sort data=crimcomp out=crimcomp; by STATENAME; run;
proc sort data=maps.us2 out=mymap; by STATENAME; run;
data both; merge mymap crimcomp; by STATENAME; run;
proc gmap data=both;
  id _map_geometry_;
  choro PRIN1 PRIN2 / levels=15;
  /* choro PRIN1 / discrete; */
run;

Slide 23: (map view of the first two principal components; figure)

Slide 24: Correlation Matrix

           MURDER    RAPE  ROBBERY  ASSAULT  BURGLARY  LARCENY    AUTO
MURDER     1.0000  0.6012   0.4837   0.6486    0.3858   0.1019  0.0688
RAPE       0.6012  1.0000   0.5919   0.7403    0.7121   0.6140  0.3489
ROBBERY    0.4837  0.5919   1.0000   0.5571    0.6372   0.4467  0.5907
ASSAULT    0.6486  0.7403   0.5571   1.0000    0.6229   0.4044  0.2758
BURGLARY   0.3858  0.7121   0.6372   0.6229    1.0000   0.7921  0.5580
LARCENY    0.1019  0.6140   0.4467   0.4044    0.7921   1.0000  0.4442
AUTO       0.0688  0.3489   0.5907   0.2758    0.5580   0.4442  1.0000

If the variables were not correlated, there would be no point in doing PCA. The correlation matrix is symmetric, so we only need to inspect either the upper or the lower triangle.

Slide 25: Eigenvalues of the Correlation Matrix

        Eigenvalue  Difference  Proportion  Cumulative
PRIN1      4.11496     2.87624    0.587851     0.58785
PRIN2      1.23872     0.51291    0.176960     0.76481
PRIN3      0.72582     0.40938    0.103688     0.86850
PRIN4      0.31643     0.05846    0.045205     0.91370
PRIN5      0.25797     0.03593    0.036853     0.95056
PRIN6      0.22204     0.09798    0.031720     0.98228
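With a correlation-matrix PCA of p = 7 variables, the eigenvalues sum to 7 (the trace of a 7x7 correlation matrix), so each Proportion entry is simply eigenvalue/7 and Cumulative is the running sum. A Python check on the six eigenvalues shown (numpy assumed; the table is cut off after PRIN6 in the source):

```python
import numpy as np

# Eigenvalues reported by PROC PRINCOMP on Slide 25.
evals = np.array([4.11496, 1.23872, 0.72582, 0.31643, 0.25797, 0.22204])

p = 7                      # number of variables = trace of the correlation matrix
prop = evals / p           # proportion of total variance per component
cum = np.cumsum(prop)      # cumulative proportion

print(prop[0])   # ~0.587851, matching PRIN1's Proportion
print(cum[:3])   # ~[0.58785, 0.76481, 0.86850], matching Cumulative
```

The first two components already carry about 76% of the total variance, which is why the states can be summarized reasonably well in a PRIN1-PRIN2 plot.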
