
Cloud Computing: Challenges and Opportunities
Cheng-Zhong Xu
Cloud and Internet Computing Lab
Wayne State University, Detroit, USA
C. Xu, Dec 2009, Autonomic Cloud Management

Mother Nature in Tears
  [Photo: Austfonna Ice Cap, Norway. Photo: Michael Nolan, July 16, 2009]

IT Liability
  - ICT infrastructure accounts for 2-3% of global electricity usage and greenhouse gas emissions, about the same as the emissions of airlines
  - A Google search generates 7 g of carbon emissions; 200M searches/day is comparable to the CO2 emissions of 70,000 cars
  - The IEA warned in May 2009 that ICT energy use could double by 2022 and triple by 2030
  - More than half is due to server and DC energy and emissions; 40% goes to PCs/monitors

Sustainability: we care, but ...
  - Server usage: 10-15% utilization on average
  - PC usage: idle 70% of the time in the office and 31% at home, but only 4-6% in sleep state (50w in idle vs. 24w in sleep by Energy Star for PC, as of 2009)
  - Power management is not widespread in IT (13% of business PCs)
  - Very short life cycle (3-5 years)
  - 70% hazardous waste

Sustainability: Culture Issues
  - A general-purpose computer with an open architecture is hardly sustainable by its nature
  - Reasons for replacing a desktop (consumers, '08): too slow (50%), newer features (50%), upgrade/failure (30%), virus/spyware (14%)
  - Install new apps -> library upgrade -> OS upgrade -> current application fails or is too slow -> replace the machine!
  - Little chance for "good enough" computing: a machine is good enough for today's applications, but not for emerging applications (vs. TV, printer, camera, etc.)

Sustainable Computing in Cloud
  - "Boundless infrastructure", but you use what you need and you pay for what you use
  - Virtualization on shared infrastructure leads to high utilization and power conservation
  - Server-based computing would create a culture of "good enough" computing

Cloud computing makes IT cool

Outline
  - Sustainable Computing in Cloud
  - Cloud Computing and Challenges
  - Automatic Cloud Management
      - Reinforcement Learning for Auto-configuration of virtual machines in cloud
  - Cloud Computing in Retrospect

Cloud Computing
  - A form of Internet computing, aiming to offer IT capabilities over the Internet as an on-demand, pay-per-use service
  - Infrastructure as a Service: e.g. Amazon Web Services
  - Platform as a Service: Google App Engine, Microsoft Azure
  - Software as a Service: S

Cloud Service: Current and Future
  - Usage increase from 15-25% to 30-40%

Cloud Service: Cross the Chasm
  - Rise from 15-25% to 30-40%

Key Enabling Technologies
  - Server/Datacenter
      - Exponential growth of processing/storage capacity
      - Industries race to build next-gen DCs
      - DCs become the next computing platform
  - High bandwidth and pervasive connectivity
      - 100 gigabit, 1 terabit networks
      - Ultra-wideband wireless mobile access
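The utilization argument on the "Sustainable Computing in Cloud" slide can be made concrete with a small packing exercise. The sketch below is only an illustration, not a method from the talk: it assumes identical hosts, additive loads, and no virtualization overhead or peak-load headroom, and the function name `consolidate` and the 12.5% load figure are hypothetical choices (12.5% sits inside the 10-15% range quoted above).

```python
def consolidate(loads, target_utilization):
    """Pack per-server loads onto shared hosts using first-fit decreasing.

    loads: fraction of one host's capacity each workload needs (0..1).
    target_utilization: highest load we allow on any shared host (0..1].
    Returns a list with the summed load of every host that is used.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    hosts = []  # hosts[i] is the total load already placed on host i
    for load in sorted(loads, reverse=True):
        if load > target_utilization:
            raise ValueError("a single workload exceeds the target")
        for i, used in enumerate(hosts):
            # Small tolerance guards against floating-point round-off.
            if used + load <= target_utilization + 1e-9:
                hosts[i] = used + load
                break
        else:
            hosts.append(load)  # no existing host has room: open a new one
    return hosts


if __name__ == "__main__":
    # Hypothetical fleet: 20 dedicated servers, each about 12.5% busy.
    dedicated = [0.125] * 20
    shared = consolidate(dedicated, target_utilization=0.40)
    print(f"{len(dedicated)} dedicated servers -> {len(shared)} shared hosts")
```

Under these assumptions three workloads fit on each host, so the fleet shrinks to roughly a third of its size. Real consolidation also has to leave headroom for load spikes, which this sketch ignores.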

Key Enabling Technologies (cont.)
  - Server virtualization
      - Consolidation of servers in DC
      - Realize economies of scale
  - Pay-as-you-go business model
      - Service-oriented computing
      - Low-touch, low-commitment self-service
  - Autonomic management
      - Scaling up and down with load on auto-pilot, self-configuration, self-*
      - Quality of service (QoS) provisioning

QoS
  [Figure only]

Quality of Service
  - Performance: perf. isolation and differentiation
  - Availability: server behavior in stress conditions? Resilient to failure
  - Security: prevention like ID/AC is not enough for flash-crowd-like attacks. Admit good, block bad, and contain suspicious ones. How to contain?

Challenges in Management
  - Systems are too large, too complex
      - Multiple levels of abstraction and interactions between the components
  - Too much data, less info
      - Little actionable info
  - Heuristic knowledge-based diagnosis is not enough to deal with complex systems
  - The need for online decisions makes the problem even harder
  - Some problems could take days/weeks to resolve

Examples of the Challenge
  - SLA example: Amazon EC2
      - 99.95% availability for the EC2 service on a yearly basis (4 hours and 23 min outage per year)
      - Unavailability in a 5-minute period
      - 10% credit, if SLA violation is proved
  - Revenue loss per down hour
      - Amazon outage on June 6, 2008 for 2 hrs; estimated loss of $16,000/minute and $2M in total
      - eBay search engine down for 1.5 hrs on Aug 16, 2008
      - Google Gmail down for 2+ hrs on Aug 11, 2008

Worldwide Server/DC Spending
  [Chart. Source: IDC, June 2009]
  - Management cost is the dominant factor in total COS

ACM @ Wayne State
  - Autonomic Cloud Management for rapid deployment and management of cloud
  - Adaptive to workload change, client requirements, resource supplies, system failures, power cap/energy budget, etc.
  - Machine Learning, Optimization and Control
      - Auto-configuration of virtual machines
      - Service quality assurance and adaptation
      - Proactive failure management
      - Coordinated power/perf management

Ph.D. Graduates/Post-Docs
  H. Shen (06, U. of Clemson), J. Wei (06, Yahoo! Technical), X. Zhong (07, Microsoft), S. Fu (08, New Mexico Tech.), B. Yu (07-10, GN R

  Auto-configuration; Failure repair
  Challenge due to scale and real-time.

Proactive Failure Management

ACM @ Wayne State
  - Autonomic Cloud Management for rapid deployment and management of cloud
  - Adaptive to workload change, client requirements, resource supplies, system failures, power cap/energy budget, etc.
  - Machine Learning, Optimization and Control
      - Reinforcement learning for VM auto-configuration
      - Machine learning to characterize the system's uncertainty and predict anomalies (e.g. overload, failure, violation of SLA, etc.)
      - Feedback control for assurance and adaptation

MLOC for QoS
  - Control, machine learning, and stochastic optimization to deal with system uncertainty, predict anomalies, and scale resources in a real-time manner
  - Multi-agent coordination for tradeoff

Model-Free, Self-Tuning Fuzzy Control
  [Figure only]
  - See TC05, Computer08 for stability analysis and other details

Transient Behavior on PlanetLab
  [Plots: on PlanetLab (World Cup Trace); on PlanetLab (Surge)]
  - Statistical guarantee of the target time

Robustness
  - Self-adaptive to load change
  - Self-adaptive to net condition

Dealing with Failures
  - Preventive measures
      - Keep enhancing the reliability of system components
      - Simplify systems design
      - Disable components which are prone to failures: BlueGene/L in LLNL disabled the L1 cache in each node when jobs larger than a few hours were running
  - Checkpoint-based recovery measure
      - Frequent checkpointing is costly and conservative
  - Proactive management to handle failures before they occur

Proactive Failure Management
  - Key is failure prediction/detection
      - Predict failure occurrences in the near future based on statistics of observed failures and dependence on performance states
  - Opportunities
      - Failure occurrences display uneven inter-arrival times
      - They are correlated in time and space domains
  - But there is no simple and general model for failure dynamics on production systems

Multi-time Scale Predictor
  - Multiscale spherical covariance for time locality
  - Aggregate model for spatial locality
  - See SC07 and SRDS07 for details
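The observation that failures arrive unevenly and cluster in time can be illustrated with a very small empirical estimator. This is a deliberately simple sketch and not the multi-time scale predictor cited above (SC07, SRDS07): it only counts inter-arrival gaps in a failure log. The function names and the failure timestamps are hypothetical, made up for the example.

```python
def interarrival_gaps(failure_times):
    """Return the gaps between consecutive failure timestamps."""
    times = sorted(failure_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def failure_probability(gaps, elapsed, window):
    """Empirical P(next failure within `window` | no failure for `elapsed`).

    Counts, among historical gaps longer than `elapsed`, how many ended
    within the following `window`. Returns None when no historical gap
    is longer than `elapsed`, i.e. there is no data to condition on.
    """
    survivors = [g for g in gaps if g > elapsed]
    if not survivors:
        return None
    hits = sum(1 for g in survivors if g <= elapsed + window)
    return hits / len(survivors)


if __name__ == "__main__":
    # Hypothetical failure log for one node, in hours since monitoring began.
    log = [0, 1, 2, 3, 50, 51, 52, 120, 121, 200]
    gaps = interarrival_gaps(log)

    just_failed = failure_probability(gaps, elapsed=0, window=2)
    quiet_5h = failure_probability(gaps, elapsed=5, window=2)
    print(f"P(failure within 2h | just failed)   = {just_failed:.2f}")
    print(f"P(failure within 2h | quiet for 5h)  = {quiet_5h:.2f}")
```

With bursty data like this, the estimated risk is high right after a failure and drops once the node has been quiet for a while, which is the temporal correlation a predictor can exploit. Spatial correlation across nodes is not modeled here.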

Failure Prediction on LANL Trace
  - 20 failures of Node 1
  - Bayesian network model
  - Predicted a failure for 3:50 am on 9/7/2004
  - A scheduler SW failure occurred at 08:20 am on 9/7/2004

Online Failure Prediction
  - On-line prediction from 5/12/2006 to 4/2/2007 on the Wayne State Grid
  - 40 high-end compute servers in 3 clusters: ISC, CIT, CHM (16, 16, 8)

QoS toward Effective Cloud Computing
  - Performance: perf. isolation & differentiation
  - Availability: graceful perf degradation in stress conditions; resilient to failure
  - Security: ABC admission policy: Admit good, Block bad, and Contain suspicious ones

Anomaly Detection

System Anomaly Detection
  - How to detect/predict system anomaly: overload, failure, SLO miss?

Dynamics of a Multi-tier Website
  - Throughput is input traffic dependent
      - Browsing puts the burden on the DB server
      - Ordering puts extra work on the App server
  - Perf. bottleneck may shift between tiers
  - How to measure the system capacity?

Limitations of Response Time
  - Threshold setting for admission control?
  - Long dead time: RT won't be available until the request is completed
  - Response time provides little insight into constrained resources for online troubleshooting

Machine Learning for Anomaly De
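The point from the "Dynamics of a Multi-tier Website" slide, that the bottleneck shifts between tiers as the request mix changes, can be shown with a back-of-the-envelope capacity model. This is a sketch under stated assumptions, not the analysis from the talk: it uses the standard bound that the maximum throughput is 1 divided by the largest per-request service demand, and every number in `DEMANDS` is a made-up placeholder.

```python
# Hypothetical service demands: seconds of work each request type
# costs at each tier. None of these values come from the slides.
DEMANDS = {
    "browsing": {"web": 0.002, "app": 0.004, "db": 0.010},
    "ordering": {"web": 0.002, "app": 0.012, "db": 0.006},
}


def tier_demands(mix):
    """Average per-request demand at each tier for a request mix.

    mix maps request type to its fraction of traffic,
    e.g. {"browsing": 0.9, "ordering": 0.1}.
    """
    totals = {}
    for request_type, fraction in mix.items():
        for tier, demand in DEMANDS[request_type].items():
            totals[tier] = totals.get(tier, 0.0) + fraction * demand
    return totals


def bottleneck(mix):
    """Return (bottleneck tier, throughput bound in requests/second)."""
    demands = tier_demands(mix)
    tier = max(demands, key=demands.get)
    return tier, 1.0 / demands[tier]


if __name__ == "__main__":
    for mix in ({"browsing": 0.9, "ordering": 0.1},
                {"browsing": 0.1, "ordering": 0.9}):
        tier, capacity = bottleneck(mix)
        print(f"mix={mix}: bottleneck={tier}, capacity ~ {capacity:.0f} req/s")
```

With these placeholder demands a browsing-heavy mix saturates the DB tier while an ordering-heavy mix saturates the App tier, so a single fixed capacity number for the site does not exist. That is one reason a response-time threshold alone is hard to set.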
