01 Parallel and Distributed Systems: Characteristics and Models

Parallel and Distributed Systems
Instructor: Zhang Weizhe (張偉哲)
Computer Network and Information Security Technique Research Center, School of Computer Science and Technology, Harbin Institute of Technology

Course Schedule
- Class hours: Weeks 7-14; Mon. 7-8 pm and Wed. 5-6 pm, Zhengxin (正心) 44
- Office: Zonghe Building (綜合樓) 708
- Information exchange: mainly by email
- Email:
- Contact times: after class or by appointment
- Transparencies: available after class (*.pdf)

Lab Schedule
- Class hours: Weeks 11, 13, 15; Wed., Fri., Sat., 3:40, 7-10 pm; Network Laboratory
- Content: (1) the Intel multi-core compiler and performance analyzer; (2) multithreaded programming with Windows Threads and OpenMP; (3) parallel programming with MPI

Assessment
- Exam: Week 16, Tue. 3:45-5:45 pm, Zhengxin 31-33
- Grading: exam 60, experiments 30, attendance 10

Textbooks
- Designing and Building Parallel Programs, Ian Foster et al., Chinese edition published by Posts & Telecom Press. Website: /dbpp/
- Distributed Systems: Concepts and Design, 4th edition (English), George Coulouris et al., China Machine Press

Reference Books
- Multi-Core Programming: Increasing Performance through Software Multi-threading, Shameem Akhter et al., Chinese translation by Li Baofeng et al., Publishing House of Electronics Industry, 2007
- High-Performance Computing Parallel Programming Techniques: MPI Parallel Programming, Du Zhihui et al., Tsinghua University Press, 2001. Website:
- Distributed Systems: Concepts and Design, 3rd edition, George Coulouris et al., Chinese translation by Jin Beihong et al., China Machine Press, 2004
- Distributed Systems: Principles and Paradigms, Andrew S. Tanenbaum et al., Chinese translation by Yang Jianfeng et al., Tsinghua University Press, 2004

Prerequisites
- Advanced programming
- Computer networks (TCP/IP)
- Operating systems (UNIX and Windows)
- Cryptography

Course Objectives
- Master the concepts, models, and algorithms of parallel and distributed systems
- Master parallel and distributed programming models and programming techniques
- Master the design principles and implementation techniques of distributed systems
- Become familiar with classic distributed system case studies
- Learn about the latest research progress in distributed systems

How to Study
1. Lecture material
2. Search engines
3. Forums and professional websites
4. Journal papers (CNKI)
5. Spend enough time studying case studies and practicing
The slides do not cover everything; they are not a replacement for your own reading and your own notes.

Chapter 1: Characterization of Parallel and Distributed Systems

Origin of Distributed Systems
Mainframe (1950s-1980s), PC (1980s-1990s), network-centric computing on the Internet (1990s-present)

Development: Twenty Years of Sustained Effort (1980s-2000s)
- TCP/IP-based IPC; "the network is the computer"
- Distributed operating systems: Amoeba, Mach
- Distributed computing environments: OSF DCE, OMG CORBA, J2EE, .NET
- WWW technology and search engine technology
- Grid computing: Globus, OGSA, Web Services, WSRF

Examples: Multi-core Platform
- Pentium processor era: chips optimized for raw speed on single threads (pipelined, out-of-order execution)
- Today's chips use cores that balance single-threaded and multi-threaded performance
- In 5-10 years: tens to hundreds of energy-efficient IA cores optimized for multithreading
(Figure labels: optimized for speed; optimized for performance/watt; cache; shared cache; local cache; streamlined IA core)

Examples: Cluster Computing Systems
Jaguar: The Jaguar supercomputer belongs to the U.S. Department of Energy and is located at Oak Ridge National Laboratory. In the current edition of the rankings it took first place at 1.8 petaflops (1.8 quadrillion operations per second), overtaking Roadrunner and running about 70% faster. This civilian machine will mainly be used to simulate climate change and energy production and for other fundamental scientific research.
Nebulae: The Dawning TC3600 "Nebulae" system at the National Supercomputing Centre in Shenzhen (NSCS), China, became the world's second-fastest supercomputer with a Linpack result of 1.271 PFlop/s (1,271 trillion operations per second). Its nodes are blade servers with 32 nm six-core Xeon X5650 processors, and it uses Nvidia Tesla C2050 GPUs as coprocessors, for a theoretical peak of 2,980 trillion operations per second.
Roadrunner: The first supercomputer in the world to exceed one petaflop (one quadrillion operations per second). Roadrunner is an IBM system located at Los Alamos National Laboratory in New Mexico, with a speed of up to 1,042 trillion operations per second.

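It uses the nine-core Cell processor found in Sony's PlayStation 3 together with AMD dual-core Opteron processors, which makes Roadrunner the world's first hybrid supercomputer built with Cell processors. It is mainly used for complex, classified assessments of U.S. nuclear weapons.
Kraken: Developed by the National Institute for Computational Sciences at the University of Tennessee. Kraken contains 100,000 AMD dual-core Opteron processors (a processor line mainly used in high-end servers and workstations) and runs at 831 trillion operations per second. It is also the fastest computer in the world owned by an academic institution.
JUGENE: Europe's fastest supercomputer, which once ranked second in the world. It was built by the Jülich Supercomputing Centre in Germany on the IBM Blue Gene/P design, which uses many small, low-power chips.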

In this design, each individual processor runs at no more than 850 MHz, slower even than an ordinary home PC. However, JUGENE has 292,000 processor chips in total, which gives it an overall speed of 825 trillion operations per second. (The photo was taken earlier this year, while scientists were upgrading the system.)
Tianhe-1: Ranked seventh. It is China's first petaflop supercomputer system, with a peak performance of 1,206 trillion double-precision floating-point operations per second. It was developed at the National Supercomputing Center in Tianjin, which was jointly built by the Tianjin Binhai New Area and the National University of Defense Technology. Tianhe-1 contains 6,144 Intel processors and 5,120 AMD graphics processing units, and it will be widely used in aerospace, resource exploration, meteorology, finance, and many other fields, providing supercomputing services in China and abroad.

Other Examples
- A typical intranet
- DNS systems
- Transaction processing systems
- Grid computing systems (testbed with links to the United States and Germany):
  - Large scale: more than 1,000 nodes in total
  - Wide area: sub-centers in 31 provinces, autonomous regions, and municipalities, plus overseas nodes
  - Controllability: the project team is able to control the resources
- Distributed multimedia systems
- P2P networks
- Mobile and ubiquitous computing
- Sensor networks
- The Internet

Definition of a Distributed System (1)
A distributed system is a collection of independent computers that appears to its users as a single coherent system. (Tanenbaum)

Definition of a Distributed System (2)
A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages. (Coulouris)

Characteristics of Distributed Systems (Coulouris)
- Concurrency: programs execute concurrently and share resources
- No global clock: programs coordinate their actions by exchanging messages
- Independent failures: when some components fail, others may not know

Heterogeneity
- Networks: Ethernet, token ring, etc.
- Computer hardware: big endian vs. little endian (byte order, 字節序)
- Operating systems: different APIs in Unix and Windows
- Programming languages: different representations for data structures
- Implementations from different developers: no application standards
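The byte-order difference listed under computer hardware can be made concrete with Python's standard `struct` module. This is a minimal sketch, not part of the original slides: it lays out the same 32-bit integer in big-endian and little-endian form, shows the wrong value a host gets when it reads foreign bytes with its own rules, and masks the difference by agreeing on one external representation (network byte order, which is big-endian). The helper names `to_wire` and `from_wire` are hypothetical.

```python
import struct

VALUE = 0x01020304

# The same 32-bit unsigned integer as laid out by two kinds of hardware.
big = struct.pack(">I", VALUE)     # big-endian: most significant byte first
little = struct.pack("<I", VALUE)  # little-endian: least significant byte first
print(big.hex(), little.hex())     # 01020304 04030201

# Reading big-endian bytes with little-endian rules yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))                # 0x4030201


def to_wire(n: int) -> bytes:
    """Marshal an unsigned 32-bit int in network byte order ("!" means big-endian)."""
    return struct.pack("!I", n)


def from_wire(data: bytes) -> int:
    """Unmarshal with the agreed order, whatever the receiving host's native order is."""
    (n,) = struct.unpack("!I", data)
    return n


print(from_wire(to_wire(VALUE)) == VALUE)  # True
```

Middleware such as RPC systems performs this kind of marshalling on the programmer's behalf, which is one concrete way it masks hardware heterogeneity.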

Masking Heterogeneity
- Middleware (中間件) is a software layer that provides a programming abstraction and masks the heterogeneity of the underlying networks, hardware, operating systems, and programming languages.
- Mobile code is code that can be sent from one computer to another and run at the destination.

Middleware
A distributed system organized as middleware. Note that the middleware layer extends over multiple machines. (Figure 1.1)

Openness
- Openness of a computer system: the characteristic that determines whether the system can be extended and re-implemented in various ways, e.g., Unix.
- Openness of a distributed system: determined by the degree to which new resource-sharing services can be added and made available for use by a variety of client programs, e.g., the Web.
- Achieving openness: publish the key interfaces, e.g., as RFCs.

Security
- Confidentiality (機密性): protection against disclosure to unauthorized individuals, e.g., ACLs in the Unix file system
- Integrity (完整性): protection against alteration or corruption, e.g., checksums
- Availability (可用性): protection against interference with the means of accessing the resources, e.g., against denial-of-service attacks
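The checksum mentioned under integrity can be sketched with CRC-32 from Python's standard `zlib` module. This sketch assumes accidental corruption only: a plain checksum detects random bit errors, but an attacker who can alter the data can also recompute the checksum, so protection against deliberate alteration needs a keyed construction such as an HMAC. The framing helpers `seal` and `check` are hypothetical.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Sender side: append the payload's CRC-32 as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(frame: bytes) -> bytes:
    """Receiver side: return the payload, or raise ValueError if it was corrupted."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("checksum mismatch: data corrupted")
    return payload


frame = seal(b"balance=100")
print(check(frame))  # b'balance=100'

# Flip one bit of the first byte "in transit"; CRC-32 detects every single-bit error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
try:
    check(corrupted)
except ValueError as err:
    print(err)       # checksum mismatch: data corrupted
```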

Scalability
A system is described as scalable if it remains effective when there is a significant increase in the number of resources and the number of users.
A scalable example system: the Internet. Design challenges:
- Controlling the cost of physical resources: e.g., the number of servers needed to support n users should grow at most as O(n)
- Controlling the performance loss: e.g., lookup time in DNS should grow no worse than O(log n)
- Preventing software resources from running out: e.g., IP addresses
- Avoiding performance bottlenecks: e.g., partitioning the DNS name table, caching, and replication

Failure Handling
- Detecting failures: e.g., checksums detect corrupted data; some failures cannot be detected and can only be suspected, e.g., a crashed remote server on the Internet
- Masking failures: e.g., retransmitting messages, standby servers
- Tolerating failures: e.g., a web browser that cannot contact a web server informs the user instead of retrying forever
- Recovery from failures: e.g., rollback
- Redundancy: e.g., multiple IP routes, replicated DNS name tables

Concurrency
- Correctness: operations on shared resources must remain correct in a concurrent environment, e.g., recording bids for an auction (拍賣)
- Performance: concurrent operations should still achieve high performance

Transparency
- Access transparency: local and remote resources are accessed using identical operations, e.g., a graphical user interface with folders
- Location transparency: resources are accessed without knowledge of their location, e.g., URLs
- Concurrency transparency: several processes operate concurrently using shared resources without interference between them
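The auction example under Concurrency can be sketched with Python threads. Recording a bid is a read-compare-write sequence on shared state; without mutual exclusion, two bidders could both read the same current maximum, and a higher bid could then be overwritten by a lower one. The `Auction` class below is hypothetical and serializes that sequence with a lock, so each caller simply calls `place_bid` without knowing about other concurrent bidders, which is the idea behind concurrency transparency.

```python
import threading


class Auction:
    """Shared auction state; place_bid is safe to call from many threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.highest_bid = 0
        self.highest_bidder = None
        self.bids_recorded = 0

    def place_bid(self, bidder: str, amount: int) -> bool:
        # The compare-and-update must be atomic, so hold the lock for the whole step.
        with self._lock:
            self.bids_recorded += 1
            if amount > self.highest_bid:
                self.highest_bid = amount
                self.highest_bidder = bidder
                return True
            return False


auction = Auction()
threads = [
    threading.Thread(target=auction.place_bid, args=(f"bidder{i}", amount))
    for i, amount in enumerate([120, 300, 250, 90, 299])
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(auction.highest_bidder, auction.highest_bid, auction.bids_recorded)  # bidder1 300 5
```

Holding one lock for every bid is the simplest correct choice; it trades some throughput for correctness, which is exactly the correctness/performance tension the slide lists.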

Transparency (continued)
- Replication transparency: multiple instances of resources are used to increase reliability and performance without users or application programmers knowing about the replicas, e.g., realcourse
- Failure transparency: users and applications can complete their tasks despite the failure of hardware and software components, e.g., email
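Replication and failure transparency can be illustrated together: a client-side stub hides a list of replicas behind a single `read` call and, when one replica fails, masks the failure by trying the next one (the standby-server idea from Failure Handling). Everything here is hypothetical: replicas are simulated by in-process objects, and `ReplicaDown` stands in for whatever error a real RPC layer would raise.

```python
from typing import Dict, List


class ReplicaDown(Exception):
    """Hypothetical error raised when a replica cannot be reached."""


class Replica:
    def __init__(self, name: str, data: Dict[str, str], up: bool = True) -> None:
        self.name, self.data, self.up = name, data, up

    def read(self, key: str) -> str:
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


class ReplicatedStore:
    """Client-side stub: callers see one store, not the replicas behind it."""

    def __init__(self, replicas: List[Replica]) -> None:
        self._replicas = replicas

    def read(self, key: str) -> str:
        last_error = None
        for replica in self._replicas:      # try replicas in order
            try:
                return replica.read(key)    # the first reachable replica answers
            except ReplicaDown as err:
                last_error = err            # mask this failure and try the next one
        raise RuntimeError("all replicas failed") from last_error


data = {"course": "Parallel and Distributed Systems"}
store = ReplicatedStore([Replica("r1", data, up=False), Replica("r2", data)])
print(store.read("course"))  # served by r2; the caller never sees r1's failure
```

Transparency has limits: if every replica is down, the failure surfaces to the caller as a `RuntimeError`.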

Transparency (continued)
- Mobility transparency: resources and clients can move within a system without affecting the operation of users and programs, e.g., mobile phones
- Performance transparency: the system can be reconfigured to improve performance as loads vary
- Scaling transparency: the system and applications can expand in scale without changes to the system structure or the application algorithms

Summary
- Distributed systems are pervasive
- Resource sharing is the primary motivation for constructing distributed systems
- Characteristics of distributed systems: concurrency, no global clock, independent failures

Summary (continued)
Challenges in constructing distributed systems: heterogeneity, openness, security, scalability, failure handling, concurrency, transparency

Chapter 2: System Model of Distributed Systems

Outline
2.1 Hardware Classification of DS
2.2 Software Classification of DS
2.3 Architectural Models
2.4 Fundamental Models
2.5 Summary

How to Classify?
- By memory access
- By interconnection
- By heterogeneity

Hardware Concepts
Different basic organizations and memories in distributed computer systems: multiprocessors (多處理器系統) and multicomputers (多計算機系統). (Figure 1.6)

Classification of Multiprocessors Based on Interconnection Network (1)
(a) A bus-based multiprocessor (Figure 1.7)
- Simple: the bus is a broadcast medium
- Contention for access to the bus (does not scale well)
- Complicates caches (snooping caches are needed)

Classification of Multiprocessors Based on Interconnection Network (2)
(b) A crossbar switch (交叉點交換)
- Little contention for memory access: multiple memories can be accessed in parallel
- Simple routing
- The number of crosspoint switches grows quadratically with the number of CPUs and memories

Classification of Multiprocessors Based on Interconnection Network (3)
(c) An omega switching network (omega 交換網絡)
- Reduced number of switches
- Increased communication delay (number of hops)
- Increased contention for memory access
- Complex network

Classification of Multicomputers Based on Interconnection Network (1)
- LAN (local area network): bus based
- Mesh (網狀拓撲) / grid: n^2 nodes arranged as an n x n grid
  - Easy to lay out
  - Maximum route length proportional to n (2(n - 1) hops between opposite corners)
  - Most messages take multiple hops

Classification of Multicomputers Based on Interconnection Network (2)
Hypercube (超立方體拓撲): an n-dimensional hypercube (n-cube) consists of 2^n nodes arranged in an n-dimensional cube, where each node is connected to n other nodes
- Maximum route length proportional to n (at most n hops, i.e., logarithmic in the number of nodes)
- Most messages take multiple hops
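The scaling claims for these interconnects can be checked with a little arithmetic. The sketch below assumes the usual textbook configurations: a crossbar joining n CPUs to n memories needs n x n crosspoint switches; an omega network for n CPUs and n memories (n a power of two), built from 2x2 switches, has log2 n stages of n/2 switches each, and every request crosses all log2 n stages; the longest route in a side x side mesh is 2(side - 1) hops; and in a hypercube the hop distance between two nodes equals the number of bits in which their labels differ. All function names are hypothetical.

```python
import math


def crossbar_crosspoints(n: int) -> int:
    """Crossbar joining n CPUs to n memories: one crosspoint per (CPU, memory) pair."""
    return n * n


def omega_stages(n: int) -> int:
    """Switching stages a request crosses in an omega network (n a power of two)."""
    assert n > 1 and (n & (n - 1)) == 0, "n must be a power of two"
    return int(math.log2(n))


def omega_switches(n: int) -> int:
    """Total 2x2 switches: log2(n) stages with n/2 switches per stage."""
    return omega_stages(n) * (n // 2)


def mesh_max_hops(side: int) -> int:
    """Longest shortest route in a side x side mesh: corner to opposite corner."""
    return 2 * (side - 1)


def hypercube_hops(a: int, b: int) -> int:
    """Hops between hypercube nodes a and b: number of label bits in which they differ."""
    return bin(a ^ b).count("1")


for n in (16, 256, 1024):
    print(n, crossbar_crosspoints(n), omega_switches(n), omega_stages(n))

print(mesh_max_hops(32))             # 62: 1,024 nodes as a 32 x 32 mesh
print(hypercube_hops(0, 2**10 - 1))  # 10: opposite corners of a 10-cube (1,024 nodes)
```

For 1,024 processors the loop reports 1,048,576 crosspoints for a crossbar against 5,120 switches in 10 stages for an omega network: far fewer switches, but more hops (and more chances of contention) on every memory access, matching the trade-offs listed on the slides.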

Classification of Multiprocessors Based on Heterogeneity

Classification of Multicomputers Based on Heterogeneity (1)
Example: Dawning 4000A

Multiprocessor Operating Systems (1)
A monitor to protect an integer against concurrent access (fragment):

    public:
        int value() { return count; }
        void incr() { count = count + 1; }
        void decr() { count = count - 1; }

Multiprocessor Operating Systems (2)
A monitor to protect an integer against concurrent access, but blocking a process:

    monitor Counter {
    private:
        int count = 0;
        int blocked_procs = 0;
        condition unblocked;
    public:
        int value() { return count; }
        void incr() {
            if (blocked_procs == 0)
                count = count + 1;
            else
                signal(unblocked);
        }
        void decr() {
            if (count == 0) {
                blocked_procs = blocked_procs + 1;
                wait(unblocked);
                blocked_procs = blocked_procs - 1;
            }
            else
                count = count - 1;
        }
    }
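The blocking monitor above can be approximated in Python with `threading.Condition`, which bundles a lock and a condition variable. One assumption changes the code: Python's `notify` has Mesa semantics (the woken thread runs later and must re-check its condition), not the direct hand-off that the pseudocode's `signal` relies on. This sketch therefore always increments `count` in `incr()` and lets `decr()` wait in a loop until `count` is positive, instead of tracking `blocked_procs`.

```python
import threading


class Counter:
    """Monitor-style counter: decr() blocks while the count is zero."""

    def __init__(self) -> None:
        self._cond = threading.Condition()  # the monitor lock plus one condition variable
        self._count = 0

    def value(self) -> int:
        with self._cond:
            return self._count

    def incr(self) -> None:
        with self._cond:
            self._count += 1
            self._cond.notify()  # wake one thread blocked in decr(), if there is one

    def decr(self) -> None:
        with self._cond:
            # Mesa semantics: re-check the condition after every wakeup.
            while self._count == 0:
                self._cond.wait()
            self._count -= 1


counter = Counter()
consumer = threading.Thread(target=counter.decr)
consumer.start()          # blocks while count == 0
counter.incr()            # count becomes 1 and the waiting consumer is woken
consumer.join()
print(counter.value())    # 0
```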

Multicomputer Operating Systems (1)
General structure of a multicomputer operating system (Figure 1.14)

Multicomputer Operating Systems (2)
Alternatives for blocking and buffering in message passing (Figure 1.15)
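The blocking and buffering alternatives can be sketched with Python's thread-safe `queue.Queue` as the message buffer. The `Channel` class is hypothetical and models three of the choices: a nonblocking send that fails when the buffer is full, a send that blocks only while the buffer is full, and a synchronous send that blocks until the receiver has taken the message.

```python
import queue
import threading
from typing import Any, Optional, Tuple


class Channel:
    """Hypothetical one-way channel whose buffer holds at most `capacity` messages."""

    def __init__(self, capacity: int) -> None:
        # Each entry is (message, event the receiver sets on delivery, or None).
        self._buf: "queue.Queue[Tuple[Any, Optional[threading.Event]]]" = queue.Queue(maxsize=capacity)

    def send_nonblocking(self, msg: Any) -> bool:
        """Never blocks; returns False if the buffer is full (the caller must retry)."""
        try:
            self._buf.put_nowait((msg, None))
            return True
        except queue.Full:
            return False

    def send_blocking(self, msg: Any) -> None:
        """Blocks only while the buffer is full."""
        self._buf.put((msg, None))

    def send_sync(self, msg: Any) -> None:
        """Blocks until the receiver has taken this message out of the buffer."""
        delivered = threading.Event()
        self._buf.put((msg, delivered))
        delivered.wait()

    def receive(self) -> Any:
        """Blocks until a message is available."""
        msg, delivered = self._buf.get()
        if delivered is not None:
            delivered.set()  # release a sender waiting in send_sync
        return msg


ch = Channel(capacity=1)
print(ch.send_nonblocking("m1"))  # True: the buffer had room
print(ch.send_nonblocking("m2"))  # False: buffer full, but the sender is not blocked
print(ch.receive())               # m1

received = []
receiver = threading.Thread(target=lambda: received.append(ch.receive()))
receiver.start()
ch.send_sync("m3")                # returns only after the receiver took m3
receiver.join()
print(received)                   # ['m3']
```

The choice moves the waiting around: a bigger buffer lets senders run ahead, while the synchronous variant gives the sender a delivery guarantee at the cost of waiting for the receiver.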

Distributed Shared Memory Systems
Pages of address space distributed among four machines

Network Operating System
General structure of a network operating system

Positioning Middleware
General structure of a distributed system as middleware (Figure 1-22)

Comparison between Systems

What Is a Model?
Each model is intended to provide an abstract, simplified but consistent description of a relevant aspect of distributed system design.

Architectural Model
An architectural model defines:
- the way in which the components of a system interact with one another
- the way in which they are mapped onto the underlying network of computers
Architectural models include the client-server model, the peer process model, and variations of the client-server model.

Arch. 1: Client/Server
- Historically the most important architecture, and still the most widely employed
- Servers may in turn be clients of other servers

Arch. 2: Services Provided by Multiple Servers
- Partition service objects across different servers, e.g., the Web, CDAL
- Maintain replicated service objects on several hosts, e.g., Sun NIS, realcourse

Arch. 3: Proxy Servers and Caches
A cache is a store of recently used data objects that is closer to the client than the objects themselves, e.g., a web page cache in a web browser or a web proxy server.

Arch. 4: Peer Processes
- All processes play similar roles
- They interact cooperatively to perform a distributed activity
- Consistency is maintained, and actions synchronized, at the application level
- Example: a distributed whiteboard application

Variations on the Client-Server Model
Reasons for variation:
- The use of mobile code and mobile agents
- Users' need for low-cost computers with limited hardware resources
- The requirement to add and remove mobile devices in a convenient manner

Arch. 5: Mobile Code
Provides good interactive response, e.g., applets

Arch. 6: Mobile Agent
A running program that travels from one computer to another in a network, carrying out a task on someone's behalf, e.g., the worm programs developed at Xerox PARC. Mobile agents (like mobile code) are a potential security threat to the resources in the computers that they visit.
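The proxy-and-cache architecture (Arch. 3) can be sketched as a proxy that keeps a small least-recently-used cache in front of an origin server. The origin is simulated by a plain function, and the names `CachingProxy` and `fetch_from_origin` are hypothetical; a real web proxy would also have to honor freshness rules such as HTTP cache-control headers, which this sketch ignores.

```python
from collections import OrderedDict
from typing import Callable


class CachingProxy:
    """Serves repeat requests from a local LRU cache and all others from the origin."""

    def __init__(self, origin: Callable[[str], str], capacity: int = 2) -> None:
        self._origin = origin
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self._capacity = capacity
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        if url in self._cache:
            self._cache.move_to_end(url)     # mark as most recently used
            self.hits += 1
            return self._cache[url]
        self.misses += 1
        page = self._origin(url)             # remote fetch: the slow path
        self._cache[url] = page
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used page
        return page


origin_requests = []


def fetch_from_origin(url: str) -> str:
    origin_requests.append(url)
    return f"<html>{url}</html>"


proxy = CachingProxy(fetch_from_origin, capacity=2)
for url in ["/a", "/b", "/a", "/c", "/b"]:
    proxy.get(url)

print(proxy.hits, proxy.misses, origin_requests)  # 1 4 ['/a', '/b', '/c', '/b']
```

With only two cache slots, the request for `/c` evicts `/b`, so the final `/b` goes back to the origin; a larger cache trades memory for fewer remote fetches.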

Arch. 7: Network Computer
- Downloads its operating system and any application software from a remote file server
- Since all application data and code is stored by a file server, users may migrate from one network computer to another
- If a disk is included, it holds only a minimum of software; the remainder is used as cache storage

Arch. 8: Thin Client
A GUI runs on a computer that is local to the user, while application programs execute on a remote computer.
- Drawback: high latencies
- Implementations: X-11, VNC (AT&T, 1998)

Arch. 9: Spontaneous Network
Integrates mobile devices and other devices into a given network.
- Key features: easy connection to a local network; easy integration with local services
- Key design issues: convenient connection and integration; limited connectivity (mobile devices move around continuously and may be disconnected); security and privacy
- Discovery services: registration service, lookup service
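The discovery services named above can be sketched as an in-memory directory with a registration operation and a lookup operation. The `DiscoveryService` API, the attribute matching, and the lease (time-to-live) on each registration are assumptions made for illustration; leases are one common way to cope with devices that leave a spontaneous network without deregistering. Times are passed in explicitly so the example is deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Registration:
    service_type: str
    address: str
    attributes: Dict[str, str] = field(default_factory=dict)
    expires_at: float = 0.0


class DiscoveryService:
    """Hypothetical directory: devices register services; clients look them up."""

    def __init__(self, lease_seconds: float = 30.0) -> None:
        self._lease = lease_seconds
        self._entries: List[Registration] = []

    def register(self, service_type: str, address: str,
                 attributes: Optional[Dict[str, str]] = None,
                 now: Optional[float] = None) -> Registration:
        now = time.monotonic() if now is None else now
        reg = Registration(service_type, address, dict(attributes or {}), now + self._lease)
        self._entries.append(reg)
        return reg

    def lookup(self, service_type: str, now: Optional[float] = None,
               **required: str) -> List[str]:
        now = time.monotonic() if now is None else now
        # Drop expired leases: a device that left without deregistering disappears.
        self._entries = [e for e in self._entries if e.expires_at > now]
        return [e.address for e in self._entries
                if e.service_type == service_type
                and all(e.attributes.get(k) == v for k, v in required.items())]


directory = DiscoveryService(lease_seconds=30)
directory.register("printer", "10.0.0.7:631", {"color": "yes"}, now=0)
directory.register("printer", "10.0.0.9:631", {"color": "no"}, now=10)
print(directory.lookup("printer", now=20, color="yes"))  # ['10.0.0.7:631']
print(directory.lookup("printer", now=35))               # ['10.0.0.9:631']
```

By time 35 the first printer's lease (registered at 0 for 30 seconds) has expired, so only the second printer is still discoverable.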

A System Model Should Address
- What are the main entities in the system?
- How do they interact?
- What are the characteristics that affect their individual and collective behavior?

Purpose of a Fundamental Model
- To make explicit all the relevant assumptions about the system we are modeling
- To make generalizations about what is possible or impossible, by logical analysis and mathematical proof

Fundamental Models Are Intended to Discuss
- Interaction model (交互模型): processes interact by passing messages, which results in communication and coordination
- Failure model (故障模型): faults that occur in computers or the network
- Security model (安全模型): threats that arise from the nature of distributed systems and their openness

Two Variants of the Interaction Model
Synchronous (同步) distributed system:
- The time to execute each step of a process has known lower and upper bounds
- Each message transmitted over a channel is received within a known bounded time
- Each process has a local clock whose drift rate from real time has a known bound
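The third property can be turned into a worked bound. Assume, for illustration, that each local clock C_i was set to real time at t = 0 and that its drift rate is bounded by ρ (a symbol introduced here, not on the slide). Integrating the drift bound limits how far one clock can stray from real time, and the triangle inequality limits how far two such clocks can stray from each other:

```latex
\left|\frac{dC_i}{dt} - 1\right| \le \rho
\;\Longrightarrow\;
|C_i(t) - t| \le \rho\,t,
\qquad
|C_i(t) - C_j(t)| \le |C_i(t) - t| + |t - C_j(t)| \le 2\rho\,t .

\text{For example, } \rho = 10^{-6},\ t = 1000\,\text{s}:\quad
|C_i(t) - t| \le 1\,\text{ms},\qquad |C_i(t) - C_j(t)| \le 2\,\text{ms}.
```

Read the other way, a known ρ tells a synchronous system how often it must resynchronize: keeping two clocks within 2 ms of each other with ρ = 10^-6 requires resynchronizing at least every 1000 s.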

Two Variants of the Interaction Model (continued)
Asynchronous (異步) distributed system: there are no bounds on
- Process execution speed: e.g., each step may take an arbitrarily long time
- Message transmission delay: e.g., a message may be received after an arbitrarily long time
- Clock drift rate: the drift rate of a clock is arbitrary

Examples of Synchronous and Asynchronous DS
- Asynchronous DS: email, FTP
- Synchronous DS: VOD, voice conference systems

Failure Model
The failure model defines the ways in which failures may occur, in order to provide an understanding of their effects. Taxonomy (Hadzilacos and Toueg, 1994):
- Omission failures (遺漏故障)
- Arbitrary failures (隨機故障)
- Timing failures (時序故障)

1. Omission Failures
A process or communication channel fails to perform actions that it is supposed to do.
- Process omission failure: crash
  - Fail-stop: a crash that other processes can detect with certainty, e.g., by timeouts in a synchronous DS
- Communication omission failures: dropped messages
  - Send omission, receive omission, channel omission
- Omission failures are benign failures
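The fail-stop case relies on the bounds of a synchronous system: if a reply must arrive within a known maximum time, silence beyond that time proves a crash. A sketch under assumed bounds (the function names and the numbers are hypothetical): with a maximum one-way message delay d_max and a maximum processing time p_max at the remote process, a reply is due within 2·d_max + p_max. In an asynchronous system no such bounds exist, so the same timeout only justifies a suspicion.

```python
def crash_timeout(d_max: float, p_max: float) -> float:
    """Latest time a reply can arrive: request delay + processing + reply delay."""
    return 2 * d_max + p_max


def interpret_silence(elapsed: float, d_max: float, p_max: float,
                      synchronous: bool) -> str:
    """Classify a remote process that has not replied `elapsed` seconds after a request."""
    if elapsed <= crash_timeout(d_max, p_max):
        return "waiting"  # the reply may still arrive within the bound
    # Past the bound: a proof of crash only if the bounds are guaranteed (synchronous DS);
    # in an asynchronous DS d_max and p_max are mere guesses, so it is only a suspicion.
    return "crashed" if synchronous else "suspected"


# Assumed bounds: 50 ms maximum message delay, 100 ms maximum processing time.
print(interpret_silence(0.15, d_max=0.05, p_max=0.1, synchronous=True))   # waiting
print(interpret_silence(0.30, d_max=0.05, p_max=0.1, synchronous=True))   # crashed
print(interpret_silence(0.30, d_max=0.05, p_max=0.1, synchronous=False))  # suspected
```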

2. Arbitrary (Byzantine) Failures
- The worst possible failure semantics
- A process may arbitrarily omit intended processing steps or take unintended ones, e.g., return a wrong value in response to an invocation
- Arbitrary failures in processes are hard to detect
- Arbitrary failures in communication channels exist but are rare, because they can be detected, e.g., with checksums and sequence numbers
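Detecting channel failures with sequence numbers can be sketched as a receiver that tracks, per sender, the lowest sequence number it has not yet received: a number it has already seen reveals a duplicated message, and a jump ahead reveals messages that were omitted or are still arriving out of order. The class name `SequenceChecker`, its return labels, and numbering from 0 are hypothetical choices for illustration.

```python
from typing import Dict, Set


class SequenceChecker:
    """Receiver-side check of per-sender sequence numbers (numbering starts at 0)."""

    def __init__(self) -> None:
        self._next: Dict[str, int] = {}       # lowest number not yet received, per sender
        self._seen: Dict[str, Set[int]] = {}  # a real protocol would bound this with a window

    def receive(self, sender: str, seq: int) -> str:
        seen = self._seen.setdefault(sender, set())
        if seq in seen:
            return "duplicate"                # already delivered: discard it
        seen.add(seq)
        expected = self._next.get(sender, 0)
        if seq != expected:
            missing = [n for n in range(expected, seq) if n not in seen]
            return f"out of order; still missing {missing}"
        while expected in seen:               # also skip past earlier out-of-order arrivals
            expected += 1
        self._next[sender] = expected
        return "ok"


checker = SequenceChecker()
for seq in [0, 1, 1, 3, 2, 4]:
    print(seq, checker.receive("p", seq))
# 0 ok
# 1 ok
# 1 duplicate
# 3 out of order; still missing [2]
# 2 ok
# 4 ok
```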
