High Performance Spark via Separation of Compute and Storage
Achieving a high-performance, elastic Spark deployment through an architecture that separates compute and storage

Table of Contents
- Motivation
- Spark shuffle with disaggregated storage
- Splash shuffle manager
- A reference design with in-memory distributed file system
- Evaluation results
- Future work and conclusion

Back to the Days of MapReduce
How did we design data applications?
- Network bandwidth vs. disk throughput
  - Move code rather than moving data
- Fast small memory vs. slow large disk
  - Optimize sequential R/W
[Figure label: 1Gbps network]

The Trends of HW in DC
- Datacenter Bandwidth Migration (source: https:/c/dam/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-734328.pdf)
- Enterprise Bytes Shipments: HDD and SSD (source: https:/blog/hdd-vs-ssd-in-data-centers/)

Modern DC Architecture
What changes are happening in the modern DC?
- Disaggregated storage and computation
- High-speed network between compute nodes and storage boxes
- Tiered storage for hot and cold data
[Figure labels: 25-100Gbps network; compute nodes; storage boxes; accelerators]

Reimagining the DC Memory and Storage Hierarchy
[Figure: hierarchy from DRAM (memory) through SSD to HDD/tape (storage), with hot, warm, and cold tiers. Annotations: improving memory capacity; improving SSD performance; efficient and scalable storage.]

Embrace the New Architecture
Intel Optane(TM) DC Persistent Memory
- Low latency and high throughput, like DRAM
  - Latency: 200-400ns
  - Bandwidth per DIMM: read up to 8GB/s, write up to 3GB/s
- High density and non-volatility, like NAND
  - Up to 6TB per server
- Memory-speed storage system

How to Use DCPMM
- DCPMM per node
- DCPMM-centered architecture, connected over RDMA/DPDK

MemVerge Elastic Spark Solution
A PMEM-centric data platform
- Spark workloads served: RDD caching and storage, shuffle data, data source
- MemVerge Spark Adaptors
- MemVerge DMO: cluster-shared persistent memory
- Nodes 1 to N, each with DRAM and PMEM, connected through an Ethernet switch

Spark Integration
- Integration points: RDD caching and storage, shuffle data, data source
- Hadoop compatible storage APIs
- A new generic shuffle manager
- Spark with additional RDD persist APIs
- All backed by MemVerge DMO

Spark Shuffle with Disaggregated Storage

Shuffle & Block Manager
- The block manager persists data to memory or disk on the local node.
- Losing an executor means recomputing the whole shuffle task.
- The storage and network implementation is coupled with the shuffle implementation.
[Figure: inside a compute node, the Spark executor's shuffle manager persists and retrieves shuffle output through the block manager, which uses a memory store and a disk store on local disk.]

The Problems of Current Shuffle Manager Design
- Poor elasticity
  - The failure of a node causes its shuffle data to be lost
  - Which in turn leads to recomputation
- Heavy overhead on the NodeManager (NM)
  - Coexisting with the NM brings heavy overhead to the NM under heavy workloads
- Unsuitable for cloud environments
  - A storage/computation disaggregation architecture brings no advantage to local shuffle
- The Spark community is also working on these problems
  - SPARK-25299 Use remote storage for persisting shuffle data
  - SPARK-26268 Decouple shuffle data from Spark deployment

MemVerge Splash Shuffle Manager
- A flexible shuffle manager
- Supports user-defined storage backend and network transport for shuffle data
- Open source: /MemVerge/splash
- Spark JIRA: SPARK-25299

Splash Shuffle Manager
- A new shuffle manager
  - Implements the shuffle manager interface
- Separating storage and compute
  - Extracts the storage and network implementations out of the shuffle manager itself into plugins
- Benefits
  - Shuffle becomes stateless
  - Storage becomes easier to maintain
  - Enables the use of 3rd party high performance networking and storage
[Figure: Executor 1 on Worker 1 writes shuffle data and Executor 2 on Worker 2 reads it, each through a Splash storage plugin, against a storage system (NFS, local FS, HDFS, S3, DMO).]

Persisting Shuffle Data to PMEM
- Distributed Memory Object (DMO) is a distributed file system built on PMEM.
- The storage plugin allows us to persist data into the DMO system, a separate storage cluster.
- The use of PMEM and fast network technologies (RDMA or DPDK) in the storage cluster speeds up the shuffle.
[Figure: shuffle manager -> Splash shuffle manager -> storage plugin -> DMO plugin -> DMO system on persistent memory]

Benchmark Configurations
Common
- 4 compute nodes
- 10GbE network
- Driver memory 4g
- Executor memory 6g
- Total cores 160
- Executor cores 4
- Spark 2.3.2
- Hadoop 2.7.4
Baseline
- 4 local 1TB HDDs/node
- Vanilla shuffle manager
- TCP/IP based Netty
Ours
- 2 DMO nodes
- 512GB PMEM/node
- Splash shuffle manager
- DPDK network

TeraSort Performance
[Figure: TeraSort 400GB, 216G shuffle write. Map stage and reduce stage durations (min) for Baseline, DMO with UDP, and DMO with DPDK.]
Intel HiBench: /Intel-bigdata/HiBench

TPC-DS Performance on Some Shuffle Heavy Queries
[Figure: TPC-DS 1.2TB, Baseline vs. DMO. Y axis: Duration (s). X axis: Query ID (78, 46, 4, 24a, 24b, 80, 23a, 23b, 25, 17, 29, 11, 93, 74, 50, 16, 40).]
spark-sql-perf: /databricks/spark-sql-perf

Data Size Scaling
[Figure: four panels, TPC-DS Query 80, Query 4, Query 23, and Query 24, each comparing Baseline and DMO at 400GB, 800GB, and 1200GB.]

TPC-DS Performance - All Queries
[Figure: Baseline vs. DMO across all TPC-DS queries.]
spark-sql-perf: /databricks/spark-sql-perf

Future Work
- Splash + DMO + RDMA
- Validate in production and cloud environments
- Performance tuning
- Integration with Spark on K8S

Conclusion
- Separating compute and storage is beneficial for Spark
  - Performance
  - Elasticity
  - Fault tolerance
- A reference design based on Splash shuffle manager
  - No Spark modification is needed
  - Storage and network becom
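
The plugin idea from the Splash Shuffle Manager slides (storage pulled out of the shuffle manager so that shuffle becomes stateless) can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Splash or Spark code: the names StoragePlugin, SharedDirPlugin, ShuffleManager, and run_word_count are hypothetical, and a local directory stands in for the shared storage system (NFS, HDFS, S3, DMO) named in the slides.

```python
import json
import os
import tempfile
import zlib
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Hypothetical storage backend interface (an assumption, not the Splash API)."""

    @abstractmethod
    def write(self, block_id: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, block_id: str) -> bytes: ...


class SharedDirPlugin(StoragePlugin):
    """Stores each block as a file in a directory that stands in for shared storage."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def write(self, block_id: str, data: bytes) -> None:
        with open(os.path.join(self.root, block_id), "wb") as f:
            f.write(data)

    def read(self, block_id: str) -> bytes:
        with open(os.path.join(self.root, block_id), "rb") as f:
            return f.read()


class ShuffleManager:
    """Partitions map output by key and delegates all I/O to the plugin.

    It keeps no shuffle data of its own, so any instance with the same
    plugin configuration can serve the reduce side.
    """

    def __init__(self, storage: StoragePlugin, num_partitions: int) -> None:
        self.storage = storage
        self.num_partitions = num_partitions

    def write_map_output(self, map_id: int, records) -> None:
        buckets = [[] for _ in range(self.num_partitions)]
        for key, value in records:
            # crc32 gives a partition assignment that is stable across processes.
            part = zlib.crc32(key.encode("utf-8")) % self.num_partitions
            buckets[part].append(json.dumps([key, value]))
        for reduce_id, lines in enumerate(buckets):
            # Empty buckets are written too, so readers never hit a missing block.
            block_id = f"shuffle_{map_id}_{reduce_id}"
            self.storage.write(block_id, "\n".join(lines).encode("utf-8"))

    def read_partition(self, reduce_id: int, num_maps: int):
        records = []
        for map_id in range(num_maps):
            data = self.storage.read(f"shuffle_{map_id}_{reduce_id}")
            for line in data.decode("utf-8").splitlines():
                key, value = json.loads(line)
                records.append((key, value))
        return records


def run_word_count(root: str) -> dict:
    writer = ShuffleManager(SharedDirPlugin(root), num_partitions=2)
    writer.write_map_output(0, [("spark", 1), ("shuffle", 1)])
    writer.write_map_output(1, [("spark", 1), ("pmem", 1)])
    del writer  # the "executor" that produced the map output goes away

    # A fresh manager reads the map output back without recomputing it.
    reader = ShuffleManager(SharedDirPlugin(root), num_partitions=2)
    counts = {}
    for reduce_id in range(2):
        for key, value in reader.read_partition(reduce_id, num_maps=2):
            counts[key] = counts.get(key, 0) + value
    return counts


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        print(run_word_count(shared_dir))
```

In this toy model the reduce side still finds every block after the writing manager is discarded, which is the property the slides describe as stateless shuffle. Pointing the manager at a different StoragePlugin subclass changes where the blocks live without touching the partitioning logic.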