
Reinventing Genomic Big Data Analysis Solutions: From Automation to Intelligence

As costs fall and analysis workflows become simpler, gene sequencing and analysis is entering more and more areas of life science research.

[Figure: cost per human genome compared with Moore's Law, by year from 2001; cost axis from $100 to $100,000,000. Source path as given: /27541954/dna-sequencing-costs-data/]

Application areas:
• Tumor diagnosis and typing
• Drug development and companion diagnostics
• Population genetics research
• Personalized medicine
• Common chronic diseases
• Rare diseases
• Infectious diseases
• Agricultural breeding

[Figure: timeline with the labels Mendel's era, discovery of DNA, Human Genome Project, industrial age, information age, digital age]

Tools and technologies of the digital age

The broad range of services AWS provides helps create new businesses, products, services, and experiences.

[Figure: AWS service map. Categories: Infrastructure, Core Services, App Services, Analytics, Dev/Ops, Mobile Services, IoT, AI / Machine Learning, Enterprise Apps, Hybrid Architecture, Migration, Marketplace, Security & Compliance, Management Tools, Technical & Business Support]

Core services listed on the map:
• Compute: VMs, Auto-scaling, Load Balancing, Containers, Virtual Private Servers, Batch Computing, Cloud Functions, Elastic GPUs, Edge Computing
• Storage: Object, Blocks, File, Archivals, Import/Export, Exabyte-scale data transfer
• CDN
• Databases: Relational, NoSQL, Caching, Migration, PostgreSQL compatible
• Networking: VPC, DX, DNS

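Information age: building complex workflows

Genomic data analysis workflow overview

[Figure: raw data (atgatct gatcgat ctga) enters as files in, passes through a set of tools (tool-1, tool-2, tool-3, tool-4, tool-a, tool-b), and leaves as files out holding the processed results (010010010101001010). Labels on the figure: Hours, Mins., compute-intensive, memory-intensive]

Challenges in genomic big data analysis

Massive, diverse, and dispersed data:
• Many genomic data types (whole genome, exome, targeted regions, RNA, and others).
• A single sample ranges from 10+ GB to 100+ GB, and advances in sequencing technology mean more sequencing is being done.
• Data has to be transferred from the front-end lab to the back-end data center and stored there.

Scalable compute:
• Genomic analysis involves multiple steps and tools.
• Different data types and tools have different compute resource requirements.

[Figure: typical TCO comparison. Stages shown: On-Premises, Lift & Shift]

Available genomic analysis tools

Open-source tools. The AWS China regions already support automated deployment of 30+ commonly used genomic analysis tools.
• Sfn / Cromwell / Nextflow / Airflow
• GATK, BLAST, GROMACS, AMBER
• OMOP Common Data Model
• Observational Health Data Science and Informatics (OHDSI) tools
• Project REDCap
• Hail, Butler for genomics
• RStudio, R Shiny, Jupyter, DeepVariant, DLNex
• NVIDIA Clara Genomics
• HPC orchestrators: ParallelCluster, Slurm, SGE, HTCondor
• ML frameworks like TensorFlow, MXNet, etc.

Commercial tools. Many commercial tools can be found in AWS Marketplace.
• Illumina DRAGEN, Sentieon, SAS, Tableau, NVIDIA Parabricks, Schrödinger, Hortonworks, Corda Enterprise Blockchain, Informatica, CloudyCluster, Alces Flight, MathWorks MATLAB, Teradata, and more

In-house tools. Customized analysis workflows.
• Create automation for any of your existing tools.
• Automate the application of your organization's security best practices.

[Figure: typical TCO comparison. Stages shown: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2]

Illumina DRAGEN on AWS FPGA instances

In 2017, Children's Hospital of Philadelphia (CHOP) used the DRAGEN platform to analyze 1,000 pediatric genomes in 2 hours 25 minutes and was awarded a Guinness World Record.

One-click deployment of an elastic HPC cluster for genomic analysis

[Figure: cluster architecture inside an AWS Cloud VPC. A public subnet holds the Internet Gateway (IGW), the VPC NAT Gateway, and a bastion host reached by clients over the internet. A private subnet (with security group and placement group) holds the master node (pbs_server), which exports /public/ as an NFS share, and an Auto Scaling group of compute nodes (pbs_mom) that mount the share. Supporting services: Amazon S3, DynamoDB, Amazon SQS, CloudWatch, CloudFormation. Legend: internet access, external client access, SSH communication, NFS file sharing. Note on the figure: this document mounts the share directly from the master node.]

With a cluster tool deeply customized for AWS, engineers at 星亢原 can launch an HPC environment (CPU/GPU) within minutes and scale it elastically according to resource demand, shortening the analysis cycle by 30% while lowering cost.

Visual workflow orchestration with the analysis tool Nextflow

Using AWS services, 金匙基因 built a new HPC analysis platform that doubles analysis speed while reducing operations and maintenance costs.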
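The workflow overview above names six tools but the slide does not say how they depend on one another. As a minimal sketch, assuming a hypothetical wiring between them, the following shows how an orchestrator can derive which steps may run in parallel; the edges in `DEPENDS_ON` and the function name `execution_waves` are illustrative assumptions, not part of the original deck.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies between the tools named in the workflow overview.
# The slide does not specify the edges; this wiring is an assumption.
DEPENDS_ON = {
    "tool-1": set(),
    "tool-2": {"tool-1"},
    "tool-a": {"tool-1"},
    "tool-3": {"tool-2"},
    "tool-b": {"tool-a"},
    "tool-4": {"tool-3", "tool-b"},
}


def execution_waves(depends_on):
    """Group steps into waves; steps within one wave have no mutual
    dependencies and could be scheduled in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Grouping by waves rather than producing a single linear order makes it visible where a compute-intensive step and a memory-intensive step could be placed on different instance types at the same time.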

[Figure: typical TCO comparison. Stages shown: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization]

Amazon S3 tiered storage and automatic lifecycle management

Storage classes, ordered from frequent to infrequent access:
• S3 Standard: active, frequently accessed data; millisecond access; 3 AZ; from ¥0.147/GB.
• S3 Intelligent-Tiering: data whose access frequency changes; millisecond access; 3 AZ; from ¥0.1470 down to ¥0.0875/GB; per-object monitoring fee; minimum storage duration.
• S3 Standard-IA: infrequently accessed data; millisecond access; 3 AZ; from ¥0.0875/GB; retrieval billed per GB; minimum storage duration; minimum object size.
• S3 Glacier: archived data; 3 to 5 hours to bring data back online; 3 AZ; from ¥0.028/GB; retrieval billed per GB; minimum storage duration; minimum object size.
• S3 Glacier Deep Archive: archived data; 10+ hours to bring data back online; 3 AZ; from ¥0.007/GB; retrieval billed per GB; minimum storage duration; minimum object size.

Optimizing genomic data storage through lifecycle management

[Figure: storage tiers EBS, S3 / EFS / FSx, S3-IA, Glacier, Glacier Deep Archive, shown against the genomic data formats BCL, FASTQ, CRAM/BAM, VCF]

Macrogen uses Amazon S3 Glacier for more secure, more reliable, standardized global data management at scale

"Macrogen manages 15+PB of data and generates massive genomic data for further analysis across global sites every day. Using Amazon S3 Glacier, Macrogen is able to manage big data in a more secure, reliable, cost-effective, and standardized manner globally."
Sukang Lee, Chief Operating Officer, Macrogen, Inc.

Company: Macrogen, Inc.
Industry: Life Sciences
Country: Republic of Korea
Employees: 550+ globally

About Macrogen
Macrogen is a biotech innovator listed on Korea's KOSDAQ and a leading global precision medicine and biotechnology company. Through its sequencing center, the largest in Korea, and its data infrastructure, Macrogen provides genomic data services to tens of thousands of customers in 153 countries.

Challenges
• A secure, reliable, and economical way to back up large volumes of genomic data every day.
• Meeting security and compliance requirements such as the EU GDPR and Korea's ISMS.
• A standardized backup system serving subsidiaries and branches worldwide.

Solution
• Amazon S3 Glacier provides a reliable, low-cost backup option for data at scale.
• AWS Regions around the world provide a standardized, secure backup option.

Benefits
• Automated, distributed storage improved data stability and durability.
• Security and compliance requirements are met.
• A standardized global backup system was established, cutting backup costs by 35% compared with a self-built on-premises data center.
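The tiered storage prices and the lifecycle flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the per-GB prices are the ones quoted on the slide, but the billing period is not given there (per month is assumed), and the `fastq/` prefix and the 30/90/365-day thresholds are hypothetical. The rule is built as a plain dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; no AWS call is made here.

```python
from decimal import Decimal

# Per-GB prices quoted on the slide (CNY). The billing period is not stated
# in the source; "per month" is an assumption made for this sketch.
PRICE_PER_GB = {
    "STANDARD": Decimal("0.147"),
    "STANDARD_IA": Decimal("0.0875"),
    "GLACIER": Decimal("0.028"),
    "DEEP_ARCHIVE": Decimal("0.007"),
}


def storage_cost(size_gb, storage_class):
    """Cost in CNY of keeping size_gb in one storage class for one period."""
    return Decimal(size_gb) * PRICE_PER_GB[storage_class]


def lifecycle_rule(prefix, ia_days, glacier_days, deep_archive_days):
    """Build one lifecycle rule that moves objects under `prefix` to colder
    storage classes as they age."""
    return {
        "ID": f"tier-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
            {"Days": deep_archive_days, "StorageClass": "DEEP_ARCHIVE"},
        ],
    }


# Hypothetical prefix and day thresholds, chosen only for illustration.
config = {"Rules": [lifecycle_rule("fastq/", 30, 90, 365)]}

if __name__ == "__main__":
    # Compare the cost of one 100 GB sample across storage classes.
    for storage_class in PRICE_PER_GB:
        print(storage_class, storage_cost(100, storage_class))
```

With these prices, a 100 GB sample costs ¥14.7 per period in S3 Standard and ¥0.7 in Glacier Deep Archive, which is why ageing raw data out of the hot tier matters. Retrieval fees, minimum storage durations, and minimum object sizes, all noted on the slide, are not modeled.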

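[Figure: typical TCO comparison. Stages shown: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization, Serverless Architecture]

Automated report delivery built on Lambda and event triggers

• Inference results produced in the cloud are written to an S3 object storage bucket.
• The upload automatically triggers a serverless Lambda function that generates the report and delivers it to the end user.

[Figure: architecture. Hospital A and Hospital B (doctor, monitor) upload data; data is loaded to NFS for label analysis and AI training on a P3 instance, which outputs model files; an AI inference instance, backed by Amazon RDS and Amazon Elastic Block Store, handles reference requests and writes reference results to S3; Lambda turns results into reports; an S3 lifecycle policy moves data to Amazon S3 Glacier]

CSIRO uses AWS Lambda to cut days of analysis down to minutes

Challenge
CSIRO has to handle large, unpredictable analysis workloads, and serves a high volume of real-time requests through a continuously running online web service.

Solution
In 2018 CSIRO developed and released Gen-Phen-Insight, a tool that supports clinical decision-making. Built on a serverless architecture using Amazon API Gateway, AWS Lambda, and Amazon SNS, Gen-Phen-Insight quickly returns candidate target sites once a user enters a gene name or coordinates. CSIRO also uses AWS X-Ray to better monitor how the software is running.

Benefits
• Runtime reduced by 80%.
• Time to process a large volume of requests reduced from days to minutes.
• Saved in hard costs and infrastructure management.
• Faster delivery, testing, and prototyping of components.

Company: CSIRO
Industry: Scientific & Industrial Research
Country: Australia
Website: www.csiro.au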
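The event-triggered report delivery described above (a result lands in S3, which invokes a Lambda function) can be sketched as a small handler. This is an illustrative sketch, not the deck's implementation: `build_report` is a hypothetical stub, the key layout is an assumption, and reading the object and delivering the report are left out so the example runs without AWS access. The event fields used (`Records`, `s3.bucket.name`, `s3.object.key`) follow the S3 event notification format.

```python
import json
from urllib.parse import unquote_plus


def extract_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def build_report(bucket, key):
    """Hypothetical report stub. A real function would read the inference
    result from S3 and render it before delivery."""
    sample_id = key.rsplit("/", 1)[-1].split(".", 1)[0]
    return {"sample_id": sample_id, "source": f"s3://{bucket}/{key}"}


def handler(event, context):
    """Lambda entry point: build one report per uploaded inference result."""
    reports = [build_report(bucket, key) for bucket, key in extract_objects(event)]
    # Delivery to the end user (e-mail, notification, etc.) is omitted here.
    return {"statusCode": 200, "body": json.dumps(reports)}


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "results-bucket"},
                    "object": {"key": "inference/S001.json"}}}
        ]
    }
    print(handler(sample_event, None))
```

Because the function only runs when an object arrives, no server has to stay up waiting for results, which is the cost argument the serverless stage of the TCO comparison makes.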

"(We) reduced the runtime from days to minutes. The cost of traditional approaches would have been prohibitive if we wanted to persist appropriately-sized compute resources."
Denis Bauer, Head of Cloud Computing and Bioinformatics at CSIRO

About CSIRO
CSIRO (Commonwealth Scientific and Industrial Research Organisation), based in Canberra, the capital of Australia, is a genomic analysis organization that supports clinical and scientific research.

[Figure: typical TCO comparison. Stages shown: On-Premises, Lift & Shift, Instance Right-Sizing, Improved Elasticity, Measure/Monitor/Improve, Optimized EC2, Storage Optimization, Serverless Architecture, Managed Services, True AWS]

Data is changing how the healthcare and life sciences industry innovates

• Data sources: market data, personal health, environmental data, clinical information, genomics.
• Demand for data will keep growing, driven by analytics, artificial intelligence, and machine learning.

Which data can be accessed?

Public datasets. AWS hosts a variety of public datasets that anyone can access for free. Below are just a few examples.
• 1000 Genomes Project
• The Cancer Genome Atlas
• International Cancer Genome Consortium
• 3000 Rice Genome
• Genome in a Bottle (GIAB)
• The Genome Modeling System
• Medicare Drug Spending
• The Human Connectome Project
• The Human Microbiome Project
• OpenNeuro
• PhysioNet
• Tabula Muris
• OpenStreetMap
• and more.

Local data. Access your existing data.
• Electronic health records
• Medical imaging (PACS/DICOM/VNA)
• Labs
• Genomic
• Patient monitoring
• Financial
• Supply chain

Shared data. Privately give and receive access to data.
• Easily give and receive access to data on AWS from other researchers
• Create shared data lakes with other institutions
• Access commercial data sets

AWS public datasets: life sciences
• International Neuroimaging Data-Sharing Initiative (INDI)
• Fly Brain Anatomy: FlyLight Gen1 and Split-GAL4 Imagery
• eBird Status and Trends Model Results
• Open NeuroData
• NYU Langone & FAIR FastMRI Dataset
• 3000 Rice Genomes Project
• Encyclopedia of DNA Elements (ENCODE)
• Allen Mouse Brain Atlas
• Africa Soil Information Service (AfSIS) Soil Chemistry
• MIMIC-III (Medical Information Mart for Intensive Care)
• Allen Brain Observatory - Visual Coding AWS Public Data Set

AWS public datasets: genomics
• NIH NCBI Sequence Read Archive (SRA) on AWS
• 1000 Genomes
• The Genome Modeling System
• GATK Test Data
• Genome Ark
• Basic Local Alignment Sequences Tool (BLAST) Databases
• The Human Microbiome Project
• Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
• Refgenie reference genome assets
• ICGC on AWS
• Cloud Indexes for Genomic Analyses

AWS public datasets: oncology
• The Cancer Genome Atlas
• Therapeutically Applicable Research to Generate Effective Treatments (TARGET)
• Cancer Cell Line Encyclopedia (CCLE)
• Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)
• Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)
• CoMMpass from the Multiple Myeloma Research Foundation
• Beat Acute Myeloid Leukemia (AML) 1.0
• Clinical Trial Sequencing Project - Diffuse Large B-Cell Lymphoma
• Foundation Medicine Adult Cancer Clinical Dataset (FM-AD)
• Variant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) Plugin
• Cancer Genome Characterization Initiatives - Burkitt Lymphoma Genome Sequencing Project
• National Cancer Institute Center for Cancer Research - Diffuse Large B Cell Lymphoma (DLBCL) Genomics and Expression
• Gabriella Miller Kids First Pediatric Research Program (Kids First)
• ICGC on AWS
• Pancreatic Cancer Organoid Profiling
• Human Cancer Models Initiative (HCMI) Cancer Model Development Center
• Oregon Health & Science University Chronic Neutrophilic Leukemia Dataset
• Cancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical Cancer

AWS public datasets: COVID-19
• COVID-19 Molecular Structure and Therapeutics Hub
• COVID-19 Genome Sequence Dataset
• Ozone Monitoring Instrument (OMI) / Aura NO2 Tropospheric Column Density
• COVID-19 Harmonized Data
• COVID-19 Data Lake
• COVID-19 Open Research Dataset (CORD-19)

Meeting healthcare and life sciences compliance requirements

Complying with virtually every regulatory agency.

Global:
• CSA: Cloud Security Alliance Controls
• ISO 9001: Global Quality Standard
• ISO 27001: Security Management Controls
• ISO 27017: Cloud Specific Controls
• ISO 27018: Personal Data Protection
• PCI DSS Level 1: Payment Card Standards
• SOC 1: Audit Controls Report
• SOC 2: Security, Availability, & Confidentiality Report
• SOC 3: General Controls Report

United States:
• CJIS: Criminal Justice Information Services
• DoD SRG: DoD Data Processing
• FedRAMP: Government Data Standards
• FERPA
