2025 Alibaba Cloud Open Source Big Data Workshop, Hangzhou

OPENING

Yu Li (李鈺, alias 絕頂)
ASF Member; Apache Celeborn/Flink/HBase/Paimon PMC Member
Head of EMR, Alibaba Cloud Intelligence

Data Trends: AIGC further promotes the explosion of big data

• Data Volume: AI further drives a massive data explosion, far exceeding the data growth of the previous era.
• Data Diversity: Multimodal data processing will become standard for future data processing, including storage, computation, and management.
• Data Governance: One data set serves different roles, including Data Engineers, Data Analysts, Data Scientists, and AI Engineers.

[Chart: share of data by type. Labels: Analytic Data, Pictures, AI Models, Others, Video. Values: 46%, 1%, 43%, 5%, 5%.]

The Evolution of Data Architecture

[Diagram: three architectures side by side.
 Data Warehouse: Database, ETL, Data warehouses, Reports, Applications.
 Data Lake: structured, semi-structured and unstructured data; Data Lake, ETL, Data warehouses, Realtime Analytics, Data Explore, Data science, Machine Learning, Reports.
 Data Lakehouse: structured, semi-structured and unstructured data; Data Lakehouse, Streaming Analytics, Data science, Machine Learning.]

The Data Warehouse Architecture

Strengths
• Excellent performance
• Out-of-box, easy to use
• Friendly to Data Analysts

Weaknesses
• Data format is not open
• Lack of support for non-structured and semi-structured data
• All data not immediately required will be discarded

[Diagram: Database, Application, ETL Pipeline, Data Warehouse (Data warehouses), Reports, Applications.]

The Data Lake Architecture

Strengths
• Unified storage with low cost
• Open data and meta format
• Fits both BI and AI

Weaknesses
• Performance is not as good as a DW
• Data governance is not mature
• Hard to construct and operate

[Diagram: Database and Application feed "Store All Data" into the Data Lake (structured, semi-structured and unstructured data); ELT model: Analyze, See Results, Iterate. Consumers: Realtime Analytics, Data Explore, ETL to Data warehouses, Machine Learning, Data science, Reports.]

Data Lake + Data Warehouse = Data Lakehouse

[Diagram: Database; Data Lake; Data Lakehouse (structured, semi-structured and unstructured data); Data Warehouse, ETL, Reports, Applications; Data Explore, Streaming Analytics, Machine Learning, Data science.]

Composable open source Lakehouse solution

• DevOps
• Computing Engines
• Management Services: Apache Gravitino
• Governance Services
• Data Formats: Apache Paimon
• Data Storage: Alibaba Cloud OSS

The Lakehouse solution on Alibaba Cloud: Open Lake

[Diagram: Open Lake architecture.
 Lake Format: Apache Paimon. Lake Storage: OSS-HDFS.
 Data Lake Formation: MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction.
 Compute: Realtime Compute, MaxCompute, Hologres, E-MapReduce.
 DataWorks: IDE, Copilot.
 Other labels: Application, Database, Ingestion, Workflow, Data Governance, Data Quality.]

BUILD OPEN SOURCE COMPATIBLE LAKEHOUSE ON ALIBABA CLOUD

Yu Li (李鈺, alias 絕頂)

Lakehouse processing pipeline

[Diagram: sources binlog, RDBMS, Logs; Paimon tables at the ODS, DWD, DWS, and ADS layers; Flink SQL (Streaming & Batch) between layers; Flink SQL Streaming & Batch Queries; Hologres; Data Serving Systems. Supporting layers: Lake Governance, Lake Format, Lake Storage.]

Recap: The Lakehouse solution on Alibaba Cloud

[Diagram: Open Lake architecture, as before. Lake Format: Apache Paimon. Lake Storage: OSS-HDFS. Data Lake Formation: MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction. Compute: Realtime Compute, MaxCompute, Hologres, StarRocks, E-MapReduce. DataWorks: IDE, Copilot. Other labels: Application, Database, Ingestion, Workflow, Data Governance, Data Quality.]

EMR Serverless Spark

Serverless Spark transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance. 100% compatible with open-source Spark, 3X faster with Fusion, an enterprise native engine.

Resilient
• Enterprise remote shuffle service (RSS) solution to support better elasticity
• On-demand and seamless rescaling
• Native integration with DLF and OSS

Easy to Use
• One-stop data engineering support
• Visualized job and workflow monitor
• Convenient resource and session management

Flexible
• Rich Open API supplied for integration
• 100% compatible with open source usage, in both API and binary aspects
• Rich ecosystem supported

Fast
• Native Engine supported, 3X faster than open source Spark
• Enhanced RSS supplies 1.5X throughput for IO-intensive apps

Product Architecture

• App Scenario: Dashboard Report, Operational Analytics, Data Discovery, Data Engineer, Data Science
• Control Plane: Meta Service, Scheduling, Intelligent Maintenance, Version Control, Accounting
• Compute Plane: Spark Native Engine, Remote Shuffle (Enterprise Remote Shuffle Service), Data IO
• Storage Layer: Lake Formats, Object Storage Service, Enterprise Cache Service, Security and Auth (DLF)

Control plane: Workspace Administration
• Connection Management
• Resource Usage Monitoring
• Session Management (resource for interactive query)
• Queue Management (resource for ETL)

Control plane: Data Engineering
• Version Control
• SQL Editor
• Artifacts Management
• Catalog View

Control plane: Job Monitor and Diagnose
• Intelligent Diagnose
• Job List
• Logs
• Metrics

Control plane: Workflow Management
• Workflow List
• Workflow Instance Monitor: Global View
• Canvas Editor
• Workflow Instance Monitor: Single Execution View

Compute plane: Fusion Engine

Fusion is an enterprise native engine which is 3X faster than the open source Spark Java engine.

x86 (Intel/AMD) and ARM support

Hardware awareness optimization
• SVE SIMD acceleration
• zstd-ptg compression acceleration

Native C++ Integration
• OSS-HDFS support
• Deep Parquet and ORC integration
• Paimon, Delta Lake and Iceberg support

Vectorized Execution Engine
• Native Operator
• SIMDJson optimization

Fast Columnar Shuffle
• Enterprise RSS based on Apache Celeborn
• Data shuffle reduced by up to 40%

Testing Environment
• 6d3s.16xlarge ECS server
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0

Compute plane: Enterprise Remote Shuffle Service

RSS removes the dependency on local disk for shuffle data and enables 100% disaggregation of compute and storage.

Open Source
• Apache Top Level Project, donated by Alibaba Cloud
• De-facto RSS choice, used by Alibaba, LinkedIn, etc.

Multi-Tenancy
• Enterprise security assurance with data encryption
• Enhanced IO scheduling, flow control and quota management

Scalability
• Widely adopted in Alibaba, used by both Spark and Flink
• Successfully supports jobs with 600TB+ shuffle data

Performance
• 69% performance boost over YARN external shuffle
• Performance gain increases with shuffle data scale

Functionality
• Supports Spark DRA
• Supports Spark AQE

Test Environment
• 8d2s.10xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
• Spark 3.3.1
• Shuffle Partition = 8000

Open API and Ecosystem

Workflow Integration

OpenAPI
• Workspace
• Job Runs
• SQL Editor
• Workflows

Tools
• Spark-submit compatible job submission
• Notebook
• Git integration (planning)

Alibaba Cloud Product Integration
• OSS-HDFS
• MaxCompute
• DLF
• DataWorks

EMR Serverless Spark vs. Databricks, function-wise

Function                      Databricks     EMR Serverless Spark
Native Engine                 YES            YES
SQL Editor                    YES            YES
Workflow Management           YES            YES
Debugging and Monitor         YES            YES
Intelligent Diagnose          NO             YES
Catalog and Authentication    YES            YES
Data & FS                     YES (DBFS)     YES (OSS-HDFS)
Auditing                      YES            YES
Notebook                      YES            YES
CI/CD with Git                YES            NO
Assistant/Copilot             YES            NO
ML & Vector Serving           YES            NO

The Lakehouse solution on Alibaba Cloud

[Diagram: Open Lake architecture, repeated. Lake Format: Apache Paimon. Lake Storage: OSS-HDFS. Data Lake Formation: MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction. Compute: Realtime Compute, MaxCompute, Hologres, StarRocks, E-MapReduce. DataWorks: IDE, Copilot. Other labels: Application, Database, Ingestion, Workflow, Data Governance, Data Quality.]

EMR Serverless StarRocks

Serverless StarRocks offers a high-performance, all-scenario, blazing-fast and unified data lakehouse analytics service. 100% compatible with open-source StarRocks, 3X faster than traditional OLAP (Presto/Trino, ClickHouse, Druid...).

Pillars: Fast, Unified, Easy-to-use, Cloud-native.

• Large scale data analytics
• SIMD-optimized query engine
• High speed real-time data ingestion
• Innovative pipeline execution engine
• Full stack vectorized technology
• Innovative CBO technology
• Multi-dimensional lakehouse analytics with rich lake data format support
• Materialized Views and ETL support
• High concurrency support (10k per sec)
• Real-time data analysis
• Diverse data model support
• Maintenance free with high SLA
• Compatible with MySQL protocol
• Compatible with multiple BI tools
• Supports slow query diagnosis
• Visual metadata management
• Easy migration with cluster link tool
• Out-of-box, minute-level delivery
• Efficient resilience support
• Deep integration with DLF and VVP
• DisAgg and Virtual Warehouse support

Product architecture

• Application Scenario: Ad-hoc, Dashboard, Operation analytics, User profile, Real-time analytics, Self-service reporting
• Product Layer: Auto-Scaling, Lakehouse Analytics, Shared-Nothing Architecture
• StarRocks-instance Layer
• Storage layer: Hive, Data Lake Table Format, StarRocks Table Format, Data Lake

Fast and unified
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, offering more than a 3X performance improvement relative to Trino.
• Supports materialized view ELT scenarios, enabling one-step data tier processing.

Separation of storage and compute
• Optimized computational elasticity for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Offers multi-computing cluster capabilities, ensuring resource isolation between different business units without interference.
• Various caching strategies available, allowing customers to flexibly configure according to their business needs.

Use with ease
• Out of the box, the StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in conjunction with customer business operations.

Product Architecture

[Diagram: StarRocks Manager and StarRocks Console on top: Data Loading, Security, SQL profiling, Audit log, Configuration, Monitoring and alert, Management, Health analysis, Upgrading, Instance Management, SQL Editor, …. Below: FE nodes; three Virtual Warehouses, each with CN nodes and a Data Cache.]

Control plane: StarRocks Manager
• Instance Monitor
• One-stop SQL edit and query
• Slow SQL profile and diagnose
• Instance diagnose

Compute plane: Fast and stable

Highlights: Fully Managed; Extreme Elasticity; One-stop Dev and Analyze; Dis-aggregation Support.

• High Perf: 3x-5x faster than Trino; significantly faster than ClickHouse and Apache Doris
• Lake Query Acceleration: Hive/Paimon/Iceberg/Hudi
• Lakehouse Hierarchy: Hive/Paimon/Iceberg/Hudi; supports external MV and Lakehouse Hierarchy
• Elasticity: sophisticated caching and tiered storage capability; on-demand second-level elasticity with low cost
• Maturity: comprehensive load analysis and diagnostics

[Diagram, Lake Query Acceleration: Data Lake; Compute Nodes, each with a Local Cache; StarRocks.]

[Diagram, Lakehouse Build-up: Data Ingestion; StarRocks Warehouse; layers ODS, DWD, DWS, ADS; MV Acceleration.]
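The lake query acceleration diagram places a local cache on each compute node in front of the data lake. As a rough illustration of that idea only, here is a minimal read-through LRU cache in Python. `LocalDataCache` and the `fetch_block` callback are hypothetical names for this sketch; this is not how StarRocks implements its Data Cache.

```python
from collections import OrderedDict


class LocalDataCache:
    """Read-through LRU cache keyed by (file path, block index).

    Hypothetical sketch: `fetch_block` stands in for a remote read
    from object storage and is supplied by the caller.
    """

    def __init__(self, fetch_block, capacity_blocks):
        self._fetch_block = fetch_block
        self._capacity = capacity_blocks
        self._blocks = OrderedDict()  # least recently used entry first
        self.hits = 0
        self.misses = 0

    def read(self, path, block_index):
        key = (path, block_index)
        if key in self._blocks:
            # Cache hit: mark the block as most recently used.
            self._blocks.move_to_end(key)
            self.hits += 1
            return self._blocks[key]
        # Cache miss: read from remote storage, then keep a local copy.
        self.misses += 1
        data = self._fetch_block(path, block_index)
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data


remote_reads = []


def fake_remote_read(path, block_index):
    # Stand-in for an object storage read; records each remote access.
    remote_reads.append((path, block_index))
    return f"{path}#{block_index}"


cache = LocalDataCache(fake_remote_read, capacity_blocks=2)
cache.read("t/part-0.parquet", 0)
cache.read("t/part-0.parquet", 0)  # served locally
cache.read("t/part-0.parquet", 1)
cache.read("t/part-1.parquet", 0)  # evicts block 0 of part-0
```

Repeated reads of a hot block touch remote storage once; capacity and eviction policy are the knobs a real cache would expose.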

Recap: The Lakehouse solution on Alibaba Cloud

[Diagram: Open Lake architecture, repeated, with Data Lake Formation (Authentication, Authorization, Lineage, MetaStore, Tiered Storage, Compaction) highlighted.]

MetaData Management

APIs
• HMS compatible
• Import/Export from/to HMS
• MySQL JDBC
• Open API & SDKs

Functionality
• Table Schema
• Table Lineage (WIP)
• Meta Retrieval
• Meta Stats for CBO

Fully Managed
• Serverless, Elastic
• Highly Available
• High Throughput
• OpenAPI / SDK

Lake Formats
• Apache Paimon
• Apache Iceberg
• Apache Hudi
• Databricks Delta

Enterprise-class security

Auditing
• Audit Log for Authorization
• Audit Log for Meta Operation
• Audit Log for Data Operation (WIP)

Authorization
• RBAC
• Policy & ACL (WIP)

Modes
• Apache Ranger compatible

Authentication
• Open LDAP
• Kerberos (WIP)
• Alibaba Cloud RAM

Open Lake

[Diagram: Hot Layer, Warm Layer, Cold Layer; Intelligent optimization; Compaction Manager; TieredStorage Manager; Meta Store; Compact; Stats.]

Thanks

Yu Li
liyu@

Paimon + DLF: Connecting Alibaba Cloud's In-House and Open Source Compute Engines

李勁松
Apache Paimon PMC Chair

CONTENTS
1. Open Lake: one storage layer for the whole ecosystem
2. Apache Paimon and open source compute engines
3. Apache Paimon and in-house compute engines
4. Apache Paimon in practice

1. Open Lake: one storage layer for the whole ecosystem

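From data lake to lakehouse

[Diagram: before, engines on the OSS data lake exchange data by reading and writing OSS files: Hologres + internal tables, MaxCompute + internal tables, + Parquet, + Kafka. After, engines read and write through the lake format SDK and share unified lakehouse metadata. Lake format + AI: "To be continued…".]

Choosing a data architecture: batch data warehouse, real-time lakehouse, real-time data warehouse.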

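The Lakehouse solution on Alibaba Cloud: Open Lake

[Diagram: Open Lake architecture, repeated. Lake Format: Apache Paimon. Lake Storage: OSS-HDFS. Data Lake Formation: MetaStore, Lineage, Authentication, Authorization, Tiered Storage, Compaction. Compute: Realtime Compute, MaxCompute, Hologres, E-MapReduce. DataWorks: IDE, Copilot. Other labels: Application, Database, Ingestion, Workflow, Data Governance, Data Quality.]

2. Apache Paimon and open source compute engines

[Diagram labels: ODS, DWD, DWS layers; Batch Aggregate; upgrade to real time; streaming partial update; streaming Aggregate.]

• Shared storage, equal access for every compute engine
• Unified stream and batch processing, with an upgrade path to real time
• Real-time and offline data, with very fast queries
• Industry-leading performance and cost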
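The "streaming partial update" step refers to merging rows that share a primary key so that each writer fills in only its own columns. The sketch below illustrates that merge rule in plain Python: later non-null fields overwrite earlier ones and nulls never overwrite. The function name and row layout are assumptions for this example; it does not use Paimon's API.

```python
def partial_update_merge(rows, key_field):
    """Merge rows that share a primary key.

    For each key, fields are updated one by one in arrival order;
    a None value never overwrites an existing value.
    """
    merged = {}
    for row in rows:
        key = row[key_field]
        current = merged.setdefault(key, {key_field: key})
        for field_name, value in row.items():
            if value is not None:
                current[field_name] = value
    return merged


# Two upstream tables write different columns of the same wide table.
order_rows = [
    {"order_id": 1, "amount": 30, "pay_status": None},
    {"order_id": 2, "amount": 12, "pay_status": None},
]
payment_rows = [
    {"order_id": 1, "amount": None, "pay_status": "PAID"},
]

wide_table = partial_update_merge(order_rows + payment_rows, "order_id")
```

This is the property that lets several streams widen one table without a streaming join: each stream only has to carry the primary key and its own columns.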

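Paimon + open source big data

[Diagram: Apache Paimon on OSS. Labels: Application; Ingestion; Streaming Ingestion; Batch; Left Join; Real-time OLAP; OLAP; MaxCompute and Hologres (ongoing).]

Alibaba Cloud Flink + Paimon: Streaming Lakehouse

• Widening data across multiple tables: Partial-Update; large-scale Lookup Join
• Streaming upserts into the lake: high-performance updates on primary-key tables; a rich set of merge engines
• Accelerating offline data: streaming writes and streaming reads replace message queues; indexes accelerate queries
• Streaming reads of the changelog: a complete changelog is generated, which unlocks streaming reads

Alibaba Cloud Spark + Paimon: first-class offline processing performance

[Chart: TPC-DS SF1T performance. Series: Baseline, +DPP, +adaptive scan parallelism, +native, +ALL. Y-axis: normalized performance (higher is better), 0 to 2.5.]

Alibaba Cloud StarRocks + Paimon: very fast queries on offline data

Alibaba Cloud StarRocks + Paimon: Deletion vectors mode

3. Apache Paimon and in-house compute engines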

Data Lake Formation: Bridge to MC & Holo

DLF connects the in-house compute engines:
• MaxCompute: External Schema
• Hologres: External Database

[Diagram: MaxCompute and Hologres on top of Data Lake Formation, Apache Paimon (Lake Format), and OSS-HDFS (Lake Storage).]

MaxCompute + Paimon

Coming soon:
• Built-in Paimon
• Native acceleration
• Deletion Vectors support
• ALIORC format
• Batch write support

[Diagram: MaxCompute on top of Data Lake Formation, Apache Paimon (Lake Format), and OSS-HDFS (Lake Storage).]
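Deletion vectors are listed here as an upcoming capability. The general idea is that data files stay immutable and each file carries a set of row positions that are no longer valid, so a reader can skip dead rows without merging files. The sketch below shows that idea with plain Python sets; the class and method names are hypothetical, and real implementations store compressed bitmaps rather than sets.

```python
class DeletionVectorTable:
    """Primary-key table whose data files are immutable.

    An upsert writes the new row into a new file and marks the old
    row's position as deleted in the old file's deletion vector.
    """

    def __init__(self):
        self.files = {}    # file name -> list of (key, value) rows
        self.deleted = {}  # file name -> set of deleted row positions
        self.index = {}    # key -> (file name, position) of the live row

    def write_file(self, name, rows):
        self.files[name] = list(rows)
        self.deleted[name] = set()
        for position, (key, _value) in enumerate(self.files[name]):
            previous = self.index.get(key)
            if previous is not None:
                old_file, old_position = previous
                self.deleted[old_file].add(old_position)
            self.index[key] = (name, position)

    def scan(self):
        # Readers filter by position only; no merge across files is needed.
        for name, rows in self.files.items():
            dead = self.deleted[name]
            for position, row in enumerate(rows):
                if position not in dead:
                    yield row


table = DeletionVectorTable()
table.write_file("f1", [(1, "a"), (2, "b")])
table.write_file("f2", [(2, "B"), (3, "c")])  # key 2 is updated
live_rows = sorted(table.scan())
```

Because the scan is a per-file filter, it fits vectorized native readers, which is a plausible reason the slides pair deletion vectors with native acceleration.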

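Hologres + Paimon

Coming soon:
• Native acceleration
  - Append (no primary key) tables
  - Deletion Vectors mode

[Diagram: Hologres on top of Data Lake Formation, Apache Paimon (Lake Format), and OSS-HDFS (Lake Storage).]

4. Apache Paimon in practice

A new-energy vehicle company's practice on Alibaba Cloud

[Diagram: Application and Database; Streaming Ingestion; Apache Paimon on the Data Lake; LSM Tree. Labels: ODS primary-key table; streaming; asynchronous compaction; DWD Append table; changelog=lookup.]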

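A gaming company's practice on Alibaba Cloud

[Diagram: Application and Database; Streaming Ingestion; Apache Paimon on the Data Lake; LSM Tree; real-time OLAP. Labels: ODS primary-key table, changelog=input; DWD primary-key table, deletion-vectors; DWS Append table; streaming; asynchronous compaction; Batch.]

A local-services company's practice on Alibaba Cloud

[Diagram: Application and Database; Streaming Ingestion; Apache Paimon on the Data Lake; LSM Tree; high-performance OLAP. Labels: ODS Append table; Cluster: Z-order; Index: bloom filter / bitmap.]

Thanks

李勁松
Apache Paimon PMC Chair

Alibaba Cloud Real-Time Lakehouse and Flink Product Technology

李魯兵 (云覺)
Alibaba Cloud Computing Platform

CONTENTS
1 Insights into real-time lakehouse trends in big data
2 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink
3 Product capabilities of Alibaba Cloud Realtime Compute for Apache Flink
4 Typical architectures and case studies

01 Insights into real-time lakehouse trends in big data

Big data is entering the real-time lakehouse era: AI-driven, public cloud first, real-time and AI-enabled.

[Timeline:
 1.0 (2009-2019): introducing the data warehouse.
 2.0 (2020-2022): moving to the lakehouse, unifying structured, semi-structured and unstructured data.
 3.0 (2023~): leading with the native lakehouse, real-time and AI-enabled.
 Other labels: data warehouse, data lake, streaming analytics, BI, data science, machine learning.]

02 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Apache Flink

Real-time lakehouse (Streaming Lakehouse): the best overall price-performance choice

• Minute-level freshness
• Second-level query response
• Low cost
• Real time across the whole pipeline
• Has both Lakehouse and Streaming characteristics

[Chart: axes performance, freshness, cost. Warehouse: T+1. Lakehouse: T+1 / T+1h. Streaming: T+1s. Streaming Lakehouse (Streaming + Lakehouse): T+1m.]

Real-time lakehouse: overall solution

How it works
• Paimon built on low-cost OSS storage
• Deep integration with Flink makes the whole pipeline real-time

Core advantages
• Low cost, real time across the whole pipeline
• Unified storage and computation for stream and batch
• One platform with data management, scheduling, and ad-hoc queries
• Open, with support for multiple engines

Applicable scenarios
• Accelerating an entire offline pipeline to real time
• Reducing the cost of real-time pipelines
• Unifying storage and computation for stream and batch

[Diagram: sources (Database, Logs, EMR) into the Data Lake (OSS/OSS-HDFS). ① One-click ingestion into the lake with CTAS/CDAS; ② streaming read and write with Flink (stream/batch); ③ ad-hoc queries; ④ batch read and write. Engines: Flink, MaxCompute, Hologres. Scheduling workflow; Queries.]

Real-time lakehouse: accelerating the whole pipeline to real time

End to end, data flows and is updated in real time across the whole pipeline, with minute-level freshness; every stage can be queried, with second-level query response.

Data ingestion: Flink CDC
• Streaming ETL
• Unified full and incremental synchronization
• Schema Evolution
• Whole-database sync; sharded databases and tables
• Resume from breakpoint

Data storage: Paimon (OSS) Table Format
• Upsert/Partial-Update
• Real-Time Ingestion
• Changelog Producing
• Time Travel
• Lookup Join
• Batch Overwrite/Query

Data computation: Flink and other engines
• Flink is the de facto standard for stream computing
• Open, with support for multiple compute engines
• Streaming write and streaming read
• Batch write and batch read
• Ad-hoc queries and point lookups

Data query: OLAP engines
• Open, with support for multiple OLAP engines
• External-table queries respond within seconds
• Data can also be uploaded directly into the engine
• Query performance is optimized in memory

Real-time ingestion into the lake and warehouse: simplified operations

• CTAS: merges and synchronizes sharded databases and tables
• CDAS: synchronizes a whole database

[Diagram: MySQL to Paimon (OSS); ad-hoc queries.]

Real-time ingestion into the lake and warehouse: compatible with table changes (Schema Evolution)

• Metadata is discovered and managed automatically through a Catalog.
• With the CTAS syntax, data and table schema changes are both synchronized automatically.
• Data changes and schema changes are read and synchronized downstream, and ordering is guaranteed for both.
• When synchronizing to a Paimon table, PARTITION BY automatically handles sources with or without partition fields.

[Diagram: MySQL Order_db to Paimon (OSS) Paimon_order.]

Real-time ingestion into the lake and warehouse: multiple processing operations

Flink CDC
• SQL API: SELECT, WHERE, GROUP BY, JOIN, Top-N, INSERT
• DataStream API: map, flatMap, filter, keyBy, join, aggregate

[Diagram: targets Hudi, Iceberg, Hologres, Paimon, TiDB, ClickHouse. "More sources are on the way."]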
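Operations such as filter, keyBy, and aggregate run here on a change stream rather than on an append-only log, so an update has to retract the old row before adding the new one. The sketch below shows that rule in plain Python. The event layout `(op, before, after)` is an assumption loosely modelled on CDC records, and the function is not part of Flink's API.

```python
from collections import defaultdict


def aggregate_changelog(events, key_fn, value_fn, keep=lambda row: True):
    """Sum `value_fn(row)` per `key_fn(row)` over a change stream.

    Each event is (op, before, after): inserts carry only `after`,
    deletes carry only `before`, updates carry both. The `before`
    image is retracted and the `after` image is accumulated.
    """
    totals = defaultdict(int)
    for _op, before, after in events:
        if before is not None and keep(before):
            totals[key_fn(before)] -= value_fn(before)  # retract the old row
        if after is not None and keep(after):
            totals[key_fn(after)] += value_fn(after)    # add the new row
    return dict(totals)


events = [
    ("c", None, {"id": 1, "city": "HZ", "amount": 10}),
    ("c", None, {"id": 2, "city": "HZ", "amount": 5}),
    ("u", {"id": 1, "city": "HZ", "amount": 10}, {"id": 1, "city": "HZ", "amount": 30}),
    ("d", {"id": 2, "city": "HZ", "amount": 5}, None),
    ("c", None, {"id": 3, "city": "SH", "amount": 7}),
]

amount_by_city = aggregate_changelog(
    events,
    key_fn=lambda row: row["city"],
    value_fn=lambda row: row["amount"],
)
```

Summing only the `after` images would double count order 1 and keep the deleted order 2; the retraction step is what keeps the aggregate consistent with the source table.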

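Real-time lakehouse: low-cost storage

• Built on low-cost storage such as OSS/HDFS
• LSM-based, balancing read and write performance
• Full support for Lakehouse features
• A changelog mechanism keeps data flowing in real time

[Diagram: Paimon LSM Tree stored as files on a Distributed FileSystem (HDFS/OSS/S3). Properties: low latency, low cost, unified stream and batch storage, easy integration.]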
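To make "LSM-based, balancing read and write performance" concrete, here is a toy LSM tree in Python: writes go to an in-memory buffer, full buffers are flushed as immutable sorted runs, reads check the newest data first, and compaction merges runs. `TinyLSM` is a teaching sketch with hypothetical names; Paimon's actual file layout, levels, and compaction strategy are not modelled.

```python
import bisect

_TOMBSTONE = object()  # marks a deleted key until compaction drops it


class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in-memory write buffer
        self.runs = []       # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes never modify existing runs, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, _TOMBSTONE)

    def flush(self):
        if self.memtable:
            run = sorted(self.memtable.items(), key=lambda kv: kv[0])
            self.runs.append(run)
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            value = self.memtable[key]
            return None if value is _TOMBSTONE else value
        # Newest run wins; (key,) sorts just before any (key, value) pair.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return None if value is _TOMBSTONE else value
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        live = [(k, v) for k, v in merged.items() if v is not _TOMBSTONE]
        self.runs = [sorted(live, key=lambda kv: kv[0])]


lsm = TinyLSM(memtable_limit=2)
lsm.put("a", 1)
lsm.put("b", 2)    # flushes run 1
lsm.put("a", 10)
lsm.delete("b")    # flushes run 2
runs_before_compaction = len(lsm.runs)
lsm.compact()
```

The trade-off is visible in the code: more runs mean cheaper writes but more places for a read to look, and compaction pays that cost back in the background.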

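Apache Paimon: storage for both streaming and batch computation

Flink SQL Sink
• Apache Paimon has a built-in sink that hides the complexity
• Exactly-once delivery is guaranteed through two-phase commit

File Store (batch)
• The file storage form of a table
• LSM supports Update/Delete
• Columnar format, with optimizations such as compression
• Supports full batch reads

Log Store (streaming)
• The operation record of a table
• Pluggable implementation
• Supports incremental streaming subscription

[Diagram: Flink SQL writes in real time to Apache Paimon (File Store and Log Store); Flink SQL reads in batch from the File Store and as a stream from the Log Store.]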
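The sink is said to guarantee exactly-once through two-phase commit. The general pattern can be simulated in a few lines: records are staged per checkpoint (pre-commit), become visible only when the checkpoint completes (commit), and anything staged after the last completed checkpoint is discarded on recovery so the source can replay it. This is a simplified single-process model with hypothetical names, not the Flink or Paimon sink API.

```python
class TwoPhaseCommitSink:
    def __init__(self):
        self.buffer = []            # records written since the last checkpoint
        self.pending = {}           # checkpoint id -> staged (pre-committed) records
        self.committed = []         # records visible to readers
        self.committed_ids = set()  # makes commit idempotent

    def write(self, record):
        self.buffer.append(record)

    def pre_commit(self, checkpoint_id):
        # Phase 1: stage the buffered records under the checkpoint id.
        self.pending[checkpoint_id] = self.buffer
        self.buffer = []

    def commit(self, checkpoint_id):
        # Phase 2: publish once, even if the notification is repeated.
        if checkpoint_id in self.committed_ids:
            return
        self.committed.extend(self.pending.pop(checkpoint_id, []))
        self.committed_ids.add(checkpoint_id)

    def recover(self, last_completed_checkpoint):
        # Finish commits that belong to completed checkpoints, drop the rest.
        for checkpoint_id in sorted(self.pending):
            if checkpoint_id <= last_completed_checkpoint:
                self.commit(checkpoint_id)
        self.pending.clear()
        self.buffer = []


sink = TwoPhaseCommitSink()
sink.write("a")
sink.write("b")
sink.pre_commit(1)
sink.commit(1)
sink.write("c")   # job fails before checkpoint 2
sink.recover(last_completed_checkpoint=1)
sink.write("c")   # the source replays from checkpoint 1
sink.pre_commit(2)
sink.commit(2)
sink.commit(2)    # duplicate notification has no effect
```

Readers only ever see `committed`, so the failed attempt at "c" never becomes visible and the replayed one appears once.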

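03 Product capabilities of Alibaba Cloud Realtime Compute for Apache Flink

Rich enterprise-grade capabilities of Alibaba Cloud Realtime Compute for Apache Flink

Job development and debugging
• Stream & batch computing
• Multiple languages, multiple versions
• Dynamic CEP
• Unified metadata (catalog)
• Isolation of development and production
• Test data management and test data generation
• Quick run and debug
• Ad-hoc queries
• Integration with external development platforms such as Git

Data ingestion
• Flink CDC: unified full and incremental sync; whole-database and whole-table merging, sharded databases and tables; YAML templates; resume from breakpoint
• Data connectors: 30+ mainstream data products; custom connectors & formats

Operations
• Batch job scheduling
• Data lineage
• Intelligent diagnosis
• Auto-tuning
• Resource queue management
• State management
• Variable management
• Key management
• Monitoring and alerting

Security
• Fine-grained permission management
• RBAC
• Workspace isolation
• SSL support for upstream and downstream

Upgraded enterprise-grade security

Multi-dimensional security across infrastructure and the platform system, with comprehensive hardening features to protect data.

Infrastructure
• Dedicated large-scale clusters and a network-isolated environment
• Alibaba Cloud data centers and data center safeguarding facilities
• Multi-layered secure service deployment design
• Data center network security

Access control and permission management
• Alibaba Cloud account identity: fully adapted to the Alibaba Cloud account system, including Alibaba Cloud accounts, Resource Directory, CloudSSO, and more
• RAM permission control: integrated with RAM; RAM users and roles can log in and be authenticated
• RBAC fine-grained permission management: built-in and custom roles enable fine-grained authorization of operations

Data security
• Key hosting: keys can be configured to avoid the security risk of plaintext AccessKeys
• Automatic backup and recovery: a storage-compute separated architecture backs up data and job state
• Operation auditing: integration with ActionTrail provides event monitoring and alerting, timely auditing, and retrospective analysis

Security isolation
• Network isolation: VPC networks are secure, reliable, flexible, and controllable; upstream and downstream service domain names can be managed; an Alibaba Cloud NAT Gateway connects the VPC to the public network
• Tenant isolation: multi-tenant resource isolation; isolated storage for user data

How cloud big data services protect enterprise data and services

Build comprehensive, multi-layered security management that continuously protects data and services in the cloud.

• Risks: business interruption, data leakage, insufficient permission control, security attacks
• Flink platform system security
• Flink infrastructure security
• High-availability design for data and services across the whole pipeline
• Flink service deployment environment
• Intra-city disaster recovery
• Data center security control

OpenAPI v2 released: easier to integrate

• deploymentTarget rework
• Dynamic deployment updates
• Custom connector management
• Lineage (data lineage)
• Catalog management
• UDF registration

Automatic tuning of job resources

Resources are adjusted dynamically based on business processing complexity and data traffic.

[Diagram: Autopilot loop. Collect metrics (Flink Metric); analyze metrics; combine the metrics into a tuning plan; execute the plan; update the job configuration, either by restarting the job or by a dynamic update through the Flink RESTful API; deploy to the cluster. Other inputs: other diagnosis systems, job management platform.
 Symptoms, over-provisioned: low job resource utilization, high cost.
 Symptoms, under-provisioned: prone to failover; low throughput and high latency.
 Other symptoms: slow startup; the job's AGG operator reaches its processing bottleneck, for which Autopilot infers that MiniBatch conf can be added.]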
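The tuning loop maps observed symptoms to a plan: scale down when resources sit idle, scale up under pressure, and add mini-batch configuration when the aggregation operator is the bottleneck. A rule-based version of that decision step might look like the sketch below. The metric fields, thresholds, and doubling/halving policy are all assumptions made for illustration; only the three `table.exec.mini-batch.*` keys are standard Flink SQL options. The product's actual Autopilot logic is not described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class JobMetrics:
    parallelism: int
    cpu_utilization: float    # 0.0-1.0, averaged over the job (assumed metric)
    backpressured: bool
    agg_operator_busy: float  # 0.0-1.0 busy ratio of the AGG operator (assumed metric)


@dataclass
class TuningPlan:
    parallelism: int
    config: dict = field(default_factory=dict)
    reasons: list = field(default_factory=list)


def make_tuning_plan(m, low_cpu=0.2, high_cpu=0.8, busy=0.9, max_parallelism=64):
    """Turn one metrics snapshot into a tuning plan (hypothetical thresholds)."""
    plan = TuningPlan(parallelism=m.parallelism)
    if m.agg_operator_busy >= busy:
        # Buffer input records so the aggregation touches state less often.
        plan.config["table.exec.mini-batch.enabled"] = "true"
        plan.config["table.exec.mini-batch.allow-latency"] = "2 s"
        plan.config["table.exec.mini-batch.size"] = "5000"
        plan.reasons.append("AGG operator at bottleneck: enable mini-batch")
    if m.backpressured or m.cpu_utilization >= high_cpu:
        plan.parallelism = min(max_parallelism, m.parallelism * 2)
        plan.reasons.append("under-provisioned: scale up")
    elif m.cpu_utilization <= low_cpu and m.parallelism > 1:
        plan.parallelism = max(1, m.parallelism // 2)
        plan.reasons.append("over-provisioned: scale down")
    return plan


idle_plan = make_tuning_plan(JobMetrics(8, 0.10, False, 0.10))
hot_plan = make_tuning_plan(JobMetrics(4, 0.90, True, 0.95))
```

In a real loop the plan would be applied through a job update and the metrics re-sampled afterwards, so that a bad adjustment can be corrected on the next pass.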

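04 Typical architectures and case studies

Typical reference architecture

• Hologres and Paimon both support streaming access, so each warehouse layer can choose between them based on storage cost and business timeliness.
• Writing data directly into Hologres provides second-level freshness plus extreme OLAP performance.
• Building data on Paimon and using Hologres for query acceleration provides minute-level freshness plus second-level OLAP performance.
• The OLAP engine is optional; StarRocks, Trino, and others are supported.

[Diagram: layers ODS, DWD, DWS, ADS connected by Flink (Flink SQL); each layer is stored either in Paimon (OSS) or in Hologres with Binlog; simple SQL exploration; OLAP query analysis; Dashboards.]

Typical customer case

Background
A well-known Chinese internet company in the mobility sector, with tens of millions of monthly active users. The customer built its own platform on the open source Hadoop stack. Real-time workloads make up a large share of its business, and real-time big data resources exceed those used for offline processing.

Business pain points
Real-time data is processed through a Flink + Kafka pipeline, and offline data through Spark/Hive/Trino. Along the way, two technology stacks are costly to develop and maintain; storage costs are high because offline and real-time data are stored separately; and intermediate data in stream processing is hard to query.

Solution
Flink + Paimon + StarRocks

Results
Development efficiency nearly doubled, storage cost savings at the "KW" (ten-million) level per year, and query efficiency improved 3x.
• Two pipelines are simplified into one, reducing system complexity and greatly easing operations.
• One set of SQL/Table and one schema greatly improve development efficiency.
• Kafka clusters are substantially reduced, saving costs at the "KW" level per year.
• Intermediate data can be queried directly through StarRocks, more than 3x faster than Presto/Impala.

[Diagram, original architecture: Log, application database, and database sources go through data integration. Real-time pipeline: ODS Kafka, Flink processing, DWD Kafka, Flink aggregation, ADS Kafka (incremental), Presto. Offline pipeline: Kafka dump to ODS Hive, offline processing, DWD Hive, offline aggregation, ADS, Impala/Presto. Outputs: reports, algorithm library.]

[Diagram, evolved architecture: Log, application database, and database sources; CDC; Paimon (OSS) ODS; Flink; Paimon (OSS) DWD; Flink aggregation; Paimon (OSS) ADS; StarRocks. Outputs: reports, algorithm library.]

Thanks

云覺
DingTalk: tute2014

Tea break

Flink + Paimon + Hologres in Production at Alibaba's Intelligent Engine Division

王偉駿 (鴻歷)
Technical Expert, Alibaba Intelligent Engine Division

CONTENTS
1. Product background
2. Solution example: the search offline platform
3. Production job tuning and community collaboration
4. Future

1. Product background

Business scenario and product definition

[Diagram: sources (Binlog, Transactions, Message Queue, Algorithm data, Events, Logs; Database, MySQL, ODPS, Paimon, …, Message Queue) feed the Offline System (Stream Processing, Batch Processing), which writes to ODPS, Paimon, Hologres, FileSystem, … and serves the Search Engine, Advertising Engine, Recommendation Engine, Sample Engine, ….]

Challenges
1. Many heterogeneous data sources
2. Many businesses with complex logic
3. Performance tuning is hard and the operations barrier is high

Based on this business scenario, we built a product that provides an end-to-end ETL data processing solution for the AI domain.

Product technical architecture

• Product side: UI && WebIDE (development, configuration, operations, monitoring, alerting)
• Core functions: data integration, sample processing, SQL, ad hoc OLAP, stream computing, batch computing, unified stream and batch, user plugins, scheduling and orchestration (Airflow), Catalog (Meta, versions, lineage, Dataset)
• Dependencies: Hologres; distributed KV storage; ASI (unified scheduling and a unified resource pool supporting the K8S protocol); Swift message queue; Pangu (distributed file system); Paimon lake format; lake table storage optimization service; VVP (job submission, development, operations); Celeborn (unified shuffle service); Restune (elastic job resources); Embedding computation
• Supported businesses: 淘寶 (Taobao), 天貓 (Tmall), local services, 菜鳥 (Cainiao), 高德 (Amap), AE, 飛豬 (Fliggy), Lazada, OpenSearch, …
• Other labels: Connector, CDC, image retrieval, sample platform, HA3, ODPS, Paimon, vision platform, offline inference, features, …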
