OPENING
Yu Li (李鈺, alias Jueding), ASF Member; PMC Member of Apache Celeborn, Flink, HBase, and Paimon; Head of EMR, Alibaba Cloud Intelligence

Data Trends: AIGC further promotes the explosion of big data
• Data Volume: AI further drives a massive data explosion, far exceeding the data growth of the previous era.
• Data Diversity: Multimodal data processing will become a standard for future data processing, including storage, computation, and management.
• Data Governance: A single copy of data serves different roles, including Data Engineers, Data Analysts, Data Scientists, and AI Engineers.
[Figure: pie chart of data by type (Analytic Data, Pictures, AI Models, Video, Others); values shown: 46%, 43%, 5%, 5%, 1%]
The Evolution of Data Architecture
[Figure: Data Warehouse, then Data Lake, then Data Lakehouse. Workloads shown include reports, ETL, applications, real-time analytics, data exploration, streaming analytics, data science, and machine learning over structured, semi-structured, and unstructured data.]
The Data Warehouse Architecture
[Figure: databases feed an ETL pipeline into the data warehouse, which serves reports and applications]
Strengths:
• Excellent performance
• Out-of-box, easy to use
• Friendly to data analysts
Weaknesses:
• Data format is not open
• Lack of support for non-structured and semi-structured data
• All data not immediately required will be discarded
The Data Lake Architecture
[Figure: databases store all data in the data lake; users analyze, see results, and iterate (ELT model); the lake serves real-time analytics, data exploration, reports, applications, data science, and machine learning over structured, semi-structured, and unstructured data]
Strengths:
• Unified storage with low cost
• Open data and meta formats
• Fits both BI and AI
Weaknesses:
• Performance is not as good as a data warehouse
• Data governance is not mature
• Hard to construct and operate
Data Lake + Data Warehouse = Data Lakehouse
[Figure: the data lakehouse serves reports, ETL, applications, streaming analytics, data exploration, data science, and machine learning over structured, semi-structured, and unstructured data]

A composable open-source Lakehouse solution
[Figure: machine learning, data science, and DevOps on top of composable layers: Computing Engines; Management Services (Apache Gravitino); Governance Services; Data Formats (Apache Paimon); Data Storage (Alibaba Cloud OSS)]
The Lakehouse solution on Alibaba Cloud (Open Lake)
[Figure components: applications; ingestion from databases; DataWorks (IDE, Copilot, workflow, data governance, data quality); Realtime Compute, MaxCompute, Hologres, and E-MapReduce; Data Lake Formation (MetaStore, lineage, lake authentication, open authorization, tiered storage, compaction); Apache Paimon (lake format); OSS-HDFS (lake storage)]

Build Open-Source-Compatible Lakehouse on Alibaba Cloud
[Figure: Lakehouse processing pipeline. Components: RDBMS binlogs and logs; Flink SQL streaming & batch queries; Paimon tables in ODS, DWD, DWS, and ADS layers on the lake format and lake storage; Hologres and other data serving systems; lake governance]
Recap: The Lakehouse solution on Alibaba Cloud (Open Lake)
[Figure: the Open Lake diagram repeated, now also listing StarRocks among the compute engines]
EMR Serverless Spark
Transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance. 100% compatible with open-source Spark, and 3X faster with Fusion, an enterprise native engine.
Resilient:
• Enterprise remote shuffle service (RSS) solution to support better elasticity
• On-demand and seamless rescaling
• Native integration with DLF and OSS
Easy to Use:
• One-stop data engineering support
• Visualized job and workflow monitoring
• Convenient resource and session management
Flexible:
• Rich Open API supplied for integration
• 100% compatible with open-source usage, in both API and binary aspects
• Rich ecosystem support
Fast:
• Native engine supported, 3X faster than open-source Spark
• Enhanced RSS supplies 1.5X throughput for IO-intensive apps
Product Architecture
[Figure: App scenarios (dashboards, reports, operational analytics, data discovery, data engineering, data science); Control Plane (meta service, scheduling, intelligent maintenance, version control, accounting); Compute Plane (Spark Native Engine, Remote Shuffle); Data IO; Storage Layer (lake formats, Object Storage Service, Enterprise Cache Service, security and auth via DLF, Enterprise Remote Shuffle Service)]

Control plane: Workspace Administration
Connection Management, Resource Usage Monitoring, Session Management (resources for interactive queries), Queue Management (resources for ETL)

Control plane: Data Engineering
Version Control, SQL Editor, Artifacts Management, Catalog View

Control plane: Job Monitor and Diagnose
Intelligent Diagnose, Job List, Logs, Metrics

Control plane: Workflow Management
Workflow List, Canvas Editor, Workflow Instance Monitor (Global View), Workflow Instance Monitor (Single Execution View)
Compute plane: Fusion Engine
Fusion is an enterprise native engine that is 3X faster than the open-source Spark Java engine.
Hardware-aware optimization:
• x86 (Intel/AMD) and ARM support
• SVE SIMD acceleration
• zstd-ptg compression acceleration
Native C++ integration:
• OSS-HDFS support
• Deep Parquet and ORC integration
• Paimon, Delta Lake, and Iceberg support
Vectorized execution engine:
• Native operators
• SIMDJson optimization
Fast columnar shuffle:
• Enterprise RSS based on Apache Celeborn
• Shuffle data reduced by up to 40%
Testing environment:
• 6 d3s.16xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
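Fusion's own code is not shown in these slides. As a language-neutral illustration of what "vectorized execution" means, the Python sketch below evaluates the same filter-and-sum query two ways: row at a time, and batch at a time over columnar data with a selection vector. The function names, column names, and batch size are hypothetical; a native engine would run the inner batch loops as compiled SIMD code rather than Python.

```python
from typing import Dict, List

Row = Dict[str, int]


def row_at_a_time(rows: List[Row], threshold: int) -> int:
    """Row-oriented evaluation: filter and aggregate one row at a time."""
    total = 0
    for row in rows:
        if row["qty"] > threshold:
            total += row["price"] * row["qty"]
    return total


def to_columns(rows: List[Row]) -> Dict[str, List[int]]:
    """Pivot rows into per-column lists, the layout Parquet/ORC use on disk."""
    if not rows:
        return {}
    return {name: [r[name] for r in rows] for name in rows[0]}


def vectorized(columns: Dict[str, List[int]], threshold: int,
               batch_size: int = 1024) -> int:
    """Batch-oriented evaluation over columnar data.

    The filter emits a selection vector (positions of surviving rows); the
    projection and aggregation then touch only those positions.
    """
    qty = columns.get("qty", [])
    price = columns.get("price", [])
    total = 0
    for start in range(0, len(qty), batch_size):
        end = min(start + batch_size, len(qty))
        # Filter operator: one tight loop over a single column of the batch.
        selection = [i for i in range(start, end) if qty[i] > threshold]
        # Project-and-aggregate operator over the selected positions only.
        total += sum(price[i] * qty[i] for i in selection)
    return total


rows = [{"price": 10, "qty": 1}, {"price": 20, "qty": 5},
        {"price": 30, "qty": 7}, {"price": 5, "qty": 3}]
print(row_at_a_time(rows, 2), vectorized(to_columns(rows), 2, batch_size=2))  # 325 325
```

Both paths return the same answer; the difference is that each batch operator runs a simple loop over one contiguous column, which is the shape that compilers and SIMD units accelerate well.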
Compute plane: Enterprise Remote Shuffle Service
RSS removes the dependency on local disks for shuffle data and enables 100% disaggregation of compute and storage.
Open Source:
• Based on Apache Celeborn, an Apache Top-Level Project donated by Alibaba Cloud
• De facto RSS choice, used by Alibaba, LinkedIn, etc.
Multi-Tenancy:
• Enterprise security assurance with data encryption
• Enhanced IO scheduling, flow control, and quota management
Scalability:
• Widely adopted in Alibaba, used by both Spark and Flink
• Successfully supports jobs with 600 TB+ of shuffle data
Performance:
• 69% performance boost over the YARN external shuffle service
• Performance gain increases with shuffle data scale
Functionality:
• Supports Spark DRA (dynamic resource allocation)
• Supports Spark AQE
Test environment:
• 8 d2s.10xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0
• Spark 3.3.1
• Shuffle partitions = 8000
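A minimal in-memory sketch of the push-based model used by remote shuffle services such as Apache Celeborn: mappers push records to a remote worker that groups them by reduce partition, so each reducer reads one consolidated stream instead of fetching a block from every mapper's local disk. The class and function names are hypothetical and do not reflect Celeborn's API; networking, replication, and spilling are omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Record = Tuple[Hashable, int]


class RemoteShuffleWorker:
    """Stands in for a remote worker that owns one file per reduce partition."""

    def __init__(self) -> None:
        self._partitions: Dict[int, List[Record]] = defaultdict(list)
        self.finished_mappers: Set[int] = set()

    def push(self, partition: int, records: List[Record]) -> None:
        # Mappers push over the network; nothing lands on the mapper's local disk.
        self._partitions[partition].extend(records)

    def mapper_end(self, mapper_id: int) -> None:
        self.finished_mappers.add(mapper_id)

    def read_partition(self, partition: int) -> List[Record]:
        # A reducer reads one consolidated partition instead of one block per mapper.
        return list(self._partitions.get(partition, []))


def run_map_task(worker: RemoteShuffleWorker, mapper_id: int,
                 records: List[Record], num_partitions: int) -> None:
    buffers: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        buffers[hash(key) % num_partitions].append((key, value))
    for partition, batch in buffers.items():
        worker.push(partition, batch)
    worker.mapper_end(mapper_id)


def run_reduce_task(worker: RemoteShuffleWorker, partition: int) -> Dict[Hashable, int]:
    sums: Dict[Hashable, int] = defaultdict(int)
    for key, value in worker.read_partition(partition):
        sums[key] += value
    return dict(sums)
```

Because executors no longer hold shuffle files, they can be released once their map tasks finish, which is one reason an RSS pairs well with Spark's dynamic resource allocation.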
Open API and Ecosystem
Workflow integration via OpenAPI:
• Workspace
• Job Runs
• SQL Editor
• Workflows
Tools:
• Spark-submit compatible job submission
• Notebook
• Git integration (planning)
Alibaba Cloud product integration:
• OSS-HDFS
• MaxCompute
• DLF
• DataWorks

EMR Serverless Spark vs. Databricks, function-wise
Function                   | Databricks  | EMR Serverless Spark
Native Engine              | YES         | YES
SQL Editor                 | YES         | YES
Workflow Management        | YES         | YES
Debugging and Monitor      | YES         | YES
Intelligent Diagnose       | NO          | YES
Catalog and Authentication | YES         | YES
Data & FS                  | YES (DBFS)  | YES (OSS-HDFS)
Auditing                   | YES         | YES
Notebook                   | YES         | YES
CI/CD with Git             | YES         | NO
Assistant/Copilot          | YES         | NO
ML & Vector Serving        | YES         | NO
EMR Serverless StarRocks
Offers a high-performance, all-scenario, blazing-fast, and unified data lakehouse analytics service. 100% compatible with open-source StarRocks, and 3X faster than traditional OLAP engines (Presto/Trino, ClickHouse, Druid, ...).
Highlights: Fast, Unified, Easy-to-use, Cloud-native
• Large-scale data analytics
• SIMD-optimized query engine
• High-speed real-time data ingestion
• Innovative pipeline execution engine
• Full-stack vectorized technology
• Innovative CBO technology
• Multi-dimensional lakehouse analytics with rich lake data format support
• Materialized views and ETL support
• High concurrency support (10k per second)
• Real-time data analysis
• Diverse data model support
• Maintenance-free with high SLA
• Compatible with the MySQL protocol
• Compatible with multiple BI tools
• Slow query diagnosis
• Visual metadata management
• Easy migration with a cluster link tool
• Out-of-box, minute-level delivery
• Efficient resilience support
• Deep integration with DLF and VVP
• Disaggregated storage/compute and virtual warehouse support
Application scenarios: ad-hoc dashboards, operation analytics, user profiles, real-time analytics, self-service reporting, …
[Figure: product architecture with a product layer, a StarRocks instance layer (auto-scaling, lakehouse analytics, shared-nothing architecture), and a storage layer (StarRocks table format, plus data lake table formats such as Hive on the data lake)]
Fast and unified:
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, offering more than a 3X performance improvement relative to Trino.
• Supports materialized-view ELT scenarios, enabling one-step processing of data tiers.
Separation of storage and compute:
• Compute elasticity optimized for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Multiple compute clusters, ensuring resource isolation between business units without interference.
• Various caching strategies that customers can configure flexibly according to their business needs.
Use with ease:
• Out of the box, StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in conjunction with customer business operations.
StarRocks Manager (control plane)
• Features: data loading, security, SQL profiling, audit logs, configuration, monitoring and management, alerting, health analysis, upgrading, …
• StarRocks Console: instance management, instance monitoring, SQL editor (one-stop SQL editing and querying), slow SQL profiling and diagnosis, instance diagnosis
[Figure: product architecture with multiple virtual warehouses, each with FE nodes, CN compute nodes, and a data cache]
Highlights: fully managed; extreme elasticity; one-stop development and analysis; disaggregation support; maturity; ADS MV acceleration
Compute plane: fast and stable
• 3x-5x faster than Trino; significantly faster than ClickHouse and Apache Doris
• Hive/Paimon/Iceberg/Hudi support
• External MVs and a lakehouse hierarchy
• Sophisticated caching and tiered storage capability
• On-demand, second-level elasticity with low cost
• Comprehensive load analysis and diagnostics
[Figure: High Perf, Elasticity, Lake Query Acceleration, Lakehouse Build-up. StarRocks compute nodes with local caches read ODS data from the data lake; data ingestion builds DWD and DWS layers in the warehouse]
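The compute-plane slide places a local cache on each StarRocks compute node in front of data lake storage. A minimal sketch of that idea, assuming a simple byte-bounded LRU policy: StarRocks' actual Data Cache has its own configurable policies, and `LocalBlockCache` and `fetch_remote` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class LocalBlockCache:
    """LRU cache of remote file blocks, bounded by total bytes."""

    def __init__(self, capacity_bytes: int, fetch_remote: Callable[[str], bytes]) -> None:
        self.capacity = capacity_bytes
        self.fetch_remote = fetch_remote  # e.g. a ranged read against object storage
        self._blocks: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str) -> bytes:
        if block_id in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(block_id)  # mark as most recently used
            return self._blocks[block_id]
        self.misses += 1
        data = self.fetch_remote(block_id)
        if len(data) <= self.capacity:  # never cache a block larger than the cache
            self._blocks[block_id] = data
            self._used += len(data)
            while self._used > self.capacity:
                _, evicted = self._blocks.popitem(last=False)  # least recently used
                self._used -= len(evicted)
        return data
```

Repeated scans of hot lake files are then served from local disk or memory, and only misses pay the object-storage round trip.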
Recap: The Lakehouse solution on Alibaba Cloud (Open Lake)
Data Lake Formation (DLF)
Metadata management:
• APIs: HMS compatible; import/export from/to HMS; MySQL JDBC; Open API & SDKs
• Functionality: table schema; table lineage (WIP); meta retrieval; meta stats for CBO
• Fully managed: serverless and elastic; highly available; high throughput; OpenAPI / SDK
• Lake formats: Apache Paimon; Apache Iceberg; Apache Hudi; Databricks Delta
Enterprise-class security:
• Auditing: audit log for authorization; audit log for meta operations; audit log for data operations (WIP)
• Authorization: RBAC; policy & ACL (WIP); modes: Apache Ranger compatible
• Authentication: OpenLDAP; Kerberos (WIP); Alibaba Cloud RAM
Open Lake intelligent optimization: the Compaction Manager, the Tiered Storage Manager, and the Meta Store (compaction, stats) manage data across hot, warm, and cold layers.

Thanks
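DLF's Tiered Storage Manager places data in hot, warm, and cold layers. The slides do not describe its policy, so the sketch below assumes a simple rule based on days since last access; the thresholds, `PartitionStats`, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List

# Hypothetical thresholds; a real policy would be configurable per table.
WARM_AFTER_DAYS = 7
COLD_AFTER_DAYS = 30


@dataclass
class PartitionStats:
    name: str
    last_access: date
    size_bytes: int


def choose_tier(stats: PartitionStats, today: date) -> str:
    """Pick a tier from how long the partition has gone unread."""
    idle_days = (today - stats.last_access).days
    if idle_days >= COLD_AFTER_DAYS:
        return "cold"
    if idle_days >= WARM_AFTER_DAYS:
        return "warm"
    return "hot"


def plan_tiering(partitions: List[PartitionStats], current: Dict[str, str],
                 today: date) -> Dict[str, str]:
    """Return {partition: target_tier} only for partitions that must move."""
    moves: Dict[str, str] = {}
    for p in partitions:
        target = choose_tier(p, today)
        if current.get(p.name, "hot") != target:
            moves[p.name] = target
    return moves
```

Returning only the moves, rather than the full assignment, keeps the plan cheap to execute and makes it easy to review before data is rewritten to another storage class.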
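Yu Li, liyu@

Paimon + DLF: Bridging Alibaba Cloud's In-House and Open-Source Compute Engines
Jingsong Li (李勁松), Apache Paimon PMC Chair
CONTENTS
1. Open Lake: one storage layer for the whole ecosystem
2. Apache Paimon and open-source compute engines
3. Apache Paimon and in-house compute engines
4. Apache Paimon practical scenarios

1. Open Lake: one storage layer for the whole ecosystem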
[Figure: choosing a data architecture: batch data warehouse, real-time lakehouse, or real-time data warehouse. Stage 1, from data lake to lakehouse: data exchange through OSS file reads and writes; MaxCompute and Hologres with internal tables plus Parquet and Kafka; lake-format SDK reads and writes over the OSS data lake; lakehouse metadata; lake format + AI (to be continued…)]
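2. Apache Paimon and open-source compute engines
• Shared storage, with all compute engines on an equal footing
• Unified streaming and batch, with real-time upgrades
• Real-time and offline, with ultra-fast queries
• Industry-leading performance and cost
[Figure: ODS, DWD, and DWS layers in Apache Paimon on OSS, updated by batch jobs, streaming partial updates, and streaming aggregation (real-time upgrade); MaxCompute and Hologres integration ongoing]

Paimon + open-source big data
[Figure: streaming ingestion and batch jobs into Paimon, with left joins, real-time OLAP, and OLAP applications]

Alibaba Cloud Flink + Paimon: Streaming Lakehouse
• Multi-table widening: partial update; large-scale lookup join
• Streaming upserts into the lake: high-performance primary-key table updates; rich merge engines
• Offline data acceleration
• Streaming write and read: replaces message queues; index lookups accelerate streaming reads
• Changelog: generates complete changelogs, unlocking streaming reads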
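Multi-table widening with partial update means several streams each write some columns of the same primary key, and the table keeps the latest non-null value per column. In Paimon this behavior is selected with the table option `'merge-engine' = 'partial-update'`. The sketch below models only that core rule (no sequence fields, deletes, or aggregation); the function name and sample columns are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, Optional[object]]


def partial_update(rows: Iterable[Tuple[object, Row]]) -> Dict[object, Row]:
    """Merge rows sharing a primary key, keeping the latest non-null value per column.

    `rows` must be given in arrival (sequence) order as (key, row) pairs.
    """
    merged: Dict[object, Row] = {}
    for key, row in rows:
        current = merged.setdefault(key, {})
        for column, value in row.items():
            if value is not None:
                current[column] = value  # a newer non-null value wins
            else:
                current.setdefault(column, None)  # null never overwrites existing data
    return merged


# An order stream and a logistics stream each fill in their own columns.
wide = partial_update([
    (1, {"amount": 100, "status": None}),
    (1, {"amount": None, "status": "shipped"}),
    (2, {"amount": 50, "status": None}),
])
```

The widened row for key 1 ends up with both `amount` and `status`, without a stateful join in the streaming job.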
Alibaba Cloud Spark + Paimon: best-in-class offline processing performance
[Chart: TPC-DS SF1T normalized performance (higher is better), axis 0 to 2.5, for Baseline, +DPP, +adaptive scan parallelism, +native, and +ALL]
better)阿里云
star
ROCKS
Pai
mon:
離線數據極速阿里云
star
ROCKS
Pai
mon:Deletion
vectors模式3.Apache
pai
mon
與自研計算引擎
DLF connects the in-house compute engines:
• MaxCompute: External Schema
• Hologres: External Database
[Figure: Data Lake Formation bridges MaxCompute (MC) and Hologres (Holo) to Apache Paimon (lake format) on OSS-HDFS (lake storage)]

MaxCompute + Paimon (coming soon):
• Built-in Paimon
• Native acceleration
• Deletion vectors support
• ALIORC format
• Batch write support

Hologres + Paimon (coming soon):
• Native acceleration for append tables (no primary key)
• Deletion vectors mode
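4. Apache Paimon practical scenarios

A new-energy vehicle company on Alibaba Cloud
[Figure: database, streaming ingestion, ODS primary-key table (LSM tree, streaming asynchronous compaction), DWD append table, changelog = lookup, Apache Paimon data lake, application]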
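With Paimon's `'changelog-producer' = 'lookup'`, the writer looks up the previous value of each key so that downstream streaming readers receive complete update pairs instead of bare upserts. A simplified model, assuming an in-memory dict stands in for the lookup against the table's existing files; the function name is hypothetical, and the row kinds use Flink's short forms (`+I`, `-U`, `+U`, `-D`).

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[str, object, dict]  # (row kind, key, row)


def produce_changelog(state: Dict[object, dict],
                      upserts: List[Tuple[object, Optional[dict]]]) -> List[Change]:
    """Emit a complete changelog for upserts/deletes on a primary-key table.

    `state` holds the latest committed row per key (what a lookup would find)
    and is updated in place. A new row of None means delete.
    """
    changes: List[Change] = []
    for key, new_row in upserts:
        old_row = state.get(key)  # the "lookup" of the previous version
        if new_row is None:
            if old_row is not None:
                changes.append(("-D", key, old_row))
                del state[key]
        elif old_row is None:
            changes.append(("+I", key, new_row))
            state[key] = new_row
        else:
            # Retract the old version before emitting the new one.
            changes.append(("-U", key, old_row))
            changes.append(("+U", key, new_row))
            state[key] = new_row
    return changes
```

The retraction (`-U`) is what lets a downstream aggregation subtract the old value before adding the new one.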
A gaming company on Alibaba Cloud
[Figure: database, streaming ingestion, ODS primary-key table (changelog = input), DWD primary-key table (deletion vectors), DWS append table (batch), streaming asynchronous compaction on the LSM tree, Apache Paimon data lake, real-time OLAP, application]
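The gaming company's DWD table uses deletion vectors, which Paimon enables with the table option `'deletion-vectors.enabled' = 'true'`. Instead of merging overlapping files on every read, the table marks superseded row positions in a per-file bitmap, and readers simply skip them. The sketch below marks them at write time for brevity, which simplifies how Paimon actually maintains them; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class DataFile:
    rows: List[Tuple[object, dict]]  # (primary key, row); immutable once written
    deleted: Set[int] = field(default_factory=set)  # deletion vector: deleted positions


class DeletionVectorTable:
    def __init__(self) -> None:
        self.files: List[DataFile] = []
        self._index: Dict[object, Tuple[int, int]] = {}  # key -> (file no, position)

    def write_batch(self, batch: List[Tuple[object, dict]]) -> None:
        new_file = DataFile(rows=list(batch))
        file_no = len(self.files)
        for pos, (key, _) in enumerate(new_file.rows):
            if key in self._index:
                old_file, old_pos = self._index[key]
                # Mark the old version deleted instead of rewriting its file.
                target = new_file if old_file == file_no else self.files[old_file]
                target.deleted.add(old_pos)
            self._index[key] = (file_no, pos)
        self.files.append(new_file)

    def scan(self) -> List[dict]:
        # Readers only skip marked positions; no key-based merge across files.
        return [row for f in self.files
                for pos, (_, row) in enumerate(f.rows) if pos not in f.deleted]
```

This is why the mode helps OLAP engines such as StarRocks: the scan becomes a plain columnar read with a filter, rather than a merge of sorted runs.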
A local-services company on Alibaba Cloud
[Figure: database, streaming ingestion, ODS append table with clustering (Z-order) and indexes (bloom filter / bitmap), LSM tree, Apache Paimon data lake, high-performance OLAP, application]
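Z-ordering sorts rows by a key formed by interleaving the bits of several columns. Rows that are close in all of those columns end up in the same files, so per-file min/max statistics can skip files for filters on any of them. A minimal two-column sketch; the function names are hypothetical, and Paimon's own implementation differs in detail.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Range = Tuple[int, int]


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave bits of two non-negative ints: x on even bit positions, y on odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_sort(rows: List[Point], bits: int = 16) -> List[Point]:
    """Sort rows by Z-value so rows close in both columns are stored together."""
    return sorted(rows, key=lambda r: z_value(r[0], r[1], bits))


def file_stats(rows: List[Point], rows_per_file: int) -> List[Tuple[Range, Range]]:
    """Cut sorted rows into files and record per-file (min, max) of each column."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return stats
```

On a 4x4 grid cut into four files, each Z-ordered file covers a 2x2 square, so a filter such as y = 0 touches two files instead of four. Sorting by x alone would produce files that each span every y value.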
Thanks
Jingsong Li, Apache Paimon PMC Chair
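Alibaba Cloud Real-Time Lakehouse and Flink Product Technology
Lubing Li (李魯兵, alias Yunjue), Alibaba Cloud Computing Platform
CONTENTS
1. Trends in real-time big data lakehouses
2. Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Flink
3. Alibaba Cloud Realtime Compute for Flink product capabilities
4. Typical architectures and case studies

01 Trends in real-time big data lakehouses
• 1.0 (2009-2019), introducing the data warehouse: data warehouses, stream analytics, BI
• 2.0 (2020-2022), integrating into the lakehouse: unifying structured, semi-structured, and unstructured data; data lakes, data science, machine learning
• 3.0 (2023 onward), leading the way: native lakehouse, real-time, AI. Big data is entering the era of the real-time lakehouse! AI-driven, public cloud first!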
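02 Building a real-time lakehouse on Alibaba Cloud Realtime Compute for Flink
Real-time lakehouse (Streaming Lakehouse): the best overall price-performance choice
• Minute-level freshness
• Second-level query response
• Low cost
• End-to-end real time
• Combines Lakehouse and Streaming characteristics
[Chart: performance, freshness, and cost compared. Warehouse: T+1; Lakehouse: T+1 / T+1h; Streaming: T+1s; Streaming Lakehouse (Streaming + Lakehouse): T+1m]

Overall real-time lakehouse solution
[Figure: data lake (OSS/OSS-HDFS) with Flink streaming/batch, EMR, MaxCompute, Hologres, databases, logs, and queries. Steps: ① one-click lake ingestion (CTAS/CDAS); ② streaming reads and writes; ③ ad-hoc queries; ④ batch reads and writes driven by scheduling workflows]
Solution principles:
• Build Paimon on low-cost OSS storage
• Deep integration with Flink for end-to-end real time
Core advantages:
• Low-cost, end-to-end real time
• Unified streaming/batch storage and compute
• One platform providing data management, scheduling, ad-hoc queries, and more
• Open, supporting multiple engines
Applicable scenarios:
• End-to-end real-time acceleration of offline pipelines
• Cost reduction for real-time pipelines
• Unified streaming/batch storage and compute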
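Real-time lakehouse: end-to-end real-time acceleration
Data flows and is updated in real time across the whole pipeline, with minute-level freshness, every layer queryable, and second-level query response!
• Data query (OLAP engines): open support for multiple OLAP engines; foreign-table queries with second-level response; data can also be loaded directly into the engine; memory-optimized query performance
• Data storage (Paimon on OSS, table format): upsert/partial update; real-time ingestion; changelog producing; time travel; lookup join; batch overwrite/query
• Data compute (Flink and other engines): Flink, the de facto standard for stream processing; open support for multiple compute engines; streaming write and read; batch write and read; ad-hoc queries and point lookups; streaming ETL
• Data ingestion (Flink CDC): unified full and incremental sync; schema evolution; whole-database and sharded database/table sync; resumable transfer

Real-time lake and warehouse ingestion: simplified operations
• CTAS: merges and synchronizes sharded databases and tables
• CDAS: synchronizes a whole database
• Ad-hoc queries on the synchronized Paimon (OSS) tables
[Example: MySQL Order_db synchronized to Paimon_order on OSS]
Compatible with table changes (schema evolution):
• Supports automatic metadata discovery and management through a Catalog
• Combined with CTAS syntax, synchronizes both data and table-schema changes automatically
• Reads data changes and schema changes and syncs them downstream, preserving order for both
• When syncing to a Paimon table, PARTITIONED BY adapts automatically whether or not the partition columns exist
More sources are on the way: Hudi, Iceberg, Hologres, Paimon, TiDB, ClickHouse

Real-time lake and warehouse ingestion: multiple processing operations
Flink CDC with the DataStream API (keyBy, filter, map, flatMap, join) and the SQL API (SELECT, WHERE, GROUP BY aggregation, Top-N, JOIN, INSERT)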
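The toy model below shows why ordering matters when both data changes and schema changes reach the Paimon table: an added column must be applied before the first row that carries it. The event format and `SyncedTable` class are hypothetical and much simpler than Flink CDC's actual schema-change handling.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shapes for illustration:
#   ("add_column", column_name, None)
#   ("upsert", key, {column: value, ...})
#   ("delete", key, None)
Event = Tuple[str, object, Optional[dict]]


class SyncedTable:
    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self.rows: Dict[object, dict] = {}

    def apply(self, events: List[Event]) -> None:
        # Events are applied strictly in source order, so a schema change is
        # in effect before the first row that uses the new column.
        for kind, key_or_col, payload in events:
            if kind == "add_column":
                if key_or_col not in self.columns:
                    self.columns.append(key_or_col)
                    for row in self.rows.values():
                        row[key_or_col] = None  # existing rows read the new column as NULL
            elif kind == "upsert":
                unknown = set(payload) - set(self.columns)
                if unknown:
                    raise ValueError(f"columns {sorted(unknown)} arrived before their schema change")
                self.rows[key_or_col] = {c: payload.get(c) for c in self.columns}
            elif kind == "delete":
                self.rows.pop(key_or_col, None)
            else:
                raise ValueError(f"unknown event kind: {kind}")
```

If the `add_column` event were reordered after the row that uses it, the sink could not map that row. The slide's ordering guarantee is what rules that case out.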
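Paimon: low-cost storage for the real-time lakehouse
• Built on low-cost storage such as OSS/HDFS
• LSM-based balance of read and write performance
• Full support for lakehouse features
• A changelog mechanism lets data flow in real time
Low latency, low cost, unified streaming/batch storage, easy integration
[Figure: a Paimon LSM tree of data files on a distributed file system (HDFS/OSS/S3)]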
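Paimon stores each bucket of a primary-key table as an LSM tree. New writes land in small sorted files, and readers or compaction merge the sorted runs so that, for each key, the record with the highest sequence number wins. Below is a minimal merge-on-read sketch using Python's `heapq.merge`. It assumes deduplicate semantics and uses `None` as a delete marker; the names are hypothetical.

```python
import heapq
from typing import Iterator, List, Optional, Tuple

# One entry of a sorted run: (key, sequence number, value); None marks a delete.
Entry = Tuple[str, int, Optional[str]]

_NO_KEY = object()  # sentinel meaning "no key seen yet"


def merge_runs(runs: List[List[Entry]]) -> Iterator[Tuple[object, str]]:
    """Merge sorted runs, yielding the latest live value for every key.

    Each run must be sorted by (key, sequence). heapq.merge streams the runs
    without loading them fully, so the k-way merge stays memory-light.
    """
    merged = heapq.merge(*runs, key=lambda e: (e[0], e[1]))
    current_key: object = _NO_KEY
    latest: Optional[str] = None
    for key, _seq, value in merged:
        if key != current_key:
            if current_key is not _NO_KEY and latest is not None:
                yield current_key, latest
            current_key = key
        latest = value  # higher sequence numbers arrive later and win
    if current_key is not _NO_KEY and latest is not None:
        yield current_key, latest
```

Compaction applies the same merge and writes the result back as one larger run, which keeps the number of runs a reader has to merge small.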
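Flink SQL sink: the built-in Apache Paimon sink hides the complexity and supports both streaming and batch computation. Two-phase commit guarantees exactly-once data delivery.
• File Store (batch): the table's file storage form; real-time writes; LSM supports update/delete; columnar format with compression and other optimizations; supports full batch reads
• Log Store (stream): the table's operation records; pluggable implementation; supports incremental streaming subscription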
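Exactly-once delivery through two-phase commit is tied to Flink checkpoints. Data files are written continuously but stay invisible. When the checkpoint barrier arrives they are frozen into the checkpoint, and they are published as a new table snapshot only after the checkpoint completes. A minimal sketch of that protocol; `TwoPhaseSink` and `SnapshotStore` are hypothetical and do not mirror Paimon's committer classes.

```python
from typing import Dict, List, Set


class SnapshotStore:
    """Stands in for the table's snapshot metadata on OSS/HDFS."""

    def __init__(self) -> None:
        self.snapshots: List[List[str]] = []
        self.committed_checkpoints: Set[int] = set()

    def commit(self, checkpoint_id: int, files: List[str]) -> None:
        # Idempotent: replaying the same checkpoint after a failover is a no-op.
        if checkpoint_id in self.committed_checkpoints:
            return
        self.snapshots.append(list(files))
        self.committed_checkpoints.add(checkpoint_id)


class TwoPhaseSink:
    def __init__(self, store: SnapshotStore) -> None:
        self.store = store
        self.in_progress: List[str] = []
        self.pending: Dict[int, List[str]] = {}

    def write(self, file_name: str) -> None:
        self.in_progress.append(file_name)  # written, but invisible to readers

    def pre_commit(self, checkpoint_id: int) -> None:
        # Phase 1 (checkpoint barrier): freeze the files into checkpoint state.
        self.pending[checkpoint_id] = self.in_progress
        self.in_progress = []

    def notify_checkpoint_complete(self, checkpoint_id: int) -> None:
        # Phase 2: publish every pending checkpoint up to the completed one.
        for cid in sorted(c for c in self.pending if c <= checkpoint_id):
            self.store.commit(cid, self.pending.pop(cid))
```

Because commits are keyed by checkpoint and idempotent, a job restored from a checkpoint can safely retry the commit without duplicating data.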
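03 Alibaba Cloud Realtime Compute for Flink product capabilities
Rich enterprise capabilities:
• Job development and debugging: stream and batch compute; multiple languages; multiple versions; dynamic CEP; unified metadata (catalog); development/production isolation; test data management and generation; fast debugging; ad-hoc queries; integration with external development platforms such as Git
• Data ingestion: Flink CDC (unified full and incremental sync; whole-database and whole-table merging across sharded databases and tables; YAML templates; resumable transfer); data connectors (30+ mainstream data products; custom connectors and formats)
• Operations: batch job scheduling; data lineage; intelligent diagnosis; automatic tuning; resource queue management; state management; variable management; secret management; monitoring and alerting
• Security: fine-grained RBAC permissions; workspace isolation; SSL for upstream and downstream connections

Upgraded enterprise security capabilities: multi-dimensional infrastructure and platform security with comprehensive hardening to protect data!
• Infrastructure: dedicated large-scale clusters and isolated network environments; Alibaba Cloud data centers and facilities; multi-layered service security deployment design; data center network security
• Access control and permissions:
  • Alibaba Cloud account identity: full support for the Alibaba Cloud account system, including Alibaba Cloud accounts, Resource Directory, CloudSSO, and more
  • RAM permission control: integrated with RAM, supporting login authentication for RAM users and roles
  • RBAC fine-grained permissions: built-in and custom roles for fine-grained operation authorization
• Data security:
  • Key hosting: configure secrets to avoid the risks of plaintext AccessKeys
  • Automatic backup and recovery: with the storage-compute separated architecture, data and job state are backed up
  • Operation auditing: integrates with ActionTrail for event monitoring and alerting, timely auditing, and problem traceback
• Security isolation:
  • Network isolation: secure, reliable, flexible, and controllable VPC networks; domain name management for upstream and downstream services; VPC-to-internet connectivity through Alibaba Cloud NAT gateways
  • Tenant isolation: multi-tenant resource isolation; isolated user data storage

Flink platform system security
How do cloud big data services protect enterprise data and services from business interruption, data leaks, insufficient permission control, and security attacks? By building comprehensive, multi-layered security management that continuously protects data and services in the cloud: high-availability design for data and services across the whole pipeline; Flink infrastructure security; a secure Flink service deployment environment; same-city disaster recovery; data center security controls.

OpenAPI v2 released for easier integration: deploymentTarget refactoring; dynamic deployment updates; custom connector management; lineage; catalog management; UDF registration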
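Automatic job resource tuning (Autopilot)
Resources are adjusted dynamically based on business processing complexity and data volume.
Problems with configurations that are too low or too high: low resource utilization and high cost; frequent failovers; low throughput and high latency; the AGG operator's processing capacity becoming a bottleneck; slow startup.
Tuning loop: collect Flink metrics; analyze them and combine them into a tuning plan (for example, Autopilot may infer that MiniBatch configuration can be added); execute the plan by updating the job dynamically through the Flink REST API or by restarting the job; deploy to the cluster. Autopilot also works alongside other diagnostic systems and the job management platform.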
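04 Typical architectures and case studies
• Hologres and Paimon both support streaming access, so each warehouse layer can be placed according to storage cost and business timeliness.
• Data written directly into Hologres: second-level freshness plus top OLAP performance.
• Data built on Paimon with Hologres for query acceleration: minute-level freshness plus second-level OLAP performance.
• The OLAP engine is optional; StarRocks, Trino, and others are supported.
[Figure: typical Flink reference architecture with ODS, DWD, DWS, and ADS layers stored in Hologres or Paimon (OSS), connected by Flink jobs consuming binlogs; ADS feeds dashboards; Flink SQL provides simple SQL exploration over OSS (Paimon), and Hologres provides OLAP query analysis]

Typical customer case
Background: a well-known domestic mobility internet company with tens of millions of monthly active users. The customer built its own platform on the open-source Hadoop stack. Real-time workloads make up a large share, and real-time big data resources exceed those for offline processing. Real-time data was processed on a Flink + Kafka pipeline and offline data with Spark/Hive/Trino.
Pain points: two technology stacks with high development and maintenance costs; high storage cost because offline and real-time data were stored separately; intermediate streaming data was hard to query.
Original architecture: a real-time pipeline (ODS and DWD in Kafka, with Flink processing and aggregation) alongside an offline pipeline (Kafka dumps into ODS and DWD Hive tables, with offline processing and aggregation), with ADS served to reports and the algorithm library through Presto/Impala.
Evolved architecture (Flink + Paimon + StarRocks): logs and application databases are ingested via CDC into Paimon (OSS) ODS and DWD layers, with Paimon aggregation into ADS, which is queried through StarRocks by reports and the algorithm library.
Results:
• Development efficiency nearly doubled, storage costs cut by an amount on the order of ten million (KW) per year, and query efficiency improved 3x
• Two pipelines simplified into one, reducing system complexity and greatly lightening the operations workload
• One set of SQL/tables and one schema, greatly improving development efficiency
• Kafka clusters substantially reduced, saving KW-level cost per year
• Intermediate data can be queried directly through StarRocks, more than 3x faster than Presto/Impala

Thanks
Yunjue, DingTalk: tute2014

Tea break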
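Production Practice of Flink + Paimon + Hologres in Alibaba's Intelligent Engine
Weijun Wang (王偉駿, alias Hongli), Technical Expert, Alibaba Intelligent Engine Division
CONTENTS
1. Product background
2. Solution example: the search offline platform
3. Production job tuning and community collaboration
4. Future

1. Product background
[Figure: data sources (binlogs, transactions, message queues, algorithm data, events, logs, MySQL databases, ODPS, Paimon, …) are handled by stream processing, batch processing, message queues, and offline systems, and land in ODPS, Paimon, Hologres, file systems, …, which feed the search engine, advertising engine, recommendation engine, sample engine, …]
Business scenarios and product definition: the scenarios involve (1) many heterogeneous data sources, (2) many businesses with complex logic, and (3) difficult performance tuning with a high operations barrier. For these scenarios we built a product that provides an end-to-end ETL data processing solution for the AI domain.

Product technical architecture
• Product front end: UI and WebIDE (development, configuration, operations, monitoring, alerting)
• Core functions: data integration, sample processing, SQL, ad-hoc OLAP, stream compute, batch compute, unified stream/batch, user plugins, scheduling and orchestration (Airflow), Catalog (meta, versions, lineage, datasets)
• Dependent components: Hologres (distributed KV storage); ASI (unified scheduling and resource pool supporting the K8s protocol); Swift message queue; Pangu (distributed file system); Paimon lake format; lake table storage optimization service; VVP (job submission, development, operations); Celeborn unified shuffle service; Restune (elastic job resources); embedding compute
• Supported businesses: Taobao, Tmall, Local Services, Cainiao, Amap, AE, Fliggy, Lazada, OpenSearch, … (connectors, CDC, image retrieval, sample platform, HA3, ODPS, Paimon, vision platform, offline inference, features, …)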