Apache Kylin in Big Data Systems (Courseware)

Uploaded: 2023-09-21 · Format: PPT · Pages: 36 · Size: 2.86 MB


Apache Kylin: OLAP on Hadoop
http://kylin.io

Agenda
- What’s Apache Kylin?
- Tech Highlights
- Performance
- Roadmap
- Q&A

Extreme OLAP Engine for Big Data
Kylin is an open source Distributed Analytics Engine from eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets.

What’s Kylin
kylin \ˈkiːˈlɪn\ 麒麟 -- n. (in Chinese art) a mythical animal of composite form
- Open sourced on Oct 1st, 2014
- Accepted as an Apache Incubator Project on Nov 25th, 2014

Big Data Era
- More and more data becoming available on Hadoop
- Limitations in existing Business Intelligence (BI) Tools
  - Limited support for Hadoop
  - Data size growing exponentially
  - High latency of interactive queries
  - Scale-Up architecture
- Challenges to adopt Hadoop as an interactive analysis system
  - Majority of analyst groups are SQL savvy
  - No mature SQL interface on Hadoop
  - OLAP capability on Hadoop ecosystem not ready yet

Why not build an engine from scratch?

Features Highlights
- Extreme Scale OLAP Engine
  - Kylin is designed to query 10+ billions of rows on Hadoop
- ANSI SQL Interface on Hadoop
  - Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
- Seamless Integration with BI Tools
  - Kylin currently offers integration capability with BI Tools like Tableau.
- Interactive Query Capability
  - Users can interact with Hive tables at sub-second latency
- MOLAP Cube
  - Define a data model from Hive tables and pre-build it in Kylin
- Scale Out Architecture
  - Query server cluster supports thousands of concurrent users and provides high availability

Features Highlights…
- Compression and Encoding Support
- Incremental Refresh of Cubes
- Approximate Query Capability for distinct count (HyperLogLog)
- Leverage HBase Coprocessor for query latency
- Job Management and Monitoring
- Easy Web interface to manage, build, monitor and query cubes
- Security capability to set ACL at Cube/Project Level
- Support LDAP Integration

Cube Designer

Job Management

Query and Visualization

Tableau Integration

Who are using Kylin
Case                          Cube Size   Raw Records
User Session Analysis         26 TB       28+ billion rows
Classified Traffic Analysis   21 TB       20+ billion rows
GeoX Behavior Analysis        560 GB      1.2+ billion rows

- eBay: 90% of queries answered within seconds
- Baidu Map internal analysis
- Many other Proof of Concepts: Bloomberg Law, British GAS, JD, Microsoft, StubHub, Tableau …

Kylin Architecture Overview
[Architecture diagram; the recoverable labels are listed below]
- Clients
  - SQL-Based Tool (BI Tools: Tableau…), connecting over JDBC/ODBC with SQL
  - 3rd Party App (Web App, Mobile…), connecting over the REST API with SQL
- Kylin components: REST Server, Query Engine, Routing, Metadata, Cube Build Engine (MapReduce…)
- Storage
  - Hadoop Hive: Star Schema Data, Mid Latency - Minutes
  - Data Cube (HBase): Key Value Data, Low Latency - Seconds
- Legend: Online Analysis Data Flow, Offline Data Flow
- Clients/Users interact with Kylin via SQL
- OLAP Cube is transparent to users
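As a rough illustration of the REST path in the architecture diagram, the sketch below builds (but does not send) an HTTP request that carries SQL to a Kylin REST server. The host, port, project name, and credentials are placeholders. The `/kylin/api/query` path and the JSON field names are assumptions based on Kylin's query endpoint as understood here and should be checked against the deployed version.

```python
import base64
import json
import urllib.request


def build_kylin_query_request(base_url, project, sql, user, password, limit=100):
    """Build a POST request carrying SQL for a Kylin REST server (not sent)."""
    # Field names below are assumptions about the query endpoint's JSON body.
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic authentication header built from the placeholder credentials.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
        method="POST",
    )


# Placeholder host, project and credentials; nothing is sent over the network.
request = build_kylin_query_request(
    "http://localhost:7070",
    "example_project",
    "SELECT COUNT(*) FROM test_kylin_fact",
    "user",
    "secret",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually submit the query; that step is left out so the sketch runs without a server.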

Data Modeling
[Diagram; the recoverable labels are listed below]
- Cube Metadata: Cube: …, Fact Table: …, Dimensions: …, Measures: …, Storage (HBase): …
- Source: Star Schema (one Fact table with Dim tables around it)
- Target: HBase Storage (Row Key, Column Family, Column, Val)
- Mapping: Cube Metadata
- Roles: End User, Cube Modeler, Admin

OLAP Cube – Balance between Space and Time
- Cuboid = one combination of dimensions
- Cube = all combinations of dimensions (all cuboids)

[Cuboid lattice for the dimensions time, item, location, supplier]
0-D (apex) cuboid
1-D cuboids: time; item; location; supplier
2-D cuboids: time, item; time, location; time, supplier; item, location; item, supplier; location, supplier
3-D cuboids: time, item, location; time, item, supplier; time, location, supplier; item, location, supplier
4-D (base) cuboid: time, item, location, supplier

- Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
  1. (9/15, milk, Urbana, Dairy_land) <time, item, location, supplier>
  2. (9/15, milk, Urbana, *) <time, item, location>
  3. (*, milk, Urbana, *) <item, location>
  4. (*, milk, Chicago, *) <item, location>
  5. (*, milk, *, *) <item>
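The cuboid lattice for the four dimensions on this slide can be enumerated mechanically. The sketch below is only an illustration of the definition (a cuboid is one combination of dimensions, the cube is all of them); the function name `enumerate_cuboids` is made up for this example.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Return every cuboid (dimension combination), grouped by number of dimensions."""
    return {
        k: [tuple(c) for c in combinations(dimensions, k)]
        for k in range(len(dimensions) + 1)
    }


dims = ["time", "item", "location", "supplier"]
lattice = enumerate_cuboids(dims)
for k, cuboids in lattice.items():
    print(f"{k}-D: {len(cuboids)} cuboid(s)")

# A cube over N dimensions holds 2 ** N cuboids in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total:", total)
```

With four dimensions the levels hold 1, 4, 6, 4 and 1 cuboids, which is why every added dimension doubles the pre-computation work.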

Cube Build Job Flow

How To Store Cube? – HBase Schema

How to Query Cube? Query Engine – Calcite
- Dynamic data management framework.
- Formerly known as Optiq, Calcite is an Apache incubator project, used by Apache Drill and Apache Hive, among others.

How to Query Cube? Kylin Extensions on Calcite
- Metadata SPI
  – Provide table schema from Kylin metadata
- Optimize Rule
  – Translate the logical operator into a Kylin operator
- Relational Operator
  – Find the right cube
  – Translate SQL into storage engine API calls
  – Generate the physical execution plan with the linq4j Java implementation
- Result Enumerator
  – Translate the storage engine result into the Java implementation result.
- SQL Function
  – Add HyperLogLog for distinct count
  – Implement date time functions (e.g. Quarter)
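The approximate distinct count listed above relies on HyperLogLog. The following is a simplified, self-contained sketch of the estimator, not Kylin's implementation; the register count (`p=10`), the SHA-1 based hash, and the function name `hll_estimate` are choices made purely for illustration.

```python
import hashlib
import math


def hll_estimate(items, p=10):
    """Estimate the number of distinct items with 2 ** p HyperLogLog registers."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Take 64 bits of a stable hash: the first p bits pick a register,
        # the remaining bits supply the "position of the leftmost 1-bit" rank.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)
        rest = h & ((1 << (64 - p)) - 1)
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting on empty registers.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw


values = [f"user-{i}" for i in range(10000)]
estimate = hll_estimate(values + values)  # duplicates do not change the result
print(round(estimate))
```

The registers occupy a fixed amount of memory regardless of input size, and registers from different cuboid rows or segments can be merged by taking element-wise maxima, which is what makes the structure attractive as a pre-aggregated measure.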

Query Engine – Kylin Explain Plan

    SELECT test_cal_dt.week_beg_dt, test_category.category_name,
           test_category.lvl2_name, test_category.lvl3_name,
           test_kylin_fact.lstg_format_name, test_sites.site_name,
           SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT
    FROM test_kylin_fact
    LEFT JOIN test_cal_dt
      ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
    LEFT JOIN test_category
      ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
     AND test_kylin_fact.lstg_site_id = test_category.site_id
    LEFT JOIN test_sites
      ON test_kylin_fact.lstg_site_id = test_sites.site_id
    WHERE test_kylin_fact.seller_id = 123456
       OR test_kylin_fact.lstg_format_name = 'New'
    GROUP BY test_cal_dt.week_beg_dt, test_category.category_name,
             test_category.lvl2_name, test_category.lvl3_name,
             test_kylin_fact.lstg_format_name, test_sites.site_name

    OLAPToEnumerableConverter
      OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
        OLAPAggregateRel(group=[{0, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
          OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
            OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
              OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
                OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
                  OLAPJoinRel(condition=[=($4,$12)], joinType=[left])
                    OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0,1, 10, 11]])
                    OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0,1]])
                  OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0,1, 8]])
                OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0,1, 2]])

How to Query Cube? Storage Engine
- Plugin-able storage engine
  - Common iterator interface for storage engine
  - Isolate query engine from underlying storage
- Translate cube query into HBase table scan
  - Columns, Groups -> Cuboid ID
  - Filters -> Scan Range (Row Key)
  - Aggregations -> Measure Columns (Row Values)
- Scan HBase table and translate HBase result into cube result
  - HBase Result (key + value) -> Cube Result (dimensions + measures)
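The translation from a cube query to a table scan can be pictured with a small sketch. It assumes, for illustration only, that a cuboid ID is a bitmask with one bit per dimension in row key order; the dimension list and the names `cuboid_id` and `plan_scan` are hypothetical and not taken from Kylin's code.

```python
# Hypothetical row key order of the cube's dimensions.
DIMENSIONS = ["time", "item", "location", "supplier"]


def cuboid_id(columns, dimensions=DIMENSIONS):
    """Bitmask with one bit per dimension the query touches (first dimension = highest bit)."""
    mask = 0
    for i, dim in enumerate(dimensions):
        if dim in columns:
            mask |= 1 << (len(dimensions) - 1 - i)
    return mask


def plan_scan(group_by, filters, measures):
    """Split a cube query into cuboid ID, row key filters and measure columns."""
    columns = set(group_by) | set(filters)
    unknown = columns - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"not cube dimensions: {sorted(unknown)}")
    return {
        "cuboid_id": cuboid_id(columns),      # columns + groups -> cuboid ID
        "row_key_filters": dict(filters),     # filters -> scan range on the row key
        "measure_columns": list(measures),    # aggregations -> measure columns
    }


plan = plan_scan(
    group_by=["item"],
    filters={"location": "Urbana"},
    measures=["SUM(price)"],
)
print(plan)
```

Here the query groups by `item` and filters on `location`, so it is answered from the <item, location> cuboid rather than from the base cuboid.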

How to Optimize Cube? Cube Optimization
- "Curse of dimensionality": an N dimension cube has 2^N cuboids
  - Full Cube vs. Partial Cube
- Huge data volume
  - Dictionary Encoding
  - Incremental Building

How to Optimize Cube? Full Cube vs. Partial Cube
- Full Cube
  - Pre-aggregate all dimension combinations
  - "Curse of dimensionality": an N dimension cube has 2^N cuboids.
- Partial Cube
  - To avoid dimension explosion, divide the dimensions into different aggregation groups: 2^(N+M+L) -> 2^N + 2^M + 2^L
  - For a cube with 30 dimensions, if these dimensions are divided into 3 groups, the cuboid number will reduce from 1 Billion to 3 Thousands: 2^30 -> 2^10 + 2^10 + 2^10
  - Tradeoff between online aggregation and offline pre-aggregation

How to Optimize Cube? Partial Cube
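The aggregation-group arithmetic above can be checked with a few lines of code. This is only a worked calculation under the slide's simplification that each group is counted independently (the apex cuboid shared between groups is ignored); the function names are invented for this example.

```python
def full_cube_cuboids(n_dimensions):
    """Number of cuboids when every dimension combination is pre-aggregated."""
    return 2 ** n_dimensions


def partial_cube_cuboids(group_sizes):
    """Number of cuboids when each aggregation group is built independently."""
    return sum(2 ** size for size in group_sizes)


full = full_cube_cuboids(30)                    # one group of 30 dimensions
partial = partial_cube_cuboids([10, 10, 10])    # three groups of 10 dimensions
print(f"full cube:    {full:,} cuboids")
print(f"partial cube: {partial:,} cuboids")
```

The price of the smaller cube is that a query combining dimensions from different groups has no matching cuboid and must be aggregated further at query time, which is the online versus offline tradeoff named on the slide.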

How to Optimize Cube? Dictionary Encoding
- Data cube has lots of duplicated dimension values
- Dictionary maps dimension values into IDs, which reduces the memory and storage footprint.
- Dictionary is based on a Trie
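A minimal sketch of dictionary encoding follows. It swaps the trie mentioned on the slide for a plain sorted list plus a hash map, which is enough to show the idea; the sample values and the name `build_dictionary` are illustrative only.

```python
def build_dictionary(values):
    """Assign consecutive integer IDs to the sorted distinct values of one dimension."""
    distinct = sorted(set(values))
    value_to_id = {value: i for i, value in enumerate(distinct)}
    # The sorted list doubles as the reverse (ID -> value) lookup table.
    return value_to_id, distinct


# Illustrative dimension column with many repeated values.
column = ["Urbana", "Chicago", "Urbana", "Urbana", "Chicago", "Boston"]
value_to_id, id_to_value = build_dictionary(column)

encoded = [value_to_id[v] for v in column]      # small integers instead of strings
decoded = [id_to_value[i] for i in encoded]     # lossless round trip
print(encoded)
```

Because IDs are assigned in sorted order, comparing two IDs gives the same result as comparing the original values, so range filters can run directly on the encoded form.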

How to Optimize Cube? Incremental Build

Streaming, ongoing effort
- Cube is great, but…
  - Sometimes we want to drill down to row level information
  - Cube takes time to build, how about real-time analysis?
- Streaming with inverted index

                 Cube                                                  Inverted Index
Storage format   Pre-aggregated cuboids                                Sharding, columnar storage, with inverted index on row blocks
Query method     Cuboid scanning                                       Massive parallel processing
Strength         Pre-aggregate huge historic data to small summaries   Swift response to real-time data
Weakness         Take time to build                                    Slow at scanning large data volume

Kylin 0.8, Lambda Architecture
[Architecture diagram; the recoverable labels are listed below]
- Source: streaming (Kafka)
- hourly/daily batch -> Cube (Historic Store)
- minutes batch -> Inverted Index (Real-time Store)
- SQL Query -> Hybrid Storage Interface
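One way to picture the hybrid storage interface is a query that adds totals from the pre-built cube to totals aggregated on the fly from rows that arrived after the last batch build. The sketch below is a toy model of that idea with made-up data; it does not reflect how Kylin 0.8 implements the interface.

```python
from collections import Counter


def hybrid_query(historic_cube, realtime_rows, dimension, measure):
    """Merge pre-aggregated cube totals with totals computed from recent raw rows."""
    totals = Counter(historic_cube)          # historic store: already aggregated
    for row in realtime_rows:                # real-time store: aggregate at query time
        totals[row[dimension]] += row[measure]
    return dict(totals)


# Hypothetical data: cube totals from the last batch build, plus newer rows.
historic = {"milk": 120.0, "bread": 75.5}
recent = [
    {"item": "milk", "price": 3.0},
    {"item": "eggs", "price": 4.5},
]

result = hybrid_query(historic, recent, dimension="item", measure="price")
print(result)
```

The caller sees a single result set and does not need to know which store supplied which part of each total.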

Kylin vs. Hive
#  Query Type              Return Dataset  Query On Kylin (s)  Query On Hive (s)  Comments
1  High Level Aggregation  4               0.129               157.437            1,217 times
2  Analysis Query          22,669          1.615               109.206            68 times
3  Drill Down to Detail    325,029         12.058              113.123            9 times
4  Drill Down to Detail    524,780         [22.426383.21278 times; the Kylin, Hive and Comments cells are fused in the source]
5  Data Dump               972,002         49.054              N/A

[Bar chart: Hive vs. Kylin for SQL #1, SQL #2 and SQL #3, with query types ranging from High Level Aggregation through Analysis Query and Drill Down to Detail to Low Level Aggregation and Transaction Level]
Based on 12+B records case

Performance -- Concurrency
- Linear scale out with more nodes

Performance - Query Latency
- 90% queries <5s
- Green Line: 90%tile queries
- Gray Line: 95%tile queries
