1. 漏斗分析定義
2. 有序漏斗windowFunnel
Clickhouse數(shù)據(jù)模型之有序漏斗分析
windowFunnel(window, [mode])(timestamp, cond1, cond2, ..., condN)
參數(shù)
window — 滑動窗戶的大小,單位是秒。
mode - 這是一個可選的參數(shù)。
‘strict’ - 當(dāng) ‘strict’ 設(shè)置時,windowFunnel()僅對唯一值應(yīng)用匹配條件。
timestamp — 包含時間的列。 數(shù)據(jù)類型支持: 日期, 日期時間 和其他無符號整數(shù)類型(請注意,即使時間戳支持 UInt64 類型,它的值不能超過Int64最大值,即2^63-1)。
cond — 事件鏈的約束條件。 UInt8 類型。
如果數(shù)據(jù)在不同的完成點具有多個事件鏈,則該函數(shù)將僅輸出最長鏈的大小
數(shù)據(jù)
CREATE TABLE nonodb.log_action_tf
( uid Int32,
event_type String,
eventTime DateTime )
ENGINE = MergeTree
PARTITION BY uid
ORDER BY (uid, eventTime)
SETTINGS index_granularity = 8192;
insert into nonodb.log_action_tf values(1,'瀏覽','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(1,'點擊','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'下單','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(1,'支付','2022-01-02 11:30:00');
insert into nonodb.log_action_tf values(2,'下單','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(2,'支付','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'瀏覽','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(3,'瀏覽','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(3,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(4,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(4,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(5,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'下單','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(6,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(6,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(6,'下單','2022-01-02 12:10:00');
示例
SELECT
uid,
windowFunnel(1800)(eventTime, event_type = '瀏覽', event_type = '點擊', event_type = '下單', event_type = '支付') AS level
FROM
(
SELECT
eventTime,
event_type,
uid
FROM nonodb.log_action_tf
)
GROUP BY uid
user level
4 2
3 1
2 0
5 2
1 4
6 3
分析"2022-01-02"這天 路徑為“瀏覽->點擊->下單->支付”的轉(zhuǎn)化情況
select
uid,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate( levels )) level_index
from (
SELECT
uid,
windowFunnel(1800)(
eventTime,
event_type = '瀏覽',
event_type = '點擊' ,
event_type = '下單',
event_type = '支付'
) AS level
FROM (
SELECT eventTime, event_type , uid
FROM nonodb.log_action_tf
WHERE toDate(eventTime) = '2022-01-02'
) t1
group by uid)
4 [1,1] 1
4 [1,1] 2
3 [1] 1
5 [1,1] 1
5 [1,1] 2
1 [1,1,1,1] 1
1 [1,1,1,1] 2
1 [1,1,1,1] 3
1 [1,1,1,1] 4
6 [1,1,1] 1
6 [1,1,1] 2
6 [1,1,1] 3
--- ---
select
level_index,count(1) as ct
from
(
select
uid,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate(levels)) level_index
from (
SELECT
uid,
windowFunnel(1800)(
eventTime,
event_type = '瀏覽',
event_type = '點擊' ,
event_type = '下單',
event_type = '支付'
) AS level
FROM (
SELECT eventTime, event_type , uid
FROM nonodb.log_action_tf
WHERE toDate(eventTime) = '2022-01-02'
) t1
group by uid) t2
)t3 group by level_index order by level_index
--- ---
SELECT transform(level_index,[1,2,3,4],['瀏覽','點擊','下單','支付'],'其他') as event,
count(1)
FROM (
select
uid,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate(levels)) level_index
from (
SELECT
uid,
windowFunnel(1800)(
eventTime,
event_type = '瀏覽',
event_type = '點擊' ,
event_type = '下單',
event_type = '支付'
) AS level
FROM (
SELECT eventTime, event_type , uid
FROM nonodb.log_action_tf
WHERE toDate(eventTime) = '2022-01-02'
) t1
group by uid) t2 )
group by level_index
ORDER BY level_index ;
這個函數(shù)看起來很強大,但是少了點什么,我理解的流量分析滑動窗口不太一樣
痛點:很顯然,如果數(shù)據(jù)量超過100億往上,clickhouse大概就拉了,比較好的方法還是結(jié)合bitmap進(jìn)行編碼,這里有篇文章可以參考一下的
每天數(shù)百億用戶行為數(shù)據(jù),美團(tuán)點評怎么實現(xiàn)秒級轉(zhuǎn)化分析?
轉(zhuǎn)化率計算
neighbor
uniqCombined | ClickHouse Docs
housepower/olap2018: 易觀第二屆OLAP漏斗算法大賽
select
level,
uniqCombined(uid) AS pv,
neighbor(pv, -1) AS prev_pv,
if (prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
from (
SELECT
uid,
windowFunnel(1800)(
eventTime,
event_type = '瀏覽',
event_type = '點擊' ,
event_type = '下單',
event_type = '支付'
) AS level
FROM (
SELECT eventTime, event_type , uid
FROM nonodb.log_action_tf
WHERE toDate(eventTime) = '2022-01-02'
) t1
group by uid ) WHERE level > 0 GROUP BY level order by level ;
上面這種是這樣的,如果一個uid路徑是4層,那么他可能走了第一層,著看產(chǎn)品的口徑了,如果只算走了第一步的是1,走了四步的只算作4,那么就是上面這種口徑
select
level,
sum(uid) AS pv,
neighbor(pv, 1) AS prev_pv,
if (prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
from (
select
level_index as level,count(1) as uid
from
(
select
uid,
arrayWithConstant(level, 1) levels,
arrayJoin(arrayEnumerate(levels)) level_index
from (
SELECT
uid,
windowFunnel(1800)(
eventTime,
event_type = '瀏覽',
event_type = '點擊' ,
event_type = '下單',
event_type = '支付'
) AS level
FROM (
SELECT eventTime, event_type , uid
FROM nonodb.log_action_tf
WHERE toDate(eventTime) = '2022-01-02'
) t1
group by uid) t2
)t3 group by level_index order by level_index)WHERE level > 0 GROUP BY level order by level ;
3. 無序漏斗分析
groupArray | ClickHouse Docs
Array Functions | ClickHouse Docs
Clickhouse中的Array類型
案例:
select groupArray(num)
from (
select 1 as num union all select 2 union all select 3
)
select arrayCount(x -> x=1,[1,1,2,2] )
SELECT day AS day,
sum(level1_pv) AS level1_pv,
sum(level2_pv) AS level2_pv,
sum(level1_uv) as level1_uv,
sum(level2_uv) as level2_uv
from (
select toDate(eventTime) as day,
uid,
groupArray(event_type) as events,
arrayCount(x-> x = '瀏覽', events) as level1_pv,
if(has(events, '瀏覽'), arrayCount(x-> x = '點擊', events),0) as level2_pv,
hasAll(events, ['瀏覽']) as level1_uv,
hasAll(events, ['瀏覽','點擊']) as level2_uv
from nonodb.log_action_tf
where toDate(eventTime) >= '2021-01-01'
group by uid,toDate(eventTime) ) group by day order by day;