Clickhouse(流量分析(一).漏斗分析案例)

1. 漏斗分析定義

神策用戶分析模型——漏斗分析的使用方法

2. 有序漏斗windowFunnel

Clickhouse數(shù)據(jù)模型之有序漏斗分析

Hologres漏斗分析函數(shù)

Java UDF StarRocks Docs

windowFunnel(window, [mode])(timestamp, cond1, cond2, ..., condN)
參數(shù)

window — 滑動窗戶的大小,單位是秒。
mode - 這是一個可選的參數(shù)。
‘strict’ - 當(dāng) ‘strict’ 設(shè)置時,windowFunnel()僅對唯一值應(yīng)用匹配條件。
timestamp — 包含時間的列。 數(shù)據(jù)類型支持: 日期, 日期時間 和其他無符號整數(shù)類型(請注意,即使時間戳支持 UInt64 類型,它的值不能超過Int64最大值,即2^63-1)。
cond — 事件鏈的約束條件。 UInt8 類型。

如果數(shù)據(jù)在不同的完成點具有多個事件鏈,則該函數(shù)將僅輸出最長鏈的大小

數(shù)據(jù)
CREATE TABLE nonodb.log_action_tf
( uid Int32,
event_type String,
eventTime DateTime )
ENGINE = MergeTree
PARTITION BY uid
ORDER BY (uid, eventTime)
SETTINGS index_granularity = 8192;



insert into nonodb.log_action_tf values(1,'瀏覽','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(1,'點擊','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'下單','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(1,'支付','2022-01-02 11:30:00');
insert into nonodb.log_action_tf values(2,'下單','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(2,'支付','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(1,'瀏覽','2022-01-02 11:00:00');
insert into nonodb.log_action_tf values(3,'瀏覽','2022-01-02 11:20:00');
insert into nonodb.log_action_tf values(3,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(4,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(4,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(5,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(5,'下單','2022-01-02 11:10:00');
insert into nonodb.log_action_tf values(6,'瀏覽','2022-01-02 11:50:00');
insert into nonodb.log_action_tf values(6,'點擊','2022-01-02 12:00:00');
insert into nonodb.log_action_tf values(6,'下單','2022-01-02 12:10:00');
示例
SELECT
    uid,
    windowFunnel(1800)(eventTime, event_type = '瀏覽', event_type = '點擊', event_type = '下單', event_type = '支付') AS level
FROM
(
    SELECT
        eventTime,
        event_type,
        uid
    FROM nonodb.log_action_tf
)
GROUP BY uid

user  level
4   2
3   1
2   0
5   2
1   4
6   3

ClickHouse數(shù)組函數(shù)

漏斗分析模型

分析"2022-01-02"這天 路徑為“瀏覽->點擊->下單->支付”的轉(zhuǎn)化情況

select
        uid,
        arrayWithConstant(level, 1) levels,
        arrayJoin(arrayEnumerate( levels )) level_index
       from (
 SELECT
          uid,
          windowFunnel(1800)(
            eventTime,
            event_type = '瀏覽',
            event_type = '點擊' ,
            event_type = '下單',
            event_type = '支付'
          ) AS level
        FROM (
          SELECT  eventTime,  event_type , uid
          FROM nonodb.log_action_tf
          WHERE toDate(eventTime) = '2022-01-02'
        ) t1
group by uid)

4   [1,1]   1
4   [1,1]   2
3   [1] 1
5   [1,1]   1
5   [1,1]   2
1   [1,1,1,1]   1
1   [1,1,1,1]   2
1   [1,1,1,1]   3
1   [1,1,1,1]   4
6   [1,1,1] 1
6   [1,1,1] 2
6   [1,1,1] 3

--- ---
select
       level_index,count(1) as ct
       from
           (
               select
        uid,
               arrayWithConstant(level, 1) levels,
               arrayJoin(arrayEnumerate(levels)) level_index
       from (
 SELECT
          uid,
          windowFunnel(1800)(
            eventTime,
            event_type = '瀏覽',
            event_type = '點擊' ,
            event_type = '下單',
            event_type = '支付'
          ) AS level
        FROM (
          SELECT  eventTime,  event_type , uid
          FROM nonodb.log_action_tf
          WHERE toDate(eventTime) = '2022-01-02'
        ) t1
group by uid) t2
           )t3 group by level_index order by  level_index

--- ---
SELECT  transform(level_index,[1,2,3,4],['瀏覽','點擊','下單','支付'],'其他') as event,
        count(1)
FROM (
select
        uid,
               arrayWithConstant(level, 1) levels,
               arrayJoin(arrayEnumerate(levels)) level_index
       from (
 SELECT
          uid,
          windowFunnel(1800)(
            eventTime,
            event_type = '瀏覽',
            event_type = '點擊' ,
            event_type = '下單',
            event_type = '支付'
          ) AS level
        FROM (
          SELECT  eventTime,  event_type , uid
          FROM nonodb.log_action_tf
          WHERE toDate(eventTime) = '2022-01-02'
        ) t1
group by uid) t2 )
group by level_index
ORDER BY level_index ;

這個函數(shù)看起來很強大,但是少了點什么,我理解的流量分析滑動窗口不太一樣

痛點:很顯然,如果數(shù)據(jù)量超過100億往上,clickhouse大概就拉了,比較好的方法還是結(jié)合bitmap進(jìn)行編碼,這里有篇文章可以參考一下的

每天數(shù)百億用戶行為數(shù)據(jù),美團(tuán)點評怎么實現(xiàn)秒級轉(zhuǎn)化分析?

轉(zhuǎn)化率計算

neighbor
uniqCombined | ClickHouse Docs
housepower/olap2018: 易觀第二屆OLAP漏斗算法大賽

select
    level,
    uniqCombined(uid) AS pv,
    neighbor(pv, -1) AS prev_pv,
    if (prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
        from (
SELECT
          uid,
          windowFunnel(1800)(
            eventTime,
            event_type = '瀏覽',
            event_type = '點擊' ,
            event_type = '下單',
            event_type = '支付'
          ) AS level
        FROM (
          SELECT  eventTime,  event_type , uid
          FROM nonodb.log_action_tf
          WHERE toDate(eventTime) = '2022-01-02'
        ) t1
group by uid ) WHERE level > 0  GROUP BY level order by  level ;

上面這種是這樣的,如果一個uid路徑是4層,那么他可能走了第一層,著看產(chǎn)品的口徑了,如果只算走了第一步的是1,走了四步的只算作4,那么就是上面這種口徑

select
    level,
    sum(uid) AS pv,
    neighbor(pv, 1) AS prev_pv,
    if (prev_pv = 0, -1, round(pv / prev_pv * 100, 3)) AS rate
        from (
select
       level_index as level,count(1) as uid
       from
           (
               select
        uid,
               arrayWithConstant(level, 1) levels,
               arrayJoin(arrayEnumerate(levels)) level_index
       from (
 SELECT
          uid,
          windowFunnel(1800)(
            eventTime,
            event_type = '瀏覽',
            event_type = '點擊' ,
            event_type = '下單',
            event_type = '支付'
          ) AS level
        FROM (
          SELECT  eventTime,  event_type , uid
          FROM nonodb.log_action_tf
          WHERE toDate(eventTime) = '2022-01-02'
        ) t1
group by uid) t2
           )t3 group by level_index order by  level_index)WHERE level > 0  GROUP BY level order by  level ;

3. 無序漏斗分析

groupArray | ClickHouse Docs
Array Functions | ClickHouse Docs
Clickhouse中的Array類型

案例:
select  groupArray(num)
from (
     select 1 as num union all  select  2 union all  select  3
         )

select  arrayCount(x -> x=1,[1,1,2,2] )

SELECT day           AS day,
       sum(level1_pv) AS level1_pv,
       sum(level2_pv) AS level2_pv,
       sum(level1_uv) as level1_uv,
       sum(level2_uv) as level2_uv
from (
select toDate(eventTime) as day,
       uid,
       groupArray(event_type) as events,
       arrayCount(x-> x = '瀏覽', events)  as level1_pv,
       if(has(events, '瀏覽'), arrayCount(x-> x = '點擊', events),0) as level2_pv,
       hasAll(events, ['瀏覽'])  as level1_uv,
       hasAll(events, ['瀏覽','點擊'])  as level2_uv
from nonodb.log_action_tf
where toDate(eventTime) >= '2021-01-01'
group by uid,toDate(eventTime) ) group by day order by  day;
最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容