Background
I have recently been building a time-based statistics feature. Roughly, the requirement is to aggregate data into windows of 1min, 10min, 2h and 24h. The time field in the raw data is a millisecond timestamp, so the idea is simple: subtract from the timestamp its remainder modulo the window size. This works for the 1min, 10min and 2h windows, but it goes wrong for the 24h window, as the test below shows.
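/**
 * Returns the start of the time window that the given timestamp falls into,
 * aligned to the current day where possible.
 *
 * @param timestamp timestamp in milliseconds
 * @param scale     window size in milliseconds
 * @return start timestamp of the window, in milliseconds
 */
public static long getTimestampWindow(long timestamp, long scale) {
    // Distance from the start of the window; windows are counted from timestamp 0.
    long remain = timestamp % scale;
    return timestamp - remain;
}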
That is the window calculation I started with.
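// DateUtils and DateFormatUtils come from Apache Commons Lang (org.apache.commons.lang3.time).
// DateUtil appears to be a project-local class holding the date pattern constants.
public static void main(String[] args) throws ParseException {
    long scale = TimeUnit.HOURS.toMillis(24);
    // Walk through every hour of 2022-06-24, parsed in the JVM default time zone.
    for (int i = 0; i < 24; i++) {
        Date date = DateUtils.addHours(DateUtils.parseDate("2022-06-24", DateUtil.DATE_PATTERN), i);
        Date scaledDate = new Date(getTimestampWindow(date.getTime(), scale));
        System.out.printf("Current time: %s, window: %s%n",
                DateFormatUtils.format(date, DateUtil.LONG_DATE_PATTERN),
                DateFormatUtils.format(scaledDate, DateUtil.LONG_DATE_PATTERN));
    }
}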
Running this test shows that the window is not the expected "2022-06-24 00:00:00". Instead, two different windows come out, "2022-06-23 08:00:00" and "2022-06-24 08:00:00", so a single day's data ends up split across two days.
Java's date functions would of course solve this easily, but the time values we return are timestamps, and doing the arithmetic directly on the timestamp should be the cheapest approach.
Current time: 2022-06-24 00:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 01:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 02:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 03:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 04:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 05:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 06:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 07:00:00, window: 2022-06-23 08:00:00
Current time: 2022-06-24 08:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 09:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 10:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 11:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 12:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 13:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 14:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 15:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 16:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 17:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 18:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 19:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 20:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 21:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 22:00:00, window: 2022-06-24 08:00:00
Current time: 2022-06-24 23:00:00, window: 2022-06-24 08:00:00
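For comparison, the date-function route mentioned above could look roughly like the sketch below. It is not code from the original project: the class name `CalendarDayWindow` and the method `startOfDay` are made up for illustration, and it handles only the 24h window by truncating to local midnight with `java.time`.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

public class CalendarDayWindow {

    /**
     * Returns the epoch-millisecond timestamp of local midnight for the day
     * that contains the given timestamp in the given zone.
     * (Hypothetical helper, named for this example only.)
     */
    public static long startOfDay(long timestamp, ZoneId zone) {
        ZonedDateTime local = Instant.ofEpochMilli(timestamp).atZone(zone);
        // truncatedTo(DAYS) resets the time-of-day fields to 00:00 in the local zone.
        return local.truncatedTo(ChronoUnit.DAYS).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai");
        // 2022-06-24 01:00:00 local time.
        long ts = ZonedDateTime.of(2022, 6, 24, 1, 0, 0, 0, zone).toInstant().toEpochMilli();
        System.out.println(Instant.ofEpochMilli(startOfDay(ts, zone)).atZone(zone));
    }
}
```

This gives the intuitive daily window, at the cost of creating a few temporary objects per call, which is the overhead the pure arithmetic approach tries to avoid.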
Analysis
Why does this happen? The root cause is that we are in the UTC+8 time zone, 8 hours ahead of Greenwich time. In the zero time zone (UTC/GMT+0), timestamp 0 means "1970-01-01 00:00:00", whereas in UTC+8 (UTC/GMT+8:00) the same timestamp means "1970-01-01 08:00:00".
public static void main(String[] args) {
    // Timestamp 0 formatted in the JVM default zone (Asia/Shanghai on this machine).
    System.out.println(TimeZone.getDefault());
    System.out.println(DateFormatUtils.format(new Date(0), DateUtil.LONG_DATE_PATTERN));
    // The same timestamp formatted in UTC.
    System.out.println(TimeZone.getTimeZone("UTC"));
    System.out.println(DateFormatUtils.format(new Date(0), DateUtil.LONG_DATE_PATTERN, TimeZone.getTimeZone("UTC")));
}
sun.util.calendar.ZoneInfo[id="Asia/Shanghai",offset=28800000,dstSavings=0,useDaylight=false,transitions=19,lastRule=null]
1970-01-01 08:00:00
sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
1970-01-01 00:00:00
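Here we only consider window sizes that divide 24 hours evenly, such as 2h or 12h. Sizes like 5h or 7h are left out, because their windows inevitably cross day boundaries. With the remainder approach, the windows start at multiples of the window size counted from timestamp 0, which is 08:00 local time. So a window size that divides 24h still produces windows that cross local midnight whenever it does not also divide 8h: that covers sizes larger than 8 hours (12h, 24h) and sizes that do not go evenly into 8 (3h, 6h). Sizes of 1h, 2h, 4h and 8h are unaffected.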
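The rule above can be checked mechanically. The following is a minimal sketch, not part of the original code: `CrossDayCheck` and `crossesLocalMidnight` are names invented here, and the check assumes a fixed zone offset and a window size that divides 24h.

```java
import java.util.concurrent.TimeUnit;

public class CrossDayCheck {

    /**
     * Returns true when windows of the given size, aligned to timestamp 0,
     * contain a window that straddles local midnight for the given zone offset.
     * Assumes scale divides 24h. (Hypothetical helper for illustration.)
     */
    public static boolean crossesLocalMidnight(long scale, long offsetMillis) {
        // Local midnight falls at timestamps t where (t + offset) is a multiple of 24h.
        // Since scale divides 24h, such a t is a window boundary only if
        // offset itself is a multiple of scale.
        return Math.floorMod(offsetMillis, scale) != 0;
    }

    public static void main(String[] args) {
        long offset = TimeUnit.HOURS.toMillis(8); // UTC+8
        int[] hours = {1, 2, 3, 4, 6, 8, 12, 24};
        for (int h : hours) {
            long scale = TimeUnit.HOURS.toMillis(h);
            System.out.printf("%2dh window crosses local midnight: %b%n",
                    h, crossesLocalMidnight(scale, offset));
        }
    }
}
```

For an offset of 8 hours this reports false for 1h, 2h, 4h and 8h, and true for 3h, 6h, 12h and 24h.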
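[Figure: window boundaries in local time for each window size; times marked with a - sign belong to the previous day]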
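In the figure, times marked with a - sign belong to the previous day. Take a moment to check that this matches your expectation. It is a little convoluted, so keep in mind that the starting point, timestamp 0, is 08:00 local time.
Once that is clear, you can see that for a 24h window it is technically correct for data from 01:00 today to land in the window that started at 08:00 yesterday. It still looks odd, though. When the window is 24h, our intuition says that everything produced today should belong to today.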
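To make the 01:00 case concrete, here is a small trace of the remainder arithmetic with actual numbers. It is an illustrative sketch only: `NaiveWindowTrace` and `naiveWindow` are names chosen for this example, and the method simply repeats the original remainder calculation.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class NaiveWindowTrace {

    /** Same arithmetic as the first version of getTimestampWindow. */
    public static long naiveWindow(long timestamp, long scale) {
        return timestamp - timestamp % scale;
    }

    public static void main(String[] args) {
        long scale = TimeUnit.HOURS.toMillis(24);   // 86,400,000 ms
        // 2022-06-24 01:00:00 at UTC+8, which is 2022-06-23 17:00:00 in UTC.
        long ts = 1656003600000L;
        long remain = ts % scale;                   // 61,200,000 ms = 17 hours since UTC midnight
        long window = naiveWindow(ts, scale);       // 1655942400000

        System.out.println("remainder (h): " + TimeUnit.MILLISECONDS.toHours(remain));
        // Instant prints in UTC: the window starts at UTC midnight of 2022-06-23,
        // which reads as 2022-06-23 08:00:00 on a UTC+8 clock.
        System.out.println("window start : " + Instant.ofEpochMilli(window));
    }
}
```

The remainder is measured from UTC midnight, not local midnight, which is why the window start shows up as 08:00 of the previous local day.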
Solution
With the cause and the requirement both clear, the fix is straightforward. First add 8 hours to the timestamp, so that "timestamp 0 means 00:00" in local terms. After computing the window on the shifted value, subtract the 8 hours again to get the real start of the window.
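// Assumed definition: the original post uses TIMESTAMP_8H without showing its declaration.
private static final long TIMESTAMP_8H = TimeUnit.HOURS.toMillis(8);

/**
 * Returns the start of the time window that the given timestamp falls into,
 * aligned to the current day where possible.
 *
 * @param timestamp timestamp in milliseconds
 * @param scale     window size in milliseconds
 * @return start timestamp of the window, in milliseconds
 */
public static long getTimestampWindow(long timestamp, long scale) {
    // Shift into UTC+8 local time so that windows are counted from local midnight.
    timestamp = timestamp + TIMESTAMP_8H;
    long remain = timestamp % scale;
    // Align to the window start, then shift back to a real timestamp.
    return timestamp - remain - TIMESTAMP_8H;
}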
With the algorithm changed as above, rerunning the test program shows that the windows now match our expectation.
Current time: 2022-06-24 00:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 01:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 02:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 03:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 04:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 05:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 06:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 07:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 08:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 09:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 10:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 11:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 12:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 13:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 14:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 15:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 16:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 17:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 18:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 19:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 20:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 21:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 22:00:00, window: 2022-06-24 00:00:00
Current time: 2022-06-24 23:00:00, window: 2022-06-24 00:00:00
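The fix hard-codes the 8-hour offset. If the same code had to run in other time zones, one possible generalization is to look the offset up from a `TimeZone`, as in the sketch below. This is an assumption-laden variant, not the original author's code: the class name `ZoneAwareWindow` and the extra `zone` parameter are introduced here, and `Math.floorMod` replaces `%` so that negative shifted values also round down.

```java
import java.util.TimeZone;
import java.util.concurrent.TimeUnit;

public class ZoneAwareWindow {

    /**
     * Returns the start of the window containing the timestamp, with windows
     * counted from local midnight of the given zone.
     * (Hypothetical generalization; the zone parameter is not in the original.)
     */
    public static long getTimestampWindow(long timestamp, long scale, TimeZone zone) {
        // Offset from UTC in milliseconds at this instant (28,800,000 for Asia/Shanghai).
        long offset = zone.getOffset(timestamp);
        long shifted = timestamp + offset;
        // floorMod keeps the remainder non-negative even if shifted is negative.
        long remain = Math.floorMod(shifted, scale);
        return shifted - remain - offset;
    }

    public static void main(String[] args) {
        long scale = TimeUnit.HOURS.toMillis(24);
        long ts = 1656003600000L; // 2022-06-24 01:00:00 at UTC+8
        System.out.println(getTimestampWindow(ts, scale, TimeZone.getTimeZone("Asia/Shanghai")));
        System.out.println(getTimestampWindow(ts, scale, TimeZone.getTimeZone("UTC")));
    }
}
```

One caveat: the offset is taken at the timestamp itself, so in zones with daylight saving time a window that spans a clock change may not start exactly at local midnight. For a fixed-offset zone such as Asia/Shanghai in 2022 this does not arise.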