Using map to transform each element
In [1]: import pandas as pd
...: import datetime
...: from operator import methodcaller
In [2]: pd.options.display.max_rows = 10
In [3]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=5))
In [4]: s
Out[4]:
0 2019-03-01 14:44:40.030313
1 2019-03-02 14:44:40.030313
2 2019-03-03 14:44:40.030313
3 2019-03-04 14:44:40.030313
4 2019-03-05 14:44:40.030313
dtype: datetime64[ns]
In [5]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[5]:
0 01-03-2019
1 02-03-2019
2 03-03-2019
3 04-03-2019
4 05-03-2019
dtype: object
In [6]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[6]:
0 01-03-2019
1 02-03-2019
2 03-03-2019
3 04-03-2019
4 05-03-2019
dtype: object
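The two calls above produce the same result because methodcaller('strftime', fmt)(x) is just another way of writing x.strftime(fmt). A minimal sketch to check this, assuming a fixed start date instead of 'now' so the values are reproducible (the variable names are illustrative only):

```python
from operator import methodcaller

import pandas as pd

# Fixed start date: an assumption for reproducibility; the session above uses 'now'.
s = pd.Series(pd.date_range("2019-03-01", periods=5))

fmt = "%d-%m-%Y"
via_lambda = s.map(lambda x: x.strftime(fmt))
via_methodcaller = s.map(methodcaller("strftime", fmt))

# Both Series hold the same formatted strings, element by element.
same = via_lambda.equals(via_methodcaller)
first = via_lambda.iloc[0]
print(same, first)
```

Which form to use is mostly a matter of taste: the lambda is more familiar, while methodcaller avoids defining a throwaway function.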
Calling the date method on each Timestamp element of the Series yields raw datetime.date objects
In [7]: s.map(methodcaller('date'))
Out[7]:
0 2019-03-01
1 2019-03-02
2 2019-03-03
3 2019-03-04
4 2019-03-05
dtype: object
In [8]: s.map(methodcaller('date')).values
Out[8]:
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
datetime.date(2019, 3, 5)], dtype=object)
An equivalent approach is to call the unbound Timestamp.date method
In [9]: s.map(pd.Timestamp.date)
Out[9]:
0 2019-03-01
1 2019-03-02
2 2019-03-03
3 2019-03-04
4 2019-03-05
dtype: object
Passing Timestamp.date is efficient and easy to read. Timestamp is available in the top-level pandas namespace, as pandas.Timestamp.
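To see why the unbound method works as a map function, it may help to call it by hand. Accessed through the class, the method takes the instance as an explicit first argument, and that is exactly what map supplies for each element. A small sketch, using an arbitrary example timestamp:

```python
import datetime

import pandas as pd

# Arbitrary example value, chosen to match the dates in the session above.
ts = pd.Timestamp("2019-03-01 14:44:40")

# Bound call: the instance is supplied implicitly.
bound_result = ts.date()

# Unbound call: the instance is passed explicitly as the first argument.
unbound_result = pd.Timestamp.date(ts)

print(bound_result == unbound_result)
print(type(unbound_result) is datetime.date)
```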
The date attribute of DatetimeIndex does something similar. It returns a NumPy array with dtype=object.
In [10]: idx = pd.DatetimeIndex(s)
In [11]: idx
Out[11]:
DatetimeIndex(['2019-03-01 14:44:40.030313', '2019-03-02 14:44:40.030313',
'2019-03-03 14:44:40.030313', '2019-03-04 14:44:40.030313',
'2019-03-05 14:44:40.030313'],
dtype='datetime64[ns]', freq=None)
In [12]: idx.date
Out[12]:
array([datetime.date(2019, 3, 1), datetime.date(2019, 3, 2),
datetime.date(2019, 3, 3), datetime.date(2019, 3, 4),
datetime.date(2019, 3, 5)], dtype=object)
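A Series exposes the same attribute through its .dt accessor, so building a DatetimeIndex first is not strictly necessary. The sketch below compares the three routes on a small Series; the fixed start timestamp is an assumption made so the example is reproducible:

```python
import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=3))

via_map = s.map(pd.Timestamp.date)      # element-wise call through map
via_dt = s.dt.date                      # Series accessor, returns a Series of date objects
via_index = pd.DatetimeIndex(s).date    # index attribute, returns a NumPy object array

# All three routes yield the same datetime.date objects.
all_equal = list(via_map) == list(via_dt) == list(via_index)
print(all_equal)
```

Note that s.dt.date returns a Series, while DatetimeIndex.date returns a plain NumPy array.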
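For a large datetime64[ns] Series, Timestamp.date is expected to outperform operator.methodcaller and to be slightly faster than a lambda. In the run below, however, the three timings (roughly 2.9 to 3.0 s each) fall within one standard deviation of one another, so the difference is not measurable here.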
In [13]: f1 = methodcaller('date')
...: f2 = lambda x: x.date()
...: f3 = pd.Timestamp.date
...: s2 = pd.Series(pd.date_range('20010101', periods=1000000, freq='T'))
...: s2
Out[13]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
...
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [14]: timeit s2.map(f1)
2.97 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: timeit s2.map(f2)
2.9 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [16]: timeit s2.map(f3)
2.98 s ± 177 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
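The timeit magic above only works inside IPython. Outside of it, a comparable measurement can be sketched with the standard-library timeit module. This version uses a much smaller Series (10,000 rows, an assumption to keep the run short), so its absolute numbers are not comparable with those above; the names candidates and timings are illustrative only.

```python
import timeit
from operator import methodcaller

import pandas as pd

# Smaller than the one-million-row Series above, to keep the run short.
s2 = pd.Series(pd.date_range("2001-01-01", periods=10_000, freq="min"))

candidates = {
    "methodcaller": methodcaller("date"),
    "lambda": lambda x: x.date(),
    "unbound": pd.Timestamp.date,
}

# Best of three single runs per candidate, in seconds.
timings = {
    name: min(timeit.repeat(lambda f=f: s2.map(f), number=1, repeat=3))
    for name, f in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum of several repeats reduces the influence of background noise, but small differences between the three callables should still be read with caution.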
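One of the goals of pandas is to provide a layer of operations on top of NumPy, so that you do not have to deal with the low-level details of ndarray. Extracting raw datetime.date objects is of limited use, because there is no corresponding NumPy dtype that pandas supports. In the pandas version used here, datetime data is stored only as datetime64[ns], that is, with nanosecond resolution (pandas 2.0 and later also accept second, millisecond, and microsecond resolutions).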
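If the aim is only to drop the time of day, one possible alternative (not part of the session above) is to stay within the datetime64 dtype by normalizing each value to midnight with Series.dt.normalize. A minimal sketch, assuming the same kind of Series as before:

```python
import pandas as pd

s = pd.Series(pd.date_range("2019-03-01 14:44:40", periods=3))

# Set the time component of every element to midnight; the dtype stays datetime64.
days = s.dt.normalize()

is_datetime = pd.api.types.is_datetime64_any_dtype(days)
first_day = days.iloc[0]

# Vectorized datetime operations remain available on the result.
day_numbers = days.dt.day.tolist()
print(is_datetime, first_day, day_numbers)
```

Because the result is still a datetime Series, it keeps working with the .dt accessor, resampling, and date arithmetic, which a Series of datetime.date objects does not.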