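1. Creating Timedelta objects
As noted in Section 1, a time difference can be understood as the difference between two timestamps. It can also be constructed directly with pd.Timedelta: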
In [57]: pd.Timestamp('20200102 08:00:00')-pd.Timestamp('20200101 07:35:00')
Out[57]: Timedelta('1 days 00:25:00')
In [58]: pd.Timedelta(days=1, minutes=25) # note the plural keyword names (days, minutes)
Out[58]: Timedelta('1 days 00:25:00')
In [59]: pd.Timedelta('1 days 25 minutes') # built from a string
Out[59]: Timedelta('1 days 00:25:00')
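The three constructions above should describe the same duration. A minimal sketch that checks this, assuming only that pandas is installed; the variable names are illustrative:

```python
import pandas as pd

# Difference of two timestamps.
a = pd.Timestamp("2020-01-02 08:00:00") - pd.Timestamp("2020-01-01 07:35:00")
# Keyword arguments (plural names).
b = pd.Timedelta(days=1, minutes=25)
# String form.
c = pd.Timedelta("1 days 25 minutes")

all_equal = (a == b) and (b == c)
print(all_equal)
print(a.total_seconds())  # 1 day + 25 minutes, expressed in seconds
```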
The main way to build a Series of time differences is pd.to_timedelta; the result has dtype timedelta64[ns]:
In [60]: s = pd.to_timedelta(df.Time_Record)
In [61]: s.head()
Out[61]:
0 0 days 00:04:34
1 0 days 00:04:20
2 0 days 00:05:22
3 0 days 00:04:08
4 0 days 00:05:22
Name: Time_Record, dtype: timedelta64[ns]
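The df used above is not defined in this section. As a self-contained sketch, the same conversion can be tried on a small hypothetical Series of HH:MM:SS strings (the values and the name s_demo are assumptions for illustration, not the original data). The errors="coerce" option is shown because real columns often contain unparseable entries:

```python
import pandas as pd

# Hypothetical stand-in for df.Time_Record: durations stored as strings.
records = pd.Series(["00:04:34", "00:04:20", "not a time"], name="Time_Record")

# errors="coerce" turns unparseable entries into NaT instead of raising.
s_demo = pd.to_timedelta(records, errors="coerce")
print(s_demo)
```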
Like date_range, timedelta_range can generate a sequence of time differences; the two functions share the same core parameters (start, end, periods, freq):
In [62]: pd.timedelta_range('0s', '1000s', freq='6min')
Out[62]: TimedeltaIndex(['0 days 00:00:00', '0 days 00:06:00', '0 days 00:12:00'], dtype='timedelta64[ns]', freq='6T')
In [63]: pd.timedelta_range('0s', '1000s', periods=3)
Out[63]: TimedeltaIndex(['0 days 00:00:00', '0 days 00:08:20', '0 days 00:16:40'], dtype='timedelta64[ns]', freq=None)
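A Timedelta Series also exposes the dt accessor. Its main attributes are days, seconds, microseconds, and nanoseconds, each of which returns the corresponding component of the time difference. Note that seconds is not the total duration in seconds; it is the number of seconds left over after whole days have been removed: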
In [64]: s.dt.seconds.head()
Out[64]:
0 274
1 260
2 322
3 248
4 322
Name: Time_Record, dtype: int64
To get the full duration in seconds without removing whole days, use total_seconds:
In [65]: s.dt.total_seconds().head()
Out[65]:
0 274.0
1 260.0
2 322.0
3 248.0
4 322.0
Name: Time_Record, dtype: float64
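Every value in the sample data is shorter than one day, so seconds and total_seconds happen to agree there. A small sketch with one hypothetical duration longer than a day makes the difference visible (the Series long_td is made up for illustration):

```python
import pandas as pd

# Two durations with the same clock part; the first also spans one full day.
long_td = pd.Series(pd.to_timedelta(["1 days 00:04:34", "0 days 00:04:34"]))

days = long_td.dt.days.tolist()                # whole days
seconds = long_td.dt.seconds.tolist()          # remainder after whole days
total = long_td.dt.total_seconds().tolist()    # full duration in seconds

print(days, seconds, total)
```

Here seconds is 274 for both rows, while total_seconds adds 86400 for the extra day in the first row.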
As with timestamp Series, the rounding functions are also available on the dt accessor:
In [66]: pd.to_timedelta(df.Time_Record).dt.round('min').head()
Out[66]:
0 0 days 00:05:00
1 0 days 00:04:00
2 0 days 00:05:00
3 0 days 00:04:00
4 0 days 00:05:00
Name: Time_Record, dtype: timedelta64[ns]
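Besides round, the dt accessor also offers floor and ceil. The following sketch compares the three on two hypothetical durations taken from the first rows shown earlier:

```python
import pandas as pd

# Hypothetical sample: 4 min 34 s and 4 min 20 s.
sample = pd.Series(pd.to_timedelta(["00:04:34", "00:04:20"]))

rounded = sample.dt.round("min")   # nearest minute
floored = sample.dt.floor("min")   # round down to the minute
ceiled = sample.dt.ceil("min")     # round up to the minute

print(pd.DataFrame({"round": rounded, "floor": floored, "ceil": ceiled}))
```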
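2. Timedelta arithmetic
Time differences support three common kinds of arithmetic: multiplication by a scalar, addition to and subtraction from a timestamp, and addition, subtraction, and division with another time difference: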
In [67]: td1 = pd.Timedelta(days=1)
In [68]: td2 = pd.Timedelta(days=3)
In [69]: ts = pd.Timestamp('20200101')
In [70]: td1 * 2
Out[70]: Timedelta('2 days 00:00:00')
In [71]: td2 - td1
Out[71]: Timedelta('2 days 00:00:00')
In [72]: ts + td1
Out[72]: Timestamp('2020-01-02 00:00:00')
In [73]: ts - td1
Out[73]: Timestamp('2019-12-31 00:00:00')
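Division between two time differences is listed among the supported operations but does not appear in the transcript. A minimal sketch, redefining the same one-day and three-day values under separate names so it runs on its own:

```python
import pandas as pd

one_day = pd.Timedelta(days=1)
three_days = pd.Timedelta(days=3)

ratio = three_days / one_day        # true division gives a plain float
whole = three_days // one_day       # floor division gives an integer
half = three_days / 2               # dividing by a number gives a Timedelta

print(ratio, whole, half)
```

Dividing by a unit such as pd.Timedelta(hours=1) is a common way to express a duration in that unit.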
All of these operations carry over to sequences of time differences:
In [74]: td1 = pd.timedelta_range(start='1 days', periods=5)
In [75]: td2 = pd.timedelta_range(start='12 hours',
....: freq='2H',
....: periods=5)
....:
In [76]: ts = pd.date_range('20200101', '20200105')
In [77]: td1 * 5
Out[77]: TimedeltaIndex(['5 days', '10 days', '15 days', '20 days', '25 days'], dtype='timedelta64[ns]', freq='5D')
In [78]: td1 * pd.Series(list(range(5))) # element-wise multiplication
Out[78]:
0 0 days
1 2 days
2 6 days
3 12 days
4 20 days
dtype: timedelta64[ns]
In [79]: td1 - td2
Out[79]:
TimedeltaIndex(['0 days 12:00:00', '1 days 10:00:00', '2 days 08:00:00',
'3 days 06:00:00', '4 days 04:00:00'],
dtype='timedelta64[ns]', freq=None)
In [80]: td1 + pd.Timestamp('20200101')
Out[80]:
DatetimeIndex(['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05',
'2020-01-06'],
dtype='datetime64[ns]', freq='D')
In [81]: td1 + ts # element-wise addition
Out[81]:
DatetimeIndex(['2020-01-02', '2020-01-04', '2020-01-06', '2020-01-08',
'2020-01-10'],
dtype='datetime64[ns]', freq=None)
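Division also works element-wise on sequences. The sketch below rebuilds two indexes with the same values as td1 and td2 under different names; the second is built with pd.to_timedelta and unit="h" so that no frequency alias is needed:

```python
import pandas as pd

day_steps = pd.timedelta_range(start="1 days", periods=5)       # 1..5 days
hour_steps = pd.to_timedelta([12, 14, 16, 18, 20], unit="h")    # 12h..20h

# Divide by a scalar Timedelta to express each duration in hours.
in_hours = day_steps / pd.Timedelta(hours=1)

# Divide index by index: each pair is divided position by position.
ratios = day_steps / hour_steps

print(list(in_hours))
print(list(ratios))
```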