III. Time Deltas

1. Creating Timedelta Objects

As noted in Section 1, a time delta can be understood as the difference between two timestamps. It can also be constructed directly with pd.Timedelta:

In [57]: pd.Timestamp('20200102 08:00:00')-pd.Timestamp('20200101 07:35:00')
Out[57]: Timedelta('1 days 00:25:00')
In [58]: pd.Timedelta(days=1, minutes=25) # note the plural: the keyword names end in "s"
Out[58]: Timedelta('1 days 00:25:00')
In [59]: pd.Timedelta('1 days 25 minutes') # built from a string
Out[59]: Timedelta('1 days 00:25:00')
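
As a quick check that the three construction routes above agree, the following minimal sketch builds the same difference each way and inspects its parts. The variable names are illustrative only and do not come from the original text.

```python
import pandas as pd

# Three routes to the same 1-day-25-minute difference
by_subtraction = pd.Timestamp('2020-01-02 08:00:00') - pd.Timestamp('2020-01-01 07:35:00')
by_keywords = pd.Timedelta(days=1, minutes=25)
by_string = pd.Timedelta('1 days 25 minutes')

# Chained comparison: True only if all three are equal
all_equal = by_subtraction == by_keywords == by_string
print(all_equal)

# components splits a Timedelta into days, hours, minutes, seconds, ...
parts = by_keywords.components
print(parts.days, parts.hours, parts.minutes)
```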

The main way to generate a time-delta series is pd.to_timedelta; the result has dtype timedelta64[ns]:

In [60]: s = pd.to_timedelta(df.Time_Record)
In [61]: s.head()
Out[61]: 
0   0 days 00:04:34
1   0 days 00:04:20
2   0 days 00:05:22
3   0 days 00:04:08
4   0 days 00:05:22
Name: Time_Record, dtype: timedelta64[ns]
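
Since df is not defined in this section, here is a small self-contained sketch of the same conversion. The Series `records` is a hypothetical stand-in for df.Time_Record, assuming the durations are stored as 'HH:MM:SS' strings.

```python
import pandas as pd

# Hypothetical stand-in for df.Time_Record: durations stored as strings
records = pd.Series(['00:04:34', '00:04:20', '00:05:22'], name='Time_Record')
s_demo = pd.to_timedelta(records)
print(s_demo)

# Plain numbers can also be converted if the unit is stated explicitly
from_numbers = pd.to_timedelta([274, 260, 322], unit='s')
print(from_numbers)
```

Passing a Series returns a Series, while passing a list returns a TimedeltaIndex.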

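As with date_range, a time-delta sequence can also be generated with timedelta_range; the two functions share the same core parameters (start, end, periods, freq):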

In [62]: pd.timedelta_range('0s', '1000s', freq='6min')
Out[62]: TimedeltaIndex(['0 days 00:00:00', '0 days 00:06:00', '0 days 00:12:00'], dtype='timedelta64[ns]', freq='6T')
In [63]: pd.timedelta_range('0s', '1000s', periods=3)
Out[63]: TimedeltaIndex(['0 days 00:00:00', '0 days 00:08:20', '0 days 00:16:40'], dtype='timedelta64[ns]', freq=None)
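
As with date_range, the end point can be replaced by a number of periods when a frequency is given. The sketch below, with illustrative variable names, shows that both specifications yield the same three values here, because the next step (18 minutes) would exceed the 1000-second end point.

```python
import pandas as pd

# Bounded by an end point: steps of 6 minutes that do not exceed 1000 seconds
by_end = pd.timedelta_range(start='0s', end='1000s', freq='6min')

# Bounded by a count: exactly three steps of 6 minutes
by_count = pd.timedelta_range(start='0s', periods=3, freq='6min')

print(list(by_end) == list(by_count))
```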

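A Timedelta series also provides the dt accessor. Its main attributes are days, seconds, microseconds, and nanoseconds, each of which returns the corresponding component of the time delta. Note that seconds is not the total duration in seconds but the number of seconds left over after whole days are removed: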

In [64]: s.dt.seconds.head()
Out[64]: 
0    274
1    260
2    322
3    248
4    322
Name: Time_Record, dtype: int64

To get the full duration in seconds without taking the remainder over days, use total_seconds:

In [65]: s.dt.total_seconds().head()
Out[65]: 
0    274.0
1    260.0
2    322.0
3    248.0
4    322.0
Name: Time_Record, dtype: float64
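
Because every record above is shorter than one day, seconds and total_seconds return the same numbers there. The difference only shows up for longer durations, as in this minimal sketch (the values in `long_td` are made up for illustration):

```python
import pandas as pd

# Two made-up durations that differ by exactly one day
long_td = pd.Series(pd.to_timedelta(['1 days 00:04:34', '0 days 00:04:34']))

days = long_td.dt.days.tolist()              # whole days: [1, 0]
secs = long_td.dt.seconds.tolist()           # remainder after days: [274, 274]
total = long_td.dt.total_seconds().tolist()  # full duration: [86674.0, 274.0]
print(days, secs, total)
```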

As with timestamp series, the rounding functions are also available on the dt accessor:

In [66]: pd.to_timedelta(df.Time_Record).dt.round('min').head()
Out[66]: 
0   0 days 00:05:00
1   0 days 00:04:00
2   0 days 00:05:00
3   0 days 00:04:00
4   0 days 00:05:00
Name: Time_Record, dtype: timedelta64[ns]
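
Besides round, the dt accessor also offers floor and ceil. The following sketch compares the three on two made-up durations; `t` is a hypothetical Series, not the data used above.

```python
import pandas as pd

# Made-up durations: 4 min 34 s and 4 min 20 s
t = pd.Series(pd.to_timedelta(['00:04:34', '00:04:20']))

rounded = t.dt.round('min')  # nearest minute: 5 min, 4 min
floored = t.dt.floor('min')  # round down:     4 min, 4 min
ceiled = t.dt.ceil('min')    # round up:       5 min, 5 min
print(pd.DataFrame({'round': rounded, 'floor': floored, 'ceil': ceiled}))
```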

2. Timedelta Arithmetic

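Time deltas support three common kinds of operations: multiplication by a scalar, addition to and subtraction from a timestamp, and addition, subtraction, and division with another time delta: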

In [67]: td1 = pd.Timedelta(days=1)
In [68]: td2 = pd.Timedelta(days=3)
In [69]: ts = pd.Timestamp('20200101')
In [70]: td1 * 2
Out[70]: Timedelta('2 days 00:00:00')
In [71]: td2 - td1
Out[71]: Timedelta('2 days 00:00:00')
In [72]: ts + td1
Out[72]: Timestamp('2020-01-02 00:00:00')
In [73]: ts - td1
Out[73]: Timestamp('2019-12-31 00:00:00')
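
The session above does not exercise division, so here is a minimal sketch of it. Dividing one time delta by another yields a plain number, while dividing by a scalar yields another time delta. The 50-hour value `span` is made up for illustration.

```python
import pandas as pd

td1 = pd.Timedelta(days=1)
td2 = pd.Timedelta(days=3)

ratio = td2 / td1   # Timedelta / Timedelta gives a float: 3.0
half = td1 / 2      # Timedelta / scalar gives a Timedelta: 12 hours

# Floor division and modulo split a duration into whole units and a remainder
span = pd.Timedelta(hours=50)
whole_days = span // td1  # 2
leftover = span % td1     # 2 hours
print(ratio, half, whole_days, leftover)
```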

All of these operations carry over to time-delta sequences:

In [74]: td1 = pd.timedelta_range(start='1 days', periods=5)
In [75]: td2 = pd.timedelta_range(start='12 hours',
   ....:                          freq='2H',
   ....:                          periods=5)
   ....: 
In [76]: ts = pd.date_range('20200101', '20200105')
In [77]: td1 * 5
Out[77]: TimedeltaIndex(['5 days', '10 days', '15 days', '20 days', '25 days'], dtype='timedelta64[ns]', freq='5D')
In [78]: td1 * pd.Series(list(range(5))) # element-wise multiplication
Out[78]: 
0    0 days
1    2 days
2    6 days
3   12 days
4   20 days
dtype: timedelta64[ns]
In [79]: td1 - td2
Out[79]: 
TimedeltaIndex(['0 days 12:00:00', '1 days 10:00:00', '2 days 08:00:00',
                '3 days 06:00:00', '4 days 04:00:00'],
               dtype='timedelta64[ns]', freq=None)
In [80]: td1 + pd.Timestamp('20200101')
Out[80]: 
DatetimeIndex(['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05',
               '2020-01-06'],
              dtype='datetime64[ns]', freq='D')
In [81]: td1 + ts # element-wise addition
Out[81]: 
DatetimeIndex(['2020-01-02', '2020-01-04', '2020-01-06', '2020-01-08',
               '2020-01-10'],
              dtype='datetime64[ns]', freq=None)
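
Division carries over to sequences in the same way, which is a convenient route for converting a time-delta sequence into a numeric unit. The sketch below also shows that subtracting two timestamp Series gives back a time-delta Series; the names `start` and `end` are illustrative.

```python
import pandas as pd

td1 = pd.timedelta_range(start='1 days', periods=5)

# Dividing by a one-hour Timedelta expresses each element in hours
hours = td1 / pd.Timedelta(hours=1)
print(list(hours))

# Subtracting two timestamp Series yields a time-delta Series
start = pd.Series(pd.date_range('20200101', periods=3))
end = start + pd.Series(pd.to_timedelta(['1 days', '2 days', '3 days']))
elapsed_days = (end - start).dt.days.tolist()
print(elapsed_days)
```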