Need to select the last 5 rows per user of a pandas.DataFrame by datetime (Timestamp)
I have a DataFrame:
|   | USER | Timestamp |
| 0 | A | 2019-10-01 08:32:29.608000 |
| 1 | A | 2019-10-01 08:32:34.541000 |
| 2 | A | 2019-10-01 08:32:35.863001 |
| 3 | A | 2019-10-01 08:32:35.864002 |
| 4 | A | 2019-10-01 08:32:36.398003 |
| 5 | A | 2019-10-01 08:32:39.517000 |
| 6 | A | 2019-10-01 08:32:39.567005 |
| 7 | A | 2019-10-01 08:32:41.039000 |
...
| 130 | B | 2019-10-01 22:12:21.966022 |
| 131 | B | 2019-10-01 22:12:23.549023 |
| 132 | B | 2019-10-01 22:12:24.977024 |
| 133 | B | 2019-10-01 22:12:25.922025 |
| 134 | B | 2019-10-01 22:12:26.935026 |
| 135 | B | 2019-10-01 22:12:28.487027 |
| 136 | B | 2019-10-01 22:12:29.593028 |
| 137 | B | 2019-10-01 22:12:31.926029 |
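From this DataFrame I need to keep only the rows with the last 5 timestamps for each user.
I tried indexing and changing the dtype to datetime64[ns].
This is what I expect, only the 5 latest timestamps for each user: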
| | USER | Timestamp |
| 3 | A | 2019-10-01 08:32:35.864002 |
| 4 | A | 2019-10-01 08:32:36.398003 |
| 5 | A | 2019-10-01 08:32:39.517000 |
| 6 | A | 2019-10-01 08:32:39.567005 |
| 7 | A | 2019-10-01 08:32:41.039000 |
...
| 133 | B | 2019-10-01 22:12:25.922025 |
| 134 | B | 2019-10-01 22:12:26.935026 |
| 135 | B | 2019-10-01 22:12:28.487027 |
| 136 | B | 2019-10-01 22:12:29.593028 |
| 137 | B | 2019-10-01 22:12:31.926029 |
P.S. The timestamps can also be taken in ascending order. I thought about doing this by index, but unfortunately the pandas dtype of the column is object.
Use DataFrame.sort_values with GroupBy.tail:
# Sort by time so that each user's latest rows come last
df = df.sort_values('Timestamp')
# Keep the last 5 rows of each USER group
df = df.groupby('USER').tail(5)
print(df)
USER Timestamp
3 A 2019-10-01 08:32:35.864002
4 A 2019-10-01 08:32:36.398003
5 A 2019-10-01 08:32:39.517000
6 A 2019-10-01 08:32:39.567005
7 A 2019-10-01 08:32:41.039000
133 B 2019-10-01 22:12:25.922025
134 B 2019-10-01 22:12:26.935026
135 B 2019-10-01 22:12:28.487027
136 B 2019-10-01 22:12:29.593028
137 B 2019-10-01 22:12:31.926029
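Since the question mentions that the Timestamp column has dtype object, here is a minimal self-contained sketch of the same approach that converts the column with pd.to_datetime first. The sample data and the helper name last_n_per_user are made up for illustration and are not from the question; the rows are deliberately put out of order to show why the sort matters.

```python
import pandas as pd

# Hypothetical sample data: 7 rows for user A, 6 rows for user B.
# Timestamps are plain strings, so the column starts out as object dtype.
df = pd.DataFrame({
    "USER": ["A"] * 7 + ["B"] * 6,
    "Timestamp": [f"2019-10-01 08:32:{s:02d}.000000" for s in range(10, 17)]
                 + [f"2019-10-01 22:12:{s:02d}.000000" for s in range(20, 26)],
})

# Reverse the rows so the input is not already in chronological order.
df = df.iloc[::-1].reset_index(drop=True)


def last_n_per_user(frame, n=5):
    """Return the n latest rows per USER (hypothetical helper)."""
    out = frame.copy()
    # Convert strings to real datetimes so sorting is chronological.
    out["Timestamp"] = pd.to_datetime(out["Timestamp"])
    # Sort ascending, then take the last n rows of each group.
    out = out.sort_values("Timestamp")
    return out.groupby("USER").tail(n)


result = last_n_per_user(df)
print(result)
```

GroupBy.tail keeps the original index labels and the row order of the sorted frame, so the result stays in ascending time order. If a user has fewer than n rows, all of that user's rows are returned.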