Splitting/Extracting part of a column in a DataFrame - python

Question

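I'm trying to split/extract part of the "Time" column so that it shows only the hours and minutes, e.g. 18:15 rather than 18:15:34.

I've seen many examples online that use the .str.split() function with the colon as the delimiter. But that splits the Time column into three parts: hours, minutes, and seconds.

Input DataFrame: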

df =

Index   Time
0       18:15:21
1       19:15:21
2       20:15:21
3       21:15:21
4       22:15:21

Output DataFrame:

df =

Index   Time
0       18:15
1       19:15
2       20:15
3       21:15
4       22:15

Thanks :)

Answer 1

You can use:

df['Time'].apply(lambda x: ':'.join(x.split(':')[0:2]))
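As a minimal, self-contained sketch of the approach above: the frame below is rebuilt from the question's sample data purely for illustration, and `short` is a hypothetical variable name. Note that `apply` returns a new Series, so the result has to be assigned back if the column itself should change.

```python
import pandas as pd

# Sample data copied from the question (rebuilt here for illustration)
df = pd.DataFrame(
    {"Time": ["18:15:21", "19:15:21", "20:15:21", "21:15:21", "22:15:21"]}
)

# Split each string on ':', keep the first two pieces (hours, minutes),
# and join them back together with ':'
short = df["Time"].apply(lambda x: ":".join(x.split(":")[0:2]))

# apply() does not modify df in place, so assign the result back
df["Time"] = short
print(df)
```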

Answer 2

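You can use a regular expression (on recent pandas versions, pass regex=True explicitly, since the pattern is otherwise treated as a literal string):

df.Time.str.replace(r':\d\d$', '', regex=True)

Or a reverse split (n is passed by keyword, as newer pandas versions require):

df.Time.str.rsplit(':', n=1).str[0]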
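A small sketch comparing the two options, assuming a shortened copy of the question's data; `via_regex` and `via_rsplit` are hypothetical names used only here. It passes `regex=True` and `n=1` as keywords, which should work on both older and newer pandas releases.

```python
import pandas as pd

# Shortened copy of the question's sample data (for illustration only)
df = pd.DataFrame({"Time": ["18:15:21", "19:15:21", "20:15:21"]})

# Option 1: strip a trailing ":SS" with a regular expression
via_regex = df["Time"].str.replace(r":\d\d$", "", regex=True)

# Option 2: split once from the right and keep the left-hand part
via_rsplit = df["Time"].str.rsplit(":", n=1).str[0]

# Both options should produce the same hours:minutes strings
print(via_regex.equals(via_rsplit))
```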

Answer 3

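You have a fair choice here between replace, extract, or split with pandas.Series.str.

First, note that these are solutions tailored to this particular case.

The solution below removes the final : and the last two digits from the Time column.

>>> df['Time'] = df['Time'].str.replace(r':\d{2}$', '', regex=True)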
>>> df
    Time
0  18:15
1  19:15
2  20:15
3  21:15
4  22:15

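The second approach uses str.extract with a regular expression (expand=False returns a Series rather than a one-column DataFrame):

>>> df['Time'] = df['Time'].str.extract(r'(\d{2}:\d{2})', expand=False)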
>>> df
    Time
0  18:15
1  19:15
2  20:15
3  21:15
4  22:15

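\d{2} matches the first two digits (the hours)

: matches the colon immediately after them

\d{2} then matches the next two digits (the minutes) that follow the colon

$ (used in the str.replace pattern) asserts the position at the end of the string, so only the trailing :SS part is removed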
