Pandas split date and time in different columns

I have a date column like this:

0     Feb-23-21 10:35AM
1               10:18AM
2               10:13AM
3               10:10AM
4               09:15AM
5               09:02AM
6               08:13AM
7               08:07AM
8               05:34AM
9               12:52AM
10    Feb-22-21 07:00PM
11              07:00PM
12              06:22PM
13              05:56PM
14              05:18PM
15              05:07PM
16              05:00PM
17              04:31PM
18              04:11PM
19              04:05PM

As the expected output, I want to split the date and time into different columns, like this:

    0  Feb-23-21 10:35AM
    1  Feb-23-21 10:18AM
    2  Feb-23-21 10:13AM
    3  Feb-23-21 10:10AM
    4  Feb-23-21 09:15AM
    5  Feb-23-21 09:02AM
    6  Feb-23-21 08:13AM
    7  Feb-23-21 08:07AM
    8  Feb-23-21 05:34AM
    9  Feb-23-21 12:52AM
    10 Feb-22-21 07:00PM
    11 Feb-22-21 07:00PM
    12 Feb-22-21 06:22PM
    13 Feb-22-21 05:56PM
    14 Feb-22-21 05:18PM
    15 Feb-22-21 05:07PM
    16 Feb-22-21 05:00PM
    17 Feb-22-21 04:31PM
    18 Feb-22-21 04:11PM
    19 Feb-22-21 04:05PM

If possible, I would like to show the date and time in separate columns. I am actually scraping news from here, and the code I have written is this:

news = pd.read_html(str(response.body), attrs={'class': 'fullview-news-outer'})[0]
# getall() already returns a list of the matched href values
links = response.css('a[class="tab-link-news"]::attr(href)').getall()

news.columns = ['Date', 'News Headline']
news['Article Link'] = links

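With the given date/time format, you can:

  • split the date and time on the space
    • put the second-to-last element into the "date" column and forward-fill the blanks
    • put the last element into the "time" column

Example: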

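import pandas as pd

df = pd.DataFrame({'input': ["Feb-23-21 10:35AM", "10:18AM", "10:13AM", "Feb-22-21 07:00PM", "07:00PM", "06:22PM"]})

# Rows without a date split into a single element, so .str[-2] is NaN there; ffill() copies the last seen date down
df['date'] = df['input'].str.split(' ').str[-2].ffill()
# The last element is always the time
df['time'] = df['input'].str.split(' ').str[-1]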

# df
#                input       date     time
# 0  Feb-23-21 10:35AM  Feb-23-21  10:35AM
# 1            10:18AM  Feb-23-21  10:18AM
# 2            10:13AM  Feb-23-21  10:13AM
# 3  Feb-22-21 07:00PM  Feb-22-21  07:00PM
# 4            07:00PM  Feb-22-21  07:00PM
# 5            06:22PM  Feb-22-21  06:22PM
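
Applied to the `news` frame from the question, the same steps might look roughly like the sketch below. The rows here are invented stand-ins for the scraped table, and the `Time` column name is an assumption; only `Date` and `News Headline` come from the question.

```python
import pandas as pd

# Stand-in for the scraped `news` table (made-up rows, same column names as the question)
news = pd.DataFrame({
    'Date': ["Feb-23-21 10:35AM", "10:18AM", "Feb-22-21 07:00PM", "06:22PM"],
    'News Headline': ["headline A", "headline B", "headline C", "headline D"],
})

# Split once from the right: rows with a date give [date, time], rows without give [time]
parts = news['Date'].str.strip().str.rsplit(' ', n=1)

news['Time'] = parts.str[-1]          # the last element is always the time
news['Date'] = parts.str[-2].ffill()  # NaN where no date was present, then filled from the row above
```

Computing `parts` before overwriting `Date` keeps the original strings available for both new columns.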

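Now you can also convert the strings to datetime, e.g.

# An explicit format avoids relying on format inference: %b = Feb, %y = 21, %I and %p = 12-hour clock with AM/PM
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%b-%d-%y %I:%M%p')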

# df['datetime']
# 0   2021-02-23 10:35:00
# 1   2021-02-23 10:18:00
# 2   2021-02-23 10:13:00
# 3   2021-02-22 19:00:00
# 4   2021-02-22 19:00:00
# 5   2021-02-22 18:22:00
# Name: datetime, dtype: datetime64[ns]

which gives you more options for processing the data further.
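
As one illustration of that further processing, a datetime column can be split back into separate date and time columns, or sorted chronologically, through the `.dt` accessor. This is a minimal sketch; the column names `day`, `clock` and `weekday` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ["Feb-23-21", "Feb-23-21", "Feb-22-21"],
    'time': ["10:35AM", "10:18AM", "07:00PM"],
})
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'], format='%b-%d-%y %I:%M%p')

# Separate date and time columns holding datetime.date / datetime.time objects
df['day'] = df['datetime'].dt.date
df['clock'] = df['datetime'].dt.time

# Derived fields come for free once the column is a real datetime
df['weekday'] = df['datetime'].dt.day_name()

# Chronological sorting works correctly, unlike sorting the raw strings
oldest_first = df.sort_values('datetime')
```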