Calculating whether a day has passed based on the time difference, and inserting the updated date if it has

Edit: I was unable to solve this and need to think of a better approach.

I am scraping this web page (http://www.oddsportal.com/american-football/usa/nfl-2017-2018/results/#/page/6/) and trying to insert the game date (shown in grey on the page) into each corresponding game-time row.

I am struggling with how to implement this logic.

The list of dates scraped from this page is as follows...

file_days=[['17 Sep 2017'],['15 Sep 2017'],['12 Sep 2017'], ['11 Sep 2017'],['10 Sep 2017'], ['08 Sep 2017'],['01 Sep 2017'],['31 Aug 2017'],
           ['28 Aug 2017'],['27 Aug 2017'],['26 Aug 2017'],['25 Aug 2017'],['24 Aug 2017']]

file_days=file_days[::-1]

I am trying to insert these dates into the following dataframe, which contains the start time of each scraped game.

import pandas as pd
data = {'game_time': ['23:00','23:30','23:00','00:00','23:00','23:00','23:00','23:30','23:30','00:00','00:00','00:00','01:00','17:00','20:30','00:00','23:00','23:00','23:00','23:00',                 '23:00','23:30','23:30','23:30','00:00','00:00','00:00','00:00','00:30','01:00','02:00','02:00','00:30','17:00','17:00','17:00','17:00','17:00','17:00','17:00','17:00','20:05','20:25','20:25','00:30','23:10','02:20','00:25','17:00','17:00']}
df = pd.DataFrame.from_dict(data)

So far I have the following code, but I can't seem to figure out the logic for inserting a new date once a day has passed.

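# Parse the 'HH:MM' strings; only the time-of-day part is used below
df.game_time = pd.to_datetime(df.game_time, format='%H:%M')
df['game'] = df.game_time.dt.strftime('%H:%M')
# Time of the previous row's game; the first row has none, so default to '00:00'
df['previous_game'] = df['game'].shift(1)
df['previous_game'] = df['previous_game'].fillna('00:00')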

matchup_day = []

for a, b in zip(df['game'], df['previous_game']):
    if a >= b:
        # current game is not earlier than the previous game: keep using the current date
        matchup_day.append(file_days[0])

    else:
        # current game is earlier than the previous game: use the next date
        # and drop the date that was in use until now
        matchup_day.append(file_days[1])
        file_days.pop(0)

The output is as follows...

 matchup_day
 [['24 Aug 2017'],
 ['24 Aug 2017'],
 ['25 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['26 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['27 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['28 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['31 Aug 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['01 Sep 2017'],
 ['08 Sep 2017'],
 ['08 Sep 2017'],
 ['10 Sep 2017'],
 ['11 Sep 2017'],
 ['11 Sep 2017'],
 ['11 Sep 2017']]

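This output is clearly incorrect: it goes wrong at row 15 of the dataframe, which corresponds to 28 Aug on the website. Does anyone have any ideas on how to improve this logic?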

I am also open to completely different ideas on how to achieve this. Thank you in advance, as I have been stuck on this for a while.

You don't need a manual loop here. You can compare the series with a shifted version of itself, then use pd.Series.cumsum and map the result through a dictionary.

Here is a demo:

from itertools import chain

file_days = [['17 Sep 2017'], ['15 Sep 2017'], ['12 Sep 2017'], ['11 Sep 2017'],
             ['10 Sep 2017'], ['08 Sep 2017'], ['01 Sep 2017'], ['31 Aug 2017'],
             ['28 Aug 2017'], ['27 Aug 2017'], ['26 Aug 2017'], ['25 Aug 2017'],
             ['24 Aug 2017']]

d = dict(enumerate(chain.from_iterable(file_days[::-1])))

df['date'] = (df['game'] < df['game'].shift()).cumsum().map(d)

Result:

print(df[['game', 'date']])

     game         date
0   23:00  24 Aug 2017
1   23:30  24 Aug 2017
2   23:00  25 Aug 2017
3   00:00  26 Aug 2017
4   23:00  26 Aug 2017
5   23:00  26 Aug 2017
6   23:00  26 Aug 2017
7   23:30  26 Aug 2017
8   23:30  26 Aug 2017
9   00:00  27 Aug 2017
10  00:00  27 Aug 2017
11  00:00  27 Aug 2017
12  01:00  27 Aug 2017
13  17:00  27 Aug 2017
14  20:30  27 Aug 2017
15  00:00  28 Aug 2017
16  23:00  28 Aug 2017
17  23:00  28 Aug 2017
18  23:00  28 Aug 2017
19  23:00  28 Aug 2017
20  23:00  28 Aug 2017
21  23:30  28 Aug 2017
22  23:30  28 Aug 2017
23  23:30  28 Aug 2017
24  00:00  31 Aug 2017
25  00:00  31 Aug 2017
26  00:00  31 Aug 2017
27  00:00  31 Aug 2017
28  00:30  31 Aug 2017
29  01:00  31 Aug 2017
30  02:00  31 Aug 2017
31  02:00  31 Aug 2017
32  00:30  01 Sep 2017
33  17:00  01 Sep 2017
34  17:00  01 Sep 2017
35  17:00  01 Sep 2017
36  17:00  01 Sep 2017
37  17:00  01 Sep 2017
38  17:00  01 Sep 2017
39  17:00  01 Sep 2017
40  17:00  01 Sep 2017
41  20:05  01 Sep 2017
42  20:25  01 Sep 2017
43  20:25  01 Sep 2017
44  00:30  08 Sep 2017
45  23:10  08 Sep 2017
46  02:20  10 Sep 2017
47  00:25  11 Sep 2017
48  17:00  11 Sep 2017
49  17:00  11 Sep 2017
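
To make the shift / cumsum / map chain easier to follow, here is a minimal sketch that breaks it into its intermediate steps. The data and the names `times`, `days`, `rolled_over` and `day_index` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical toy data: zero-padded 'HH:MM' strings in chronological order.
times = pd.Series(['23:00', '23:30', '00:00', '17:00', '01:00'])
# Hypothetical dates, oldest first, one per group of games.
days = ['24 Aug 2017', '25 Aug 2017', '26 Aug 2017']

# True wherever a time is earlier than the time in the previous row.
# The first row is compared against a missing value, which yields False.
rolled_over = times < times.shift()

# The running count of rollovers is a 0-based day index for each row.
day_index = rolled_over.cumsum()

# Look up each day index in a {position: date} dictionary.
dates = day_index.map(dict(enumerate(days)))

print(pd.DataFrame({'game': times, 'day_index': day_index, 'date': dates}))
```

Comparing the strings directly works only because zero-padded 'HH:MM' values sort the same way as the times they represent. The approach also assumes that a new date starts exactly where the time decreases. If the first game of a new day is not earlier than the last game of the previous day (say 17:00 followed by 20:00 the next day), no rollover is detected and every later row is assigned a date that is too early.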