Getting error "ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S'"
Here is a sample of the df:
pId tPS tLL dZ
129 2019-12-02 15:04:09 2019-12-02 15:06:31 5f723
129 2019-12-02 15:04:15 2019-12-02 15:06:37 5f723
129 2019-12-02 15:05:15 2019-12-02 15:07:37 5f723
129 2019-12-02 15:05:18 2019-12-02 15:07:40 5f723
129 2019-12-02 15:05:24 2019-12-02 15:07:46 5f723
pId is a person's ID. I'm trying to get the enter time, exit time and duration for each ID.
Here is the code:
from datetime import datetime
stats=df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]
pid = ''
enter_t = ''
exit_t = ''
enter_exit_times=[]
for ind, row in stats.iterrows():
    if pid =='':
        enter_t = row['tPS']
        print(enter_t)
    if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S")
            - datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):
        duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
                    datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))
        enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])
        pid = row['pId']
        enter_t = row['tPS']
enter_exit_times.append([pid, enter_t, exit_t])
enter_exit_times_df = pd.DataFrame(enter_exit_times)
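So here pid is the id, enter_t is the enter time, exit_t is the exit time, tPS is the in time, and tLL is the out time.
I create a list and fill it in a loop. First, I run a for loop over the rows of the dataframe. Inside it there are two if statements: the first checks pid, where an empty value means it needs to take row['tPS']; otherwise the row goes through the second if (the != condition). Then I calculate the duration and append the values to enter_exit_times.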
I'm getting this error:
2019-12-02 15:04:09
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-411-fd8f6f998cc8> in <module>
12 if row['pId']!= pid or ((datetime.strftime(row['tLL'], "%Y-%m-%d %H:%M:%S")
13 - datetime.strftime(exit_t, "%Y-%m-%d %H:%M:%S")).total_seconds()>2*60*60):
---> 14 duration = (datetime.strptime(exit_t, "%Y-%m-%d %H:%M:%S") -
15 datetime.strptime(enter_t, "%Y-%m-%d %H:%M:%S"))
16 enter_exit_times.append([pid, enter_t, exit_t, duration.total_seconds()])
~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime_datetime(cls, data_string, format)
575 """Return a class cls instance based on the input string and the
576 format string."""
--> 577 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
578 tzname, gmtoff = tt[-2:]
579 args = tt[:6] + (fraction,)
~/opt/anaconda3/lib/python3.7/_strptime.py in _strptime(data_string, format)
357 if not found:
358 raise ValueError("time data %r does not match format %r" %
--> 359 (data_string, format))
360 if len(data_string) != found.end():
361 raise ValueError("unconverted data remains: %s" %
**ValueError: time data '' does not match format '%Y-%m-%d %H:%M:%S'**
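The error occurs because exit_t is never set anywhere in the loop. It is an empty string: you set exit_t = '' before the loop, but never assign it again after that. That is why strptime throws the error here: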
>>> datetime.strptime(' ', "%Y-%m-%d %H:%M:%S")
Traceback (most recent call last):
...
File "/usr/local/Cellar/python/3.7.6/Frameworks/Python.framework/Versions/3.7/lib/python3.7/_strptime.py", line 359, in _strptime
(data_string, format))
ValueError: time data ' ' does not match format '%Y-%m-%d %H:%M:%S'
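The solution is to set it properly from "tLL" (if I understood correctly).
But I'd go a step further and say that I think you've made the code more complicated than it needs to be. My understanding is that you just want to calculate the duration between "tPS" (in time) and "tLL" (out time). Since you're already iterating over each row, you only need to assign the values accordingly: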
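If you do want to keep the original idea of merging consecutive rows into one visit, the fix is to assign exit_t on every row. Below is a minimal sketch of that, assuming the rows are (pId, tPS, tLL) string tuples already sorted by pId then tPS, and keeping the question's rule that a new visit starts when pId changes or the row's tLL is more than 2 hours after the current exit. The names visits, FMT and MAX_GAP_SECONDS are hypothetical, chosen for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
MAX_GAP_SECONDS = 2 * 60 * 60  # the 2-hour threshold from the question


def visits(rows):
    """Group (pId, tPS, tLL) string rows, sorted by pId then tPS, into visits.

    Returns a list of [pId, enter datetime, exit datetime, duration in seconds].
    """
    result = []
    pid = enter_t = exit_t = None
    for row_pid, tps, tll in rows:
        tps_dt = datetime.strptime(tps, FMT)
        tll_dt = datetime.strptime(tll, FMT)
        # Start a new visit on the first row, when the person changes, or when
        # this row's tLL is more than 2 hours after the current visit's exit.
        if (pid is None or row_pid != pid
                or (tll_dt - exit_t).total_seconds() > MAX_GAP_SECONDS):
            if pid is not None:  # close the previous visit first
                result.append([pid, enter_t, exit_t,
                               (exit_t - enter_t).total_seconds()])
            pid, enter_t = row_pid, tps_dt
        exit_t = tll_dt  # the assignment missing from the original loop
    if pid is not None:  # close the last visit
        result.append([pid, enter_t, exit_t, (exit_t - enter_t).total_seconds()])
    return result


rows = [
    (129, "2019-12-02 15:04:09", "2019-12-02 15:06:31"),
    (129, "2019-12-02 15:04:15", "2019-12-02 15:06:37"),
    (129, "2019-12-02 15:05:15", "2019-12-02 15:07:37"),
    (129, "2019-12-02 15:05:18", "2019-12-02 15:07:40"),
    (129, "2019-12-02 15:05:24", "2019-12-02 15:07:46"),
]
print(visits(rows))
```

With the sample rows this yields a single visit for 129, from 15:04:09 to 15:07:46. Using None instead of '' as the "not set yet" marker also means a forgotten assignment fails loudly with a TypeError on subtraction rather than a confusing strptime message.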
pid = row['pId']
enter_t_str = row['tPS'] # strings
exit_t_str = row['tLL'] # strings
Then use strptime to convert the datetime strings into datetime objects:
enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")
Then calculate the duration:
duration = exit_t_dt - enter_t_dt
And finally append it to your list:
enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])
There is no need to keep track of "pId" across rows.
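Here is a stdlib-only sketch of the same per-row steps, with the sample rows written as plain tuples so it runs without the df. The answer assumes tPS and tLL are strings; if the columns were already parsed (e.g. with pd.to_datetime), strptime would raise a TypeError instead. The hypothetical helper as_datetime below accepts either form, relying on pandas Timestamp being a subclass of datetime.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def as_datetime(value):
    """Parse a timestamp string; pass datetime objects through unchanged.

    pandas Timestamps subclass datetime, so they pass through as well.
    """
    if isinstance(value, datetime):
        return value
    return datetime.strptime(value, FMT)


def duration_seconds(enter_value, exit_value):
    """Seconds between an enter and an exit value (string or datetime)."""
    return (as_datetime(exit_value) - as_datetime(enter_value)).total_seconds()


rows = [
    (129, "2019-12-02 15:04:09", "2019-12-02 15:06:31"),
    (129, "2019-12-02 15:04:15", "2019-12-02 15:06:37"),
    (129, "2019-12-02 15:05:15", "2019-12-02 15:07:37"),
    (129, "2019-12-02 15:05:18", "2019-12-02 15:07:40"),
    (129, "2019-12-02 15:05:24", "2019-12-02 15:07:46"),
]

# One [pId, tPS, tLL, seconds] entry per row, like the loop in the answer.
enter_exit_times = [[pid, tps, tll, duration_seconds(tps, tll)]
                    for pid, tps, tll in rows]
for entry in enter_exit_times:
    print(entry)
```

Every sample row is 2 minutes 22 seconds long, which matches the 142.0 in the output further down.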
Here is the full code:
from datetime import datetime
import pandas as pd

stats = df.sort_values(by=['pId', 'tPS', 'tLL'])[['pId', 'tPS', 'tLL', 'dZ']]
enter_exit_times = []
for ind, row in stats.iterrows():
    pid = row['pId']
    enter_t_str = row['tPS']  # assumes tPS/tLL hold strings
    exit_t_str = row['tLL']
    # parse the strings into datetime objects
    enter_t_dt = datetime.strptime(enter_t_str, "%Y-%m-%d %H:%M:%S")
    exit_t_dt = datetime.strptime(exit_t_str, "%Y-%m-%d %H:%M:%S")
    duration = exit_t_dt - enter_t_dt
    enter_exit_times.append([pid, enter_t_str, exit_t_str, duration.total_seconds()])
enter_exit_times_df = pd.DataFrame(enter_exit_times)
print(enter_exit_times_df)
And the output DataFrame:
0 1 2 3
0 129 2019-12-02 15:04:09 2019-12-02 15:06:31 142.0
1 129 2019-12-02 15:04:15 2019-12-02 15:06:37 142.0
2 129 2019-12-02 15:05:15 2019-12-02 15:07:37 142.0
3 129 2019-12-02 15:05:18 2019-12-02 15:07:40 142.0
4 129 2019-12-02 15:05:24 2019-12-02 15:07:46 142.0
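If you only want the enter/exit times within a specific period of the day, you can create datetime objects for the start and end times and compare them with the ordinary comparison operators: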
>>> dt_beg = datetime(2019,12,2,8,0,0) #8AM
>>> dt_beg
datetime.datetime(2019, 12, 2, 8, 0)
>>> dt_end = datetime(2019,12,2,10,0,0) #10AM
>>> dt_end
datetime.datetime(2019, 12, 2, 10, 0)
>>> dt = datetime(2019,12,2,9,34,0) #9:34AM
>>> dt_beg < dt < dt_end
True
>>> dt = datetime(2019,12,2,14,34,0) #2:34PM
>>> dt_beg < dt < dt_end
False
So you can add a filter on what gets appended to enter_exit_times:
    if enter_t_dt > dt_beg and exit_t_dt < dt_end:
        enter_exit_times.append(...)
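The datetime objects above fix the window to one date. A possible variation, sketched here as an alternative rather than what the answer does, compares only the clock time via datetime.time(), so the same window applies to every day. The window bounds 15:05 to 15:08 are hypothetical values picked to fit the sample rows.

```python
from datetime import datetime, time

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical window for illustration: 15:05:00 to 15:08:00 on any date.
WINDOW_START = time(15, 5, 0)
WINDOW_END = time(15, 8, 0)

rows = [
    (129, "2019-12-02 15:04:09", "2019-12-02 15:06:31"),
    (129, "2019-12-02 15:04:15", "2019-12-02 15:06:37"),
    (129, "2019-12-02 15:05:15", "2019-12-02 15:07:37"),
    (129, "2019-12-02 15:05:18", "2019-12-02 15:07:40"),
    (129, "2019-12-02 15:05:24", "2019-12-02 15:07:46"),
]

enter_exit_times = []
for pid, tps, tll in rows:
    enter_t_dt = datetime.strptime(tps, FMT)
    exit_t_dt = datetime.strptime(tll, FMT)
    # Compare clock times only, ignoring the date part.
    if enter_t_dt.time() > WINDOW_START and exit_t_dt.time() < WINDOW_END:
        duration = exit_t_dt - enter_t_dt
        enter_exit_times.append([pid, tps, tll, duration.total_seconds()])

for entry in enter_exit_times:
    print(entry)
```

Here the first two sample rows are dropped because they start before 15:05. Note that comparing bare times does not handle a window that crosses midnight; for that the full datetime comparison from the answer is simpler.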