Datetime columns in a pandas DataFrame will not subtract from each other
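I am trying to find the time difference between two columns in a pandas DataFrame, both of which are in datetime format.
Below is some of the data in my DataFrame and the code I have been using. I have triple-checked that both columns have dtype datetime64.
My data: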
date_updated date_scored
2016-03-30 08:00:00.000 2016-03-30 08:00:57.416
2016-04-07 23:50:00.000 2016-04-07 23:50:12.036
My code:
data['date_updated'] = pd.to_datetime(data['date_updated'],
                                      format='%Y-%m-%d %H:%M:%S')
data['date_scored'] = pd.to_datetime(data['date_scored'],
                                     format='%Y-%m-%d %H:%M:%S')
data['Diff'] = data['date_updated'] - data['date_scored']
The error message I get:
TypeError: data type "datetime" not understood
Any help would be appreciated, thanks!
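For anyone reproducing this, one way to confirm the column dtypes is sketched below. It is only an illustration built from the two sample rows above; the helper dictionary `is_dt` is a made-up name, not part of the original code.

```python
import pandas as pd

# Sample rows copied from the question, held as plain strings at first
data = pd.DataFrame({
    "date_updated": ["2016-03-30 08:00:00.000", "2016-04-07 23:50:00.000"],
    "date_scored": ["2016-03-30 08:00:57.416", "2016-04-07 23:50:12.036"],
})

# Before conversion the columns hold strings, not datetimes
print(data.dtypes)

for col in ["date_updated", "date_scored"]:
    data[col] = pd.to_datetime(data[col])

# After conversion both columns should report a datetime64 dtype
print(data.dtypes)

# is_dt is a hypothetical helper name; the check itself is pandas' own
is_dt = {col: pd.api.types.is_datetime64_any_dtype(data[col]) for col in data.columns}
print(is_dt)
```

If either column still shows up as `object`, the subtraction would be operating on strings rather than timestamps.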
My solution:
data = []  # rows of [bank, extracted, scored, diff], all as strings
for i in raw_data[:10]:
    scored = pd.to_datetime(i.date_scored, format='%Y-%m-%d %H:%M:%S')
    # pd.isna covers both None and NaT; comparing type() to the string
    # "NoneType" is never true, so the original check did nothing
    if pd.isna(scored) or scored.year < 2016:
        continue
    extracted = pd.to_datetime(i.date_extracted, format='%Y-%m-%d %H:%M:%S')
    bank = i.bank.name
    # subtract the parsed Timestamps, not the raw field values
    diff = scored - extracted
    datum = [str(bank), str(extracted), str(scored), str(diff)]
    data.append(datum)
Your code works like a charm for me. You can even simplify it, because to_datetime
is smart enough to guess the format for you.
import io
import pandas as pd
# Paste the sample data as a triple-quoted string so it can span multiple lines
zz = """date_updated,date_scored
2016-03-30 08:00:00.000, 2016-03-30 08:00:57.416
2016-04-07 23:50:00.000, 2016-04-07 23:50:12.036"""
data = pd.read_csv(io.StringIO(zz), skipinitialspace=True)
data['date_updated'] = pd.to_datetime(data['date_updated'])
data['date_scored'] = pd.to_datetime(data['date_scored'])
data['Diff'] = data['date_updated'] - data['date_scored']
print(data)
# date_updated date_scored Diff
# 0 2016-03-30 08:00:00 2016-03-30 08:00:57.416 -1 days +23:59:02.584000
# 1 2016-04-07 23:50:00 2016-04-07 23:50:12.036 -1 days +23:59:47.964000
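If a plain number is easier to work with than a Timedelta, the difference can also be converted to seconds. This is a minimal sketch that assumes the same two sample rows; the column name `diff_seconds` is invented for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    "date_updated": pd.to_datetime(["2016-03-30 08:00:00.000", "2016-04-07 23:50:00.000"]),
    "date_scored": pd.to_datetime(["2016-03-30 08:00:57.416", "2016-04-07 23:50:12.036"]),
})

# Subtracting two datetime64 columns gives a timedelta64 column
data["Diff"] = data["date_updated"] - data["date_scored"]

# .dt.total_seconds() turns each Timedelta into a float number of seconds;
# "diff_seconds" is a hypothetical column name
data["diff_seconds"] = data["Diff"].dt.total_seconds()
print(data["diff_seconds"].tolist())
```

The values come out negative here because date_updated is earlier than date_scored in both rows; swap the operands if you want positive durations.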
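You need to update pandas.
I just ran into the same problem with old code that used to run without issue.
After updating pandas from 0.18.1 (np111py35_0) to a newer version, 0.20.2 (np113py35_0), the problem was solved.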
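To see which version is installed before upgrading, something like the following should do. The 0.20 threshold is taken only from the versions reported in this answer, not from any documented fix, so treat it as an assumption.

```python
import pandas as pd

# Installed pandas version as a string
print(pd.__version__)

# Compare only the first two numeric parts of the version string
major, minor = (int(part) for part in pd.__version__.split(".")[:2])

# Assumption: 0.20 is the first version reported to work in this thread
if (major, minor) < (0, 20):
    print("pandas is older than 0.20; consider upgrading")
```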
I got the same error with the syntax above (although it worked on another machine):
data['Diff'] = data['date_updated'] - data['date_scored']
This works on my new machine:
data['Diff'] = data['date_updated'].subtract(data['date_scored'])
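On a current pandas install the operator and the method form should give the same result, which the sketch below checks on the sample rows. Why the operator failed on the older install is not clear from this thread, so this only shows the two spellings side by side; the variable names are made up for illustration.

```python
import pandas as pd

updated = pd.Series(pd.to_datetime(["2016-03-30 08:00:00.000", "2016-04-07 23:50:00.000"]))
scored = pd.Series(pd.to_datetime(["2016-03-30 08:00:57.416", "2016-04-07 23:50:12.036"]))

# Operator form
by_operator = updated - scored

# Method form; Series.subtract is an alias of Series.sub
by_method = updated.subtract(scored)

# Prints whether the two results match element for element
print(by_operator.equals(by_method))
```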