Parsing in Dataframe for each Row
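I want to perform a calculation in which Start_Date is subtracted from each letter's end date and the result is divided by 365 to get a duration in years. Each letter's duration then needs to be used as the exponent ('power of') for the value in that letter's column. Finally, the results for all letters need to be summed to get a total.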
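Written out as a formula, the calculation described above looks roughly like this. The symbols are shorthand introduced here, not names from the code: $v_{i,\ell}$ is the value in letter column $\ell$ for row $i$, and the date difference is counted in whole days.

```latex
\text{Overall}_i \;=\; \sum_{\ell \in \{A,B,C,D,E\}} v_{i,\ell}^{\,\left(\text{End\_Date}_{i,\ell} - \text{Start\_Date}_i\right)/365}
```

The sum is taken separately for each row $i$, so each row should get its own total.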
I tried doing this with the code below, and I managed to get an answer.
import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019'],
['01-01-2015', 664, '25-05-2017', 663, '03-06-2016', 665, '30-10-2019', 663, '16-12-2020', 665, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
'D', 'End_Date_D', 'E', 'End_Date_E'])
Start_Date A End_Date_A B End_Date_B C End_Date_C D End_Date_D E End_Date_E
0 01-01-2015 234 25-05-2017 633 03-06-2016 935 30-10-2019 673 16-12-2020 825 06-07-2019
1 01-01-2015 664 25-05-2017 663 03-06-2016 665 30-10-2019 663 16-12-2020 665 06-07-2019
from dateutil import parser
import math
letters = ["A", "B", "C", "D", "E"]
total = 0
for i in ds.index:
    for letter in letters:
        start_date = parser.parse(ds["Start_Date"][i])
        end_date = parser.parse(ds["End_Date_" + letter][i])
        years = (end_date - start_date).days / 365
        power = math.pow(int(ds[letter][i]), years)
        total+= power
    ds['Overall'] = total
However, it shows the same result for every row.
Start_Date A End_Date_A B End_Date_B C End_Date_C D End_Date_D E End_Date_E Overall
0 01-01-2015 234 25-05-2017 633 03-06-2016 935 30-10-2019 673 16-12-2020 825 06-07-2019 1.388585e+17
1 01-01-2015 664 25-05-2017 663 03-06-2016 665 30-10-2019 663 16-12-2020 665 06-07-2019 1.388585e+17
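Are there any other tips for doing this so that the total is computed from each row's own values?
There is no need for a for-loop here; we can use a vectorized pandas approach: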
letters = pd.Index(['A', 'B', 'C', 'D', 'E'])
start = pd.to_datetime(ds['Start_Date'], dayfirst=True)
dates = ds['End_Date_' + letters].apply(pd.to_datetime, dayfirst=True)
# Whole days per column; .dt.days replaces astype('timedelta64[D]'), which recent pandas versions reject
years = dates.sub(start, axis=0).apply(lambda col: col.dt.days).div(365)
ds['Overall'] = ds[letters].pow(years.values).sum(axis=1)
Result
print(ds)
Start_Date A End_Date_A B End_Date_B C End_Date_C D End_Date_D E End_Date_E Overall
0 01-01-2015 234 25-05-2017 633 03-06-2016 935 30-10-2019 673 16-12-2020 825 06-07-2019 7.261803e+16
1 01-01-2015 664 25-05-2017 663 03-06-2016 665 30-10-2019 663 16-12-2020 665 06-07-2019 6.624869e+16
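As a sanity check, the same vectorized steps can be run on a tiny made-up frame where the totals are easy to work out by hand. This is only a sketch: the frame `toy`, its two letters, and its values are hypothetical and chosen so that the durations come out to exactly 0, 1, or 2 years.

```python
import pandas as pd

# Hypothetical data: durations are exactly 365, 730, 365 and 0 days.
toy = pd.DataFrame({
    "Start_Date": ["01-01-2015", "01-01-2015"],
    "A": [2, 4],
    "End_Date_A": ["01-01-2016", "01-01-2016"],
    "B": [3, 5],
    "End_Date_B": ["31-12-2016", "01-01-2015"],
})

letters = ["A", "B"]
end_cols = ["End_Date_" + letter for letter in letters]

# Parse the day-first date strings with an explicit format.
start = pd.to_datetime(toy["Start_Date"], format="%d-%m-%Y")
ends = toy[end_cols].apply(pd.to_datetime, format="%d-%m-%Y")

# Duration in years for every (row, letter) pair.
years = ends.sub(start, axis=0).apply(lambda col: col.dt.days).div(365)

# Raise each value to its own duration, then sum across the letters of each row.
toy["Overall"] = toy[letters].pow(years.to_numpy()).sum(axis=1)

# Row 0: 2**1 + 3**2 = 11; row 1: 4**1 + 5**0 = 5
print(toy[["A", "B", "Overall"]])
```

Passing `years.to_numpy()` rather than the `years` frame itself matters here: the two frames have different column labels (`A` versus `End_Date_A`), so pandas would otherwise try to align them by name.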
Alternatively, if you would rather keep your existing code, here is a simple fix:
for i in ds.index:
    total = 0 # Moved inside outer for-loop
    for letter in letters:
        # dayfirst=True so that e.g. '03-06-2016' is read as 3 June, matching the vectorized version
        start_date = parser.parse(ds["Start_Date"][i], dayfirst=True)
        end_date = parser.parse(ds["End_Date_" + letter][i], dayfirst=True)
        years = (end_date - start_date).days / 365
        power = math.pow(int(ds[letter][i]), years)
        total+= power
    ds.loc[i, 'Overall'] = total # Notice the change here
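The original loop gave every row the same number for two reasons: `total` was never reset between rows, and assigning a scalar to a whole column broadcasts that scalar to every row. A minimal sketch of the second point, using a hypothetical frame `df` with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Assigning a scalar to a column writes that same value into every row.
df["same"] = 10

# Writing through .loc with a row label only touches that one row.
for i in df.index:
    df.loc[i, "per_row"] = df.loc[i, "x"] * 2

print(df)
```

Here `same` holds 10 in every row, while `per_row` holds a different value per row, which is the behaviour the fixed loop relies on.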