How can I add a new column to a data frame using a function that accesses data from both current and previous row?
I have a data frame that contains data for several days. The code
import pandas
[...]
daily_data_f = pandas.DataFrame(daily_data, columns = ['Day', 'Total TODO/TODOE count'])
print(daily_data_f)
produces the following output:
           Day  Total TODO/TODOE count
0   2020-05-16                      35
1   2020-05-17                      35
2   2020-05-18                      35
3   2020-05-19                      35
4   2020-05-20                      35
..         ...                     ...
64  2020-07-18                      35
65  2020-07-19                      35
66  2020-07-20                      35
68  2020-07-21                     151
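I want to calculate the difference between the values of Total TODO/TODOE count on consecutive days. The value jumped from 35 on 2020-06-28 to 151 on 2020-07-21. For 2020-07-21, I want the value 151-35=116 to be calculated.
This approach has been suggested: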
df['new_column_name'] = df.apply(lambda x: my_function(x['value_1'], x['value_2']), axis=1)
I would have to write something like this:
daily_data_f['First Derivative'] = daily_data_f.apply(lambda x:diff(daily_data_f['Total TODO/TODOE count'], <PREVIOUS_VALUE>), axis=1)
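where <PREVIOUS_VALUE> is the value of 'Total TODO/TODOE count' in the previous row (day).
Question: How can I write an expression for <PREVIOUS_VALUE> (the value of 'Total TODO/TODOE count' in the previous row)?
This should work: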
# shift(1) moves every value down one row, so each row sees the previous day's count
df['day_before'] = df['Total TODO/TODOE count'].shift(1)
# the first row has no previous day; reuse its own count to avoid a null there
df.loc[df.index[0], 'day_before'] = df['Total TODO/TODOE count'].iloc[0]
df['diff'] = df['Total TODO/TODOE count'] - df['day_before']
You will see the differences in the diff column.
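If you need to call an arbitrary two-argument function rather than a plain subtraction, one possible sketch is to store the shifted column first and then pass both values to the function row by row. The sample data below is made up for illustration, and my_function is a hypothetical stand-in for whatever function you actually need:

```python
import pandas as pd


def my_function(current, previous):
    # Hypothetical stand-in: any function of the current and previous value
    return current - previous


# Made-up sample data shaped like the frame in the question (not the real counts)
df = pd.DataFrame({
    "Day": pd.date_range("2020-07-18", periods=4, freq="D"),
    "Total TODO/TODOE count": [35, 35, 35, 151],
})

col = "Total TODO/TODOE count"

# Put the previous row's value next to the current one
df["day_before"] = df[col].shift(1)

# Each row now carries both values, so a row-wise apply can use them
df["First Derivative"] = df.apply(
    lambda row: my_function(row[col], row["day_before"]), axis=1
)

print(df)
```

The first row has no predecessor, so its day_before is NaN and the function receives NaN as its second argument. For simple arithmetic, working on whole columns as in the answer above is usually faster than a row-wise apply.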
You can use numpy.diff or pandas.DataFrame.diff as shown below; the numpy method should be slightly faster:
numpy:
import numpy as np
df['diff'] = np.diff(df['Total TODO/TODOE count'], prepend=np.nan)
pandas:
import pandas as pd
df['diff'] = df['Total TODO/TODOE count'].diff()
Output:
           Day  Total TODO/TODOE count   diff
0   2020-05-16                      35    NaN
1   2020-05-17                      35    0.0
2   2020-05-18                      35    0.0
3   2020-05-19                      35    0.0
4   2020-05-20                      35    0.0
64  2020-07-18                      35    0.0
65  2020-07-19                      35    0.0
66  2020-07-20                      35    0.0
68  2020-07-21                     151  116.0
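As a quick sanity check, the sketch below runs both variants on a small made-up series (not the real data from the question) and compares the results. It assumes NumPy 1.16 or later, where np.diff accepts the prepend argument:

```python
import numpy as np
import pandas as pd

# Made-up counts for illustration only
counts = pd.Series([35, 35, 35, 151], name="Total TODO/TODOE count")

# NumPy variant: prepend NaN so the result has the same length as the input
numpy_diff = np.diff(counts, prepend=np.nan)

# pandas variant: the first element is NaN because it has no predecessor
pandas_diff = counts.diff()

# Compare the two results, treating NaN in the same position as equal
same = np.allclose(numpy_diff, pandas_diff.to_numpy(), equal_nan=True)
print(same)
```

Note that np.diff returns a plain NumPy array without an index, while the pandas method returns a Series aligned to the original index. Assigning either one to a new column works as long as the lengths match.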