Subtract the minimum value from the maximum value across each row, Python Pandas DataFrame

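I have a census dataset indexed by state name and county name. For each row, I want to find the maximum and minimum across all the yearly population estimate columns (the 'POPESTIMATE' columns) and subtract the minimum from the maximum. The function should return a pandas Series that keeps the index alongside the resulting values.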

Here is my current code:

columns_to_keep=[
    'STNAME',
    'CTYNAME',
    'POPESTIMATE2010',
    'POPESTIMATE2011',
    'POPESTIMATE2012',
    'POPESTIMATE2013',
    'POPESTIMATE2014',
    'POPESTIMATE2015' 
]
df=census_df[columns_to_keep]

def answer_seven(lst):
    lst=[df['POPESTIMATE2010'],df['POPESTIMATE2011'],df['POPESTIMATE2012'],
             df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]

    return max(lst)-min(lst)

answer_seven(lst)

Error message:

ValueError                                Traceback (most recent call last)
<ipython-input-110-845350b0b5f7> in <module>()
     18     return max(lst)-min(lst)
     19 
---> 20 answer_seven(lst)
     21 

<ipython-input-110-845350b0b5f7> in answer_seven(lst)
     16              df['POPESTIMATE2013'],df['POPESTIMATE2014'],df['POPESTIMATE2015']]
     17 
---> 18     return max(lst)-min(lst)
     19 
     20 answer_seven(lst)

/opt/conda/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
    890         raise ValueError("The truth value of a {0} is ambiguous. "
    891                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 892                          .format(self.__class__.__name__))
    893 
    894     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
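
For context on the traceback: the built-in max and min compare the list's elements with > and <, and comparing two Series yields an element-wise boolean Series rather than a single True or False. Python then tries to evaluate that Series as a boolean, which pandas refuses to do. A minimal sketch that reproduces the same error with two small made-up Series (the values are illustrative, not from the census data):

```python
import pandas as pd

# Two small illustrative Series standing in for the POPESTIMATE columns.
a = pd.Series([1, 5])
b = pd.Series([3, 2])

message = None
try:
    # Built-in max evaluates `b > a`, gets a boolean Series, then calls bool() on it.
    max([a, b])
except ValueError as exc:
    message = str(exc)

print(message)
```

This is why the row-wise answers use the DataFrame's own max and min with an axis argument instead of the built-ins.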

Alternatively, consider numpy.ptp for speed (here cols_of_interest is the list of POPESTIMATE column names):

Range of values (maximum - minimum) along an axis.

np.ptp(df[cols_of_interest].values, axis=1)
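
Note that np.ptp returns a plain NumPy array, so the state/county index is lost unless it is reattached. A minimal sketch, assuming a small made-up frame in place of census_df (the sample values and the to_numpy() call are illustrative choices, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the census data, indexed by state and county.
df = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama"],
        "CTYNAME": ["Autauga County", "Baldwin County"],
        "POPESTIMATE2010": [100, 200],
        "POPESTIMATE2011": [110, 190],
        "POPESTIMATE2012": [105, 230],
    }
).set_index(["STNAME", "CTYNAME"])

cols_of_interest = [c for c in df.columns if c.startswith("POPESTIMATE")]

# Row-wise range (max - min) as an ndarray, wrapped back into a Series
# so that the original index is preserved.
ranges = np.ptp(df[cols_of_interest].to_numpy(), axis=1)
result = pd.Series(ranges, index=df.index)
print(result)
```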

Pandas can do this directly:

cols_of_interest = ['POPESTIMATE2010', 'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013', 'POPESTIMATE2014' , 'POPESTIMATE2015']
df[cols_of_interest].max(axis=1) - df[cols_of_interest].min(axis=1)

This returns a Series indexed by the DataFrame's original index, containing each row's maximum minus its minimum.
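
To tie this back to the question's function, the same expression can be wrapped so that it returns a Series keyed by state and county. A sketch under the assumption that census_df still has STNAME and CTYNAME as ordinary columns; the function name population_range and the sample rows are hypothetical:

```python
import pandas as pd


def population_range(census_df: pd.DataFrame) -> pd.Series:
    """Return max minus min of the POPESTIMATE columns for each row."""
    cols = [c for c in census_df.columns if c.startswith("POPESTIMATE")]
    # Assumption: STNAME and CTYNAME are regular columns; move them to the index.
    indexed = census_df.set_index(["STNAME", "CTYNAME"])
    return indexed[cols].max(axis=1) - indexed[cols].min(axis=1)


# Made-up sample rows, not real census figures.
sample = pd.DataFrame(
    {
        "STNAME": ["Alabama", "Alabama"],
        "CTYNAME": ["Autauga County", "Baldwin County"],
        "POPESTIMATE2010": [100, 200],
        "POPESTIMATE2011": [110, 190],
        "POPESTIMATE2012": [105, 230],
    }
)

result = population_range(sample)
print(result)
```

If the frame is already indexed by state and county, the set_index step can simply be dropped.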

I ran into problems with NaN values that needed to be preserved, and used the following:

x = {}
for col in df_count:
    x[col] = df_count[col].max()- df_count[col].min()
pd.Series(x)
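
For comparison, the pandas max and min methods skip NaN by default and accept skipna=False to let NaN propagate instead. A small sketch of both behaviours on a made-up frame (the name df_count is reused from above, but the data and the row-wise axis are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in the first row.
df_count = pd.DataFrame(
    {"a": [1.0, 4.0], "b": [np.nan, 6.0], "c": [3.0, 9.0]},
    index=["row0", "row1"],
)

# Default behaviour: NaN is ignored when computing max and min.
ignoring = df_count.max(axis=1) - df_count.min(axis=1)

# skipna=False: any row containing NaN yields NaN in the result.
propagating = (
    df_count.max(axis=1, skipna=False) - df_count.min(axis=1, skipna=False)
)

print(ignoring)
print(propagating)
```

Which behaviour counts as "preserving" NaN depends on the data, so it is worth checking a row with a known missing value before relying on either form.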