Display minimum value excluding zero along with adjacent column value from each year + Python 3+, dataframe
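I have a dataframe with three columns: Year, Product, Price. I want to compute the minimum Price for each year, excluding zeros. I also want the adjacent value from the Product column to appear alongside that minimum.
Data: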
Year Product Price
2000 Grapes 0
2000 Apple 220
2000 pear 185
2000 Watermelon 172
2001 Orange 0
2001 Muskmelon 90
2001 Pear 165
2001 Watermelon 99
Desired output in a new dataframe:
Year Minimum Price Product
2000 172 Watermelon
2001 90 Muskmelon
First, filter out the rows where Price is 0 using boolean indexing:
df1 = df[df['Price'] != 0]
Then use DataFrameGroupBy.idxmin to get the index of the minimal Price in each group, and select those rows with loc:
df2 = df1.loc[df1.groupby('Year')['Price'].idxmin()]
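To make the two steps above concrete, here is a minimal self-contained sketch that rebuilds the sample data from the question and reshapes the result into the column layout of the desired output. The `rename`, column reordering, and `reset_index` steps are my assumptions about how to get that layout; they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 185, 172, 0, 90, 165, 99],
})

# Step 1: drop rows whose Price is 0.
df1 = df[df["Price"] != 0]

# Step 2: index label of the minimal Price within each Year,
# then select those rows.
idx = df1.groupby("Year")["Price"].idxmin()
df2 = df1.loc[idx]

# Assumed presentation step: match the layout asked for in the question.
result = (
    df2.rename(columns={"Price": "Minimum Price"})
       [["Year", "Minimum Price", "Product"]]
       .reset_index(drop=True)
)
print(result)
```

Because `idxmin` returns one label per group, this keeps a single row per year even when several products share the minimum price.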
An alternative is to use sort_values with drop_duplicates:
df2 = df1.sort_values(['Year', 'Price']).drop_duplicates('Year')
print (df2)
Year Product Price
3 2000 Watermelon 172
5 2001 Muskmelon 90
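As a quick sanity check, the sketch below runs both the `idxmin` approach and the `sort_values` + `drop_duplicates` approach on the question's sample data and compares the results. The variable names `by_idxmin` and `by_sort` are only illustrative, and the comparison is shown for this sample only, where no two products in a year share the minimum price.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 185, 172, 0, 90, 165, 99],
})

df1 = df[df["Price"] != 0]

# Approach 1: look up the row label of the minimum per Year.
by_idxmin = df1.loc[df1.groupby("Year")["Price"].idxmin()]

# Approach 2: sort by Year then Price, keep the first row of each Year.
by_sort = df1.sort_values(["Year", "Price"]).drop_duplicates("Year")

print(by_sort)
print(by_idxmin.equals(by_sort))
```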
If a group can have several rows sharing the minimum and you need all of them:
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 172
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2001 Pear 165
7 2001 Watermelon 99
df1 = df[df['Price'] != 0]
df = df1[df1['Price'].eq(df1.groupby('Year')['Price'].transform('min'))]
print (df)
Year Product Price
2 2000 pear 172
3 2000 Watermelon 172
5 2001 Muskmelon 90
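The tie-preserving variant can be sketched end to end as follows. It uses the modified sample in which pear and Watermelon are both priced at 172 in 2000; the intermediate name `group_min` is introduced here only for readability.

```python
import pandas as pd

# Sample data with a tie in 2000 (pear and Watermelon both at 172).
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2001, 2001],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 172, 172, 0, 90, 165, 99],
})

df1 = df[df["Price"] != 0]

# transform('min') broadcasts each Year's minimum back onto every row
# of that Year, so the result aligns with df1 row by row.
group_min = df1.groupby("Year")["Price"].transform("min")

# Keep every row whose Price equals its group's minimum (ties included).
all_minima = df1[df1["Price"].eq(group_min)]
print(all_minima)
```

Unlike `idxmin`, which yields one label per group, the comparison against the broadcast minimum keeps every tied row.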
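EDIT: if a year contains only zero prices and should still appear in the output, replace 0 with NaN instead of filtering the rows out: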
print (df)
Year Product Price
0 2000 Grapes 0
1 2000 Apple 220
2 2000 pear 185
3 2000 Watermelon 172
4 2001 Orange 0
5 2001 Muskmelon 90
6 2002 Pear 0
7 2002 Watermelon 0
import numpy as np
df['Price'] = df['Price'].replace(0, np.nan)
df2 = df.sort_values(['Year', 'Price']).drop_duplicates('Year')
df2['Product'] = df2['Product'].mask(df2['Price'].isnull(), 'No data')
print (df2)
Year Product Price
3 2000 Watermelon 172.0
5 2001 Muskmelon 90.0
6 2002 No data NaN
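The all-zero-year case can also be written without modifying the original dataframe in place. This is a minimal sketch under that assumption: `assign` is used in place of the direct column assignments, and the name `prices_nan` is illustrative. It relies on `sort_values` placing NaN last by default, so a year only ends up with a NaN row when it has no non-zero price at all.

```python
import numpy as np
import pandas as pd

# Sample data from the edit: every price in 2002 is 0.
df = pd.DataFrame({
    "Year": [2000, 2000, 2000, 2000, 2001, 2001, 2002, 2002],
    "Product": ["Grapes", "Apple", "pear", "Watermelon",
                "Orange", "Muskmelon", "Pear", "Watermelon"],
    "Price": [0, 220, 185, 172, 0, 90, 0, 0],
})

# Turn zeros into NaN on a copy, leaving df untouched.
prices_nan = df.assign(Price=df["Price"].replace(0, np.nan))

df2 = (
    prices_nan
    .sort_values(["Year", "Price"])   # NaN sorts last within each Year
    .drop_duplicates("Year")          # keep the first row per Year
    # Label years that have no usable price.
    .assign(Product=lambda d: d["Product"].mask(d["Price"].isna(), "No data"))
)
print(df2)
```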