How to filter a list based on ascending values?

I have the following 3 lists:

minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']

my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

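I am trying to filter my_list so that I keep only the rows whose index [-1] lies between the value in minimal_values and the value in maximal_values. These values are the minimum and maximum per country. I also do a subtraction in the list (index [-2] is subtracted from a fixed price). So for Morocco I only want the rows where index [-1] is between 0,32 and 0,78, and so on. The problem is that after 0,78 the value drops to 0,70, which means that row also satisfies the if statement.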

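Note: the values in the last column of my_list (index [-1]) first rise and then fall. I only want the rows from the ascending part, not the rows from the descending part. I'm not sure how to solve this.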
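The core of the problem is to keep, for each country, only the leading run of values that never decreases, and only then apply the min/max check. A minimal sketch of that idea on a plain list of floats; `ascending_prefix` is a hypothetical helper for illustration, not part of the question's code:

```python
def ascending_prefix(values):
    """Return the leading run of values, stopping at the first value that is
    smaller than the one before it (hypothetical helper for illustration)."""
    prefix = []
    for v in values:
        if prefix and v < prefix[-1]:
            break  # the sequence started descending
        prefix.append(v)
    return prefix


# Last-column values for Morocco, converted to float.
morocco = [0.15, 0.32, 0.44, 0.60, 0.78, 0.70]
print(ascending_prefix(morocco))  # [0.15, 0.32, 0.44, 0.6, 0.78]

# Applying the range check to the prefix only drops the trailing 0.70.
print([v for v in ascending_prefix(morocco) if 0.32 <= v <= 0.78])  # [0.32, 0.44, 0.6, 0.78]
```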

Here is my code:

price = 500

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

# Collect all unique countries in a list.
countries = list(set(country[0] for country in my_list))

results = []
for l in my_list:
    i = countries.index(l[0])
    if minimal_values[i] <= float(l[-1].replace(',', '.')) <= maximal_values[i]:
        new_index_2 = price - float(l[-2].replace(',', '.'))
        l[-2] = new_index_2
        results.append(l)

print(results)

Here is my current output:

[['Morocco', 'Meat', 310.1, '0,32'], 
['Morocco', 'Meat', 310.62, '0,44'], 
['Morocco', 'Meat', 311.06, '0,60'], 
['Morocco', 'Meat', 311.51, '0,78'], 
['Morocco', 'Meat', 312.01, '0,70'], 
['Spain', 'Meat', 310.44, '0,35'], 
['Spain', 'Meat', 310.99, '0,40'], 
['Spain', 'Meat', 311.87, '0,75'], 
['Spain', 'Meat', 312.05, '0,85'],
['Italy', 'Meat', 310.68, '0,45'], 
['Italy', 'Meat', 311.39, '0,67'], 
['Italy', 'Meat', 311.99, '0,72'], 
['Italy', 'Meat', 312.64, '0,55']]

Here is my desired output:

[['Morocco', 'Meat', 310.1, '0,32'],
 ['Morocco', 'Meat', 310.62, '0,44'],
 ['Morocco', 'Meat', 311.06, '0,60'],
 ['Morocco', 'Meat', 311.51, '0,78'],
 ['Spain', 'Meat', 310.44, '0,35'],
 ['Spain', 'Meat', 310.99, '0,40'],
 ['Spain', 'Meat', 311.87, '0,75'],
 ['Spain', 'Meat', 312.05, '0,85'],
 ['Italy', 'Meat', 310.68, '0,45'],
 ['Italy', 'Meat', 311.39, '0,67'],
 ['Italy', 'Meat', 311.99, '0,72']]

Pandas-based answers are also welcome.


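One approach is to track the largest value seen so far for each country and skip any row whose value falls below it. Because each country's rows are contiguous, the number of countries seen so far also gives the index into minimal_values and maximal_values:

# Convert the bounds to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

countries_largest = {}  # largest last-column value seen so far, per country
filtered_list = []
for row in my_list:
    country_name = row[0]
    value = float(row[-1].replace(',', '.'))
    # Skip rows that fall below the running maximum (the descending part).
    if country_name in countries_largest and value < countries_largest[country_name]:
        continue
    countries_largest[country_name] = value
    # Index of the current country in minimal_values/maximal_values.
    i = len(countries_largest) - 1
    if not (minimal_values[i] <= value <= maximal_values[i]):
        continue
    filtered_list.append(row)

print(filtered_list)

Output: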
[['Morocco', 'Meat', '189,90', '0,32'],
 ['Morocco', 'Meat', '189,38', '0,44'],
 ['Morocco', 'Meat', '188,94', '0,60'],
 ['Morocco', 'Meat', '188,49', '0,78'],
 ['Spain', 'Meat', '189,56', '0,35'],
 ['Spain', 'Meat', '189,01', '0,40'],
 ['Spain', 'Meat', '188,13', '0,75'],
 ['Spain', 'Meat', '187,95', '0,85'],
 ['Italy', 'Meat', '189,32', '0,45'],
 ['Italy', 'Meat', '188,61', '0,67'],
 ['Italy', 'Meat', '188,01', '0,72']]
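The running-maximum check in that loop can also be expressed with `itertools.accumulate(values, max)`, which yields the largest value seen so far at each position. A small sketch on one country's values (hard-coded here for illustration); a value belongs to the rising part when it equals the running maximum:

```python
from itertools import accumulate

# Last-column values for Morocco, already converted to float.
values = [0.15, 0.32, 0.44, 0.60, 0.78, 0.70]

# Running maximum: each position holds the largest value seen so far.
running_max = list(accumulate(values, max))

# Keep a value only if it is not smaller than anything before it.
rising = [v for v, m in zip(values, running_max) if v == m]
print(rising)  # [0.15, 0.32, 0.44, 0.6, 0.78]
```

Like the loop above, this keeps a later value that climbs back above the previous maximum (for example the 0.6 in `[0.3, 0.5, 0.4, 0.6]`); it does not stop at the first drop.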


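Note that your code has a problem: the order of the elements in countries is not necessarily the same as the order of the countries in my_list, because a set has no defined order. It is easier to handle the countries while walking through the list, noting when the country name changes. You can then keep a flag that marks the current country as finished (when the current value is smaller than the previous one) and, if it is set, ignore that country's remaining values: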

price = 500

# Convert values to float.
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

results = []
finished_country = -1  # index of the country whose ascending part has ended
country_index = -1     # index of the current country in minimal/maximal_values
last_country = ''
last_value = 0
for l in my_list:
    country = l[0]
    if country != last_country:
        # A new country starts: advance the index and forget the previous value.
        country_index += 1
        last_value = 0
    last_country = country
    value = float(l[-1].replace(',', '.'))
    # Skip finished countries and values below the minimum.
    if finished_country == country_index or value < minimal_values[country_index]:
        last_value = 0
        continue
    if value < last_value:
        # The values started to fall: ignore the rest of this country.
        finished_country = country_index
    elif value <= maximal_values[country_index]:
        l[-2] = price - float(l[-2].replace(',', '.'))
        results.append(l)
    last_value = value

print(results)

Output for the sample data:

[
 ['Morocco', 'Meat', 310.1, '0,32'],
 ['Morocco', 'Meat', 310.62, '0,44'],
 ['Morocco', 'Meat', 311.06, '0,60'],
 ['Morocco', 'Meat', 311.51, '0,78'],
 ['Spain', 'Meat', 310.44, '0,35'],
 ['Spain', 'Meat', 310.99, '0,40'],
 ['Spain', 'Meat', 311.87, '0,75'],
 ['Spain', 'Meat', 312.05, '0,85'],
 ['Italy', 'Meat', 310.68, '0,45'],
 ['Italy', 'Meat', 311.39, '0,67'],
 ['Italy', 'Meat', 311.99, '0,72']
]
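If you prefer to keep the shape of your original loop, the two fixes described above (a country order that follows my_list, and a per-country stop flag) can be combined as in the sketch below. The helper names `to_float` and `filter_rising` are made up for illustration, and the code assumes Python 3.7+ so that dicts keep insertion order. Because its state is kept per country, it also tolerates rows of different countries being interleaved, and it builds new rows instead of modifying my_list.

```python
def to_float(s):
    # The data uses a decimal comma: '0,32' -> 0.32
    return float(s.replace(',', '.'))


def filter_rising(rows, minimal_values, maximal_values, price=500):
    # Index of each country in order of first appearance in rows
    # (dicts keep insertion order; guaranteed since Python 3.7).
    order = {c: i for i, c in enumerate(dict.fromkeys(r[0] for r in rows))}
    last_value = {}   # previous last-column value, per country
    finished = set()  # countries whose values have already started to fall
    results = []
    for row in rows:
        country = row[0]
        if country in finished:
            continue
        value = to_float(row[-1])
        if country in last_value and value < last_value[country]:
            finished.add(country)  # first drop: ignore the rest of this country
            continue
        last_value[country] = value
        i = order[country]
        if to_float(minimal_values[i]) <= value <= to_float(maximal_values[i]):
            # Build a new row rather than modifying the input row in place.
            results.append(row[:-2] + [price - to_float(row[-2]), row[-1]])
    return results


minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'],
    ['Morocco', 'Meat', '189,90', '0,32'],
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'],
    ['Spain', 'Meat', '190,16', '0,20'],
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'],
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

for row in filter_rising(my_list, minimal_values, maximal_values):
    print(row)
```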

A pandas solution:

import pandas as pd

# create input dataframe
my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

dfi = pd.DataFrame(my_list).applymap(lambda x: x.replace(',', '.'))
dfi[[2, 3]] = dfi[[2, 3]].astype(float)
print(dfi)

#         0     1       2     3
# 0   Morocco  Meat  190.00  0.15
# 1   Morocco  Meat  189.90  0.32
# 2   Morocco  Meat  189.38  0.44
# 3   Morocco  Meat  188.94  0.60
# 4   Morocco  Meat  188.49  0.78
# 5   Morocco  Meat  187.99  0.70
# 6     Spain  Meat  190.76  0.10
# 7     Spain  Meat  190.16  0.20
# 8     Spain  Meat  189.56  0.35
# 9     Spain  Meat  189.01  0.40
# 10    Spain  Meat  188.13  0.75
# 11    Spain  Meat  187.95  0.85
# 12    Italy  Meat  190.20  0.11
# 13    Italy  Meat  190.10  0.31
# 14    Italy  Meat  189.32  0.45
# 15    Italy  Meat  188.61  0.67
# 16    Italy  Meat  188.01  0.72
# 17    Italy  Meat  187.36  0.55

# create df_filter with country, min_v and max_v
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
minimal_values = [float(i.replace(',', '.')) for i in minimal_values]
maximal_values = [float(i.replace(',', '.')) for i in maximal_values]

df_filter = pd.DataFrame(list(zip(dfi[0].unique().tolist(),
                                  minimal_values,
                                  maximal_values)))
df_filter.columns = [0, 'min_v', 'max_v']
print(df_filter)
#          0  min_v  max_v
# 0  Morocco   0.32   0.78
# 1    Spain   0.35   0.85
# 2    Italy   0.45   0.72

# merge dfi and df_filter
dfm = pd.merge(dfi, df_filter, on=0, how='left')
print(dfm)

#          0     1       2     3  min_v  max_v
# 0   Morocco  Meat  190.00  0.15   0.32   0.78
# 1   Morocco  Meat  189.90  0.32   0.32   0.78
# 2   Morocco  Meat  189.38  0.44   0.32   0.78
# 3   Morocco  Meat  188.94  0.60   0.32   0.78
# 4   Morocco  Meat  188.49  0.78   0.32   0.78
# 5   Morocco  Meat  187.99  0.70   0.32   0.78
# 6     Spain  Meat  190.76  0.10   0.35   0.85
# 7     Spain  Meat  190.16  0.20   0.35   0.85
# 8     Spain  Meat  189.56  0.35   0.35   0.85
# 9     Spain  Meat  189.01  0.40   0.35   0.85
# 10    Spain  Meat  188.13  0.75   0.35   0.85
# 11    Spain  Meat  187.95  0.85   0.35   0.85
# 12    Italy  Meat  190.20  0.11   0.45   0.72
# 13    Italy  Meat  190.10  0.31   0.45   0.72
# 14    Italy  Meat  189.32  0.45   0.45   0.72
# 15    Italy  Meat  188.61  0.67   0.45   0.72
# 16    Italy  Meat  188.01  0.72   0.45   0.72
# 17    Italy  Meat  187.36  0.55   0.45   0.72

# filter min_v <= column 3 <= max_v
cond = dfm[3].ge(dfm.min_v) & dfm[3].le(dfm.max_v)
dfm = dfm[cond].copy()

# drop rows whose column-3 value is lower than the previous remaining row of the same country (the descending part)
cond = dfm.groupby(0)[3].diff() < 0
dfo = dfm.loc[~cond, [0,1,2,3]].reset_index(drop=True)

# output result
price = 500
dfo[2] = price - dfo[2]

print(dfo)

#           0     1       2     3
# 0   Morocco  Meat  310.10  0.32
# 1   Morocco  Meat  310.62  0.44
# 2   Morocco  Meat  311.06  0.60
# 3   Morocco  Meat  311.51  0.78
# 4     Spain  Meat  310.44  0.35
# 5     Spain  Meat  310.99  0.40
# 6     Spain  Meat  311.87  0.75
# 7     Spain  Meat  312.05  0.85
# 8     Italy  Meat  310.68  0.45
# 9     Italy  Meat  311.39  0.67
# 10    Italy  Meat  311.99  0.72
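A variant of the pandas approach that does not depend on filtering by range first: flag every row after a country's first drop (using a running count of drops per country), then apply the range check. Unlike the diff-after-filtering version, this also excludes a descending-part row that happens to be higher than the row just before it (e.g. 0,78, 0,70, 0,74). This is a sketch; the column names `country`, `product`, `price` and `value` are my own choice, not from the question.

```python
import pandas as pd

rows = [
    ['Morocco', 'Meat', '190,00', '0,15'], ['Morocco', 'Meat', '189,90', '0,32'],
    ['Morocco', 'Meat', '189,38', '0,44'], ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'], ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], ['Spain', 'Meat', '190,16', '0,20'],
    ['Spain', 'Meat', '189,56', '0,35'], ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'], ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'], ['Italy', 'Meat', '190,10', '0,31'],
    ['Italy', 'Meat', '189,32', '0,45'], ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'], ['Italy', 'Meat', '187,36', '0,55']]
minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']
price = 500

df = pd.DataFrame(rows, columns=['country', 'product', 'price', 'value'])
for col in ['price', 'value']:
    df[col] = df[col].str.replace(',', '.').astype(float)

# Per-country bounds, matched to countries in order of first appearance.
countries = df['country'].unique()
lo = dict(zip(countries, [float(s.replace(',', '.')) for s in minimal_values]))
hi = dict(zip(countries, [float(s.replace(',', '.')) for s in maximal_values]))

# True on rows whose value is lower than the previous row of the same country.
dropped = df.groupby('country')['value'].diff().lt(0)
# Running count of drops within each country; > 0 means "after the first drop".
after_drop = dropped.astype(int).groupby(df['country']).cumsum() > 0

in_range = df['value'].between(df['country'].map(lo), df['country'].map(hi))
out = df[in_range & ~after_drop].copy()
out['price'] = price - out['price']
out = out.reset_index(drop=True)
print(out)
```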

Given:

minimal_values = ['0,32', '0,35', '0,45']
maximal_values = ['0,78', '0,85', '0,72']

my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], 
    ['Morocco', 'Meat', '189,90', '0,32'], 
    ['Morocco', 'Meat', '189,38', '0,44'],
    ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'],
    ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], 
    ['Spain', 'Meat', '190,16', '0,20'], 
    ['Spain', 'Meat', '189,56', '0,35'],
    ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'],
    ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'],
    ['Italy', 'Meat', '190,10', '0,31'], 
    ['Italy', 'Meat', '189,32', '0,45'],
    ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'],
    ['Italy', 'Meat', '187,36', '0,55']]

First, since we will use it a lot, let's write a small conversion routine that standardizes what 'float' means in your case:

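def conv(s):
    # Convert a decimal-comma string such as '0,32' to a float;
    # return strings that are not numbers (e.g. 'Morocco') unchanged.
    try:
        return float(s.replace(',','.'))
    except ValueError:
        return s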

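Now, it seems that your two string lists, minimal_values and maximal_values, give the minimum and maximum for each country, in order. If so, your use of countries = list(set(country[0] for country in my_list)) will not work, because the iteration order of a set is arbitrary in every version of Python.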

If you have Python 3.6+, you can do:

countries = list({}.fromkeys(country[0] for country in my_list))

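because dictionaries preserve insertion order in Python 3.6+ (an implementation detail of CPython 3.6, guaranteed by the language from Python 3.7). If you want something that works in all versions of Python, you can do this instead: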

def uniqs_in_order(li):
    seen=set()
    return [e for e in li if not (e in seen or seen.add(e))]
    # Python 3.6+: return list({}.fromkeys(li))
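To see why the order matters: minimal_values and maximal_values are matched to countries by position, so the country list has to follow my_list. A quick check with a shortened country column taken from my_list; note that even a deterministic order such as `sorted(set(...))` would pair 'Italy' with the first bounds, which is not what the data means:

```python
def uniqs_in_order(li):
    seen = set()
    # seen.add(e) returns None (falsy), so each element is kept the first time only.
    return [e for e in li if not (e in seen or seen.add(e))]


countries = ['Morocco', 'Morocco', 'Spain', 'Spain', 'Italy', 'Italy']
print(uniqs_in_order(countries))       # ['Morocco', 'Spain', 'Italy']
print(list(dict.fromkeys(countries)))  # ['Morocco', 'Spain', 'Italy']
print(sorted(set(countries)))          # ['Italy', 'Morocco', 'Spain']
```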

Now you can build a mapping from each country to a (min, max) tuple:

mapping={k:(min_, max_) for k,min_,max_ in 
    zip(uniqs_in_order([sl[0] for sl in my_list]), 
                        [conv(s) for s in minimal_values], 
                        [conv(s) for s in maximal_values])}

>>> mapping
{'Morocco': (0.32, 0.78), 'Spain': (0.35, 0.85), 'Italy': (0.45, 0.72)}

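Now we can finally filter. Since you only want the values that:

  1. lie within the country's minimum and maximum, and
  2. come before the point where the country's values stop rising,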

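we can use groupby from itertools to split the list of lists by country and apply both tests (groupby only groups consecutive rows, which works here because each country's rows are contiguous):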

from itertools import groupby

filt=[]
price = 500
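for k,v in groupby(my_list, key=lambda sl: sl[0]):
    section=list(v)
    for i, row in enumerate(section):
        # Stop at the first value that is lower than the previous one.
        if i and conv(row[-1])<conv(section[i-1][-1]):
            break
        # Keep the row only if its value is within the country's min/max.
        if mapping[row[0]][0]<=conv(row[-1])<=mapping[row[0]][1]:
            row[-2]=price-conv(row[-2])
            filt.append(row)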

>>> filt
[['Morocco', 'Meat', 310.1, '0,32'],
['Morocco', 'Meat', 310.62, '0,44'],
['Morocco', 'Meat', 311.06, '0,60'],
['Morocco', 'Meat', 311.51, '0,78'],
['Spain', 'Meat', 310.44, '0,35'],
['Spain', 'Meat', 310.99, '0,40'],
['Spain', 'Meat', 311.87, '0,75'],
['Spain', 'Meat', 312.05, '0,85'],
['Italy', 'Meat', 310.68, '0,45'],
['Italy', 'Meat', 311.39, '0,67'],
['Italy', 'Meat', 311.99, '0,72']]
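The loop above assigns to row[-2], so it also changes the rows of my_list itself. If my_list must stay unchanged, here is a sketch of the same groupby filter written as a generator that yields new rows; the name `rising_rows` is hypothetical, and `mapping` is written out with the values shown above.

```python
from itertools import groupby


def conv(s):
    # Decimal-comma string -> float; leave non-numeric strings unchanged.
    try:
        return float(s.replace(',', '.'))
    except ValueError:
        return s


def rising_rows(rows, mapping, price=500):
    # groupby only groups consecutive rows, so each country's rows must be contiguous.
    for country, group in groupby(rows, key=lambda sl: sl[0]):
        lo, hi = mapping[country]
        previous = None
        for row in group:
            value = conv(row[-1])
            if previous is not None and value < previous:
                break  # values started to fall: skip the rest of this country
            previous = value
            if lo <= value <= hi:
                # A new list is yielded; the input row is left untouched.
                yield row[:-2] + [price - conv(row[-2]), row[-1]]


my_list = [
    ['Morocco', 'Meat', '190,00', '0,15'], ['Morocco', 'Meat', '189,90', '0,32'],
    ['Morocco', 'Meat', '189,38', '0,44'], ['Morocco', 'Meat', '188,94', '0,60'],
    ['Morocco', 'Meat', '188,49', '0,78'], ['Morocco', 'Meat', '187,99', '0,70'],
    ['Spain', 'Meat', '190,76', '0,10'], ['Spain', 'Meat', '190,16', '0,20'],
    ['Spain', 'Meat', '189,56', '0,35'], ['Spain', 'Meat', '189,01', '0,40'],
    ['Spain', 'Meat', '188,13', '0,75'], ['Spain', 'Meat', '187,95', '0,85'],
    ['Italy', 'Meat', '190,20', '0,11'], ['Italy', 'Meat', '190,10', '0,31'],
    ['Italy', 'Meat', '189,32', '0,45'], ['Italy', 'Meat', '188,61', '0,67'],
    ['Italy', 'Meat', '188,01', '0,72'], ['Italy', 'Meat', '187,36', '0,55']]
mapping = {'Morocco': (0.32, 0.78), 'Spain': (0.35, 0.85), 'Italy': (0.45, 0.72)}

filt = list(rising_rows(my_list, mapping))
for row in filt:
    print(row)
```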