How to select rows from a data frame based on the values in a table of ranges

Question

I have a table of ranges (start,end):

name     blue         green          yellow        purple              
a        1,5                         654,678       11,15
b                     88761,88776  
c        1211,1215                   38,47    
d        89,95                                     1567,1578

And a data frame like this:

Supplier        colour                   
Abi             1                               
John            678          
Smith           120               
Tim             1570 
Don             87560

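How can I filter df so that it keeps only the rows whose colour value falls inside one of the ranges given in the table? I would like the final df to look like this: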

Supplier        colour                   
Abi             1                               
John            678                         
Tim             1570

Thanks!

Answer 1

Try a list comprehension together with loc:

# df1 is the range table, df2 the supplier frame; expand every "start,end" cell into its integers
l = [x for i in df1[df1.columns[1:]].values.flatten().tolist() if ',' in str(i) for x in range(int(i.split(',')[0]), int(i.split(',')[1]) + 1)]
print(df2.loc[df2['colour'].isin(l)].reset_index(drop=True))

Output:

     Supplier        colour                   
0    Abi             1                               
1    John            678                         
2    Tim             1570
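
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes the blank cells of the range table are empty strings and that the ranges sit in the columns the question's layout suggests; the name `allowed` is only illustrative.

```python
import pandas as pd

# Range table from the question (assumption: blank cells are empty strings).
df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", "", "1211,1215", "89,95"],
    "green": ["", "88761,88776", "", ""],
    "yellow": ["654,678", "", "38,47", ""],
    "purple": ["11,15", "", "", "1567,1578"],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Collect every integer covered by a "start,end" cell (both ends inclusive).
allowed = set()
for cell in df1.drop(columns="name").to_numpy().ravel():
    if isinstance(cell, str) and "," in cell:
        start, end = (int(part) for part in cell.split(","))
        allowed.update(range(start, end + 1))

# Keep the rows whose colour is one of those integers and renumber the index.
out = df2.loc[df2["colour"].isin(allowed)].reset_index(drop=True)
print(out)
```

Note that this materialises every integer in every range, so it is only practical while the ranges stay reasonably narrow.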

Answer 2

Try this:

First, use the replace() method to turn the blank (whitespace) cells into NaN:

# r'\s+' matches a run of one or more whitespace characters, so blank cells become NaN
df1=df1.replace(r'\s+',float('NaN'),regex=True)

The idea is to turn the range strings into an actual list of every value the ranges cover:

s=df1.set_index('name').stack().dropna() 
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()
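
To make the expansion step concrete, here is a small sketch on just two of the cells. The Series `cells` is a hypothetical stand-in for the stacked table, and each range is converted to a list before exploding.

```python
import pandas as pd

# Two range strings standing in for the stacked, NaN-free table.
cells = pd.Series(["1,5", "11,15"], index=["blue", "purple"])

# "1,5" -> ["1", "5"] -> [1, 2, 3, 4, 5]; explode() then gives one row per integer.
expanded = cells.str.split(",").map(
    lambda parts: list(range(int(parts[0]), int(parts[1]) + 1))
).explode()

values = expanded.unique().tolist()
print(values)  # [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
```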

Finally:

out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]

Output of out:

    Supplier    colour
0   Abi          1
1   John        678
3   Tim         1570

Answer 3

Use pd.cut and pd.IntervalIndex:

# table is the range table and supplier is the data frame to filter
tups = table.set_index('name').unstack() \
            .replace(r'\s+', float('nan'), regex=True).dropna() \
            .apply(lambda x: tuple([int(i) for i in x.split(',')])).values

ii = pd.IntervalIndex.from_tuples(tups, closed='both')

out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]

>>> out
  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
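
One caveat: pd.cut only accepts an IntervalIndex whose intervals do not overlap. If the ranges might overlap, a plain NumPy comparison is one possible alternative. The sketch below is not part of the answer above; it assumes the (start, end) pairs have already been pulled out of the table into an array, here called `bounds`.

```python
import numpy as np
import pandas as pd

# Assumed: the (start, end) pairs from the question's table, already parsed.
bounds = np.array([
    (1, 5), (654, 678), (11, 15), (88761, 88776),
    (1211, 1215), (38, 47), (89, 95), (1567, 1578),
])
supplier = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Compare every colour with every range: column vector (n_rows, 1) against
# the (n_ranges,) start and end vectors broadcasts to (n_rows, n_ranges).
colour = supplier["colour"].to_numpy()[:, None]
starts, ends = bounds[:, 0], bounds[:, 1]
in_any_range = ((colour >= starts) & (colour <= ends)).any(axis=1)

out = supplier.loc[in_any_range]
print(out)
```

This never expands the ranges into individual integers, at the cost of an n_rows by n_ranges boolean array.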
