How to select rows from a data frame based on the values in a table of ranges
I have a table of ranges (start,end):
name blue green yellow purple
a 1,5 654,678 11,15
b 88761,88776
c 1211,1215 38,47
d 89,95 1567,1578
and a data frame like this:
Supplier colour
Abi 1
John 678
Smith 120
Tim 1570
Don 87560
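How can I filter df so that it contains only the rows whose value in the colour column falls within one of the ranges given in the table? I would like the final df to look like this: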
Supplier colour
Abi 1
John 678
Tim 1570
Thanks!
Try using a list comprehension and loc:
l = [x for i in df1[df1.columns[1:]].values.flatten().tolist() if ',' in str(i) for x in range(int(i.split(',')[0]), int(i.split(',')[1]) + 1)]
print(df2.loc[df2['colour'].isin(l)].reset_index(drop=True))
Output:
Supplier colour
0 Abi 1
1 John 678
2 Tim 1570
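The one-liner above assumes df1 (the ranges table) and df2 (the suppliers) already exist. A minimal self-contained sketch of the same idea follows. The placement of each range under a particular colour column is an assumption, since the post does not show which cells are empty, and empty cells are assumed to be empty strings.

```python
import pandas as pd

# Assumed layout: which colour column holds each range is a guess,
# and blank cells are represented as empty strings.
df1 = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1,5", "", "1211,1215", "89,95"],
    "green": ["", "88761,88776", "38,47", ""],
    "yellow": ["654,678", "", "", ""],
    "purple": ["11,15", "", "", "1567,1578"],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Collect every integer covered by any "start,end" cell (end inclusive).
allowed = set()
for cell in df1.drop(columns="name").to_numpy().ravel():
    if "," in str(cell):
        start, end = (int(part) for part in cell.split(","))
        allowed.update(range(start, end + 1))

# Keep only the suppliers whose colour is one of the collected values.
result = df2.loc[df2["colour"].isin(list(allowed))].reset_index(drop=True)
print(result)
```

Because every value in every range is materialised, this is probably best suited to ranges that are small, as they are here.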
Try this:
First, use the replace() method to replace ' ' with NaN:
df1=df1.replace(r'\s+',float('NaN'),regex=True)
#^ it will replace one or more occurrences of ' '
The idea is to turn the string ranges into an actual list of all the values the ranges cover:
s=df1.set_index('name').stack().dropna()
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()
Finally:
out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]
Output of out:
Supplier colour
0 Abi 1
1 John 678
3 Tim 1570
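To make the expansion step easier to follow, here is a small sketch on a hypothetical Series of range strings. The name cells stands in for the stacked, NaN-free result of the first step, and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the stacked table cells; None marks an empty cell.
cells = pd.Series(["1,3", None, "7,8"]).dropna()

values = (
    cells.str.split(",")                                   # "1,3" -> ["1", "3"]
    .map(lambda p: list(range(int(p[0]), int(p[1]) + 1)))  # -> [1, 2, 3]
    .explode()                                             # one row per integer
    .tolist()
)
print(values)  # [1, 2, 3, 7, 8]
```

Each range string becomes one row per integer it covers, which is what lets isin() do the filtering afterwards.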
Using pd.cut and pd.IntervalIndex:
tups = table.set_index('name').unstack() \
.replace(r'\s+', float('nan'), regex=True).dropna() \
.apply(lambda x: tuple([int(i) for i in x.split(',')])).values
ii = pd.IntervalIndex.from_tuples(tups, closed='both')
out = supplier.loc[pd.cut(supplier['colour'], ii).dropna().index]
>>> out
Supplier colour
0 Abi 1
1 John 678
3 Tim 1570
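A reduced sketch of what the interval approach does, with only three of the ranges typed in by hand as an assumption. With closed='both' the endpoints count as inside the interval, and pd.cut returns NaN for values that fall in no interval, so notna() gives the row mask directly.

```python
import pandas as pd

# Three of the ranges from the table, hard-coded for illustration.
ii = pd.IntervalIndex.from_tuples(
    [(1, 5), (654, 678), (1567, 1578)], closed="both"
)
colours = pd.Series([1, 678, 120, 1570, 87560])

# Values outside every interval are binned as NaN.
binned = pd.cut(colours, ii)
mask = binned.notna()

print(colours[mask].tolist())  # [1, 678, 1570]
```

As far as I know, pd.cut rejects an IntervalIndex whose intervals overlap, so this route assumes the ranges in the table are disjoint; with closed='both', two ranges that share an endpoint would count as overlapping.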