Pandas DataFrame: selection of multiple elements in several columns
I have this Python Pandas DataFrame, DF:
import numpy as np
import pandas as pd

DICT = {'letter': ['A','B','C','A','B','C','A','B','C'],
        'number': [1,1,1,2,2,2,3,3,3],
        'word': ['one','two','three','three','two','one','two','one','three']}
DF = pd.DataFrame(DICT)
which looks like:
letter number word
0 A 1 one
1 B 1 two
2 C 1 three
3 A 2 three
4 B 2 two
5 C 2 one
6 A 3 two
7 B 3 one
8 C 3 three
I want to extract the rows
letter number word
A 1 one
B 2 two
C 3 three
First I tried:
DF[(DF['letter'].isin(("A","B","C"))) &
DF['number'].isin((1,2,3)) &
DF['word'].isin(('one','two','three'))]
Of course that doesn't work: every row is selected.
Then I tested:
Bool = DF[['letter','number','word']].isin(("A",1,"one"))
DF[np.all(Bool,axis=1)]
Good, it works! But only for one row...
If we take the next step and pass .isin() an iterable of tuples:
Bool = DF[['letter','number','word']].isin((("A",1,"one"),
("B",2,"two"),
("C",3,"three")))
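Then it fails: the Boolean array is all False...
What am I doing wrong? Is there a more elegant way to do this selection based on several columns?
(In any case, I want to avoid a for loop, because the real DataFrames I'm using are very large, so I'm looking for the fastest way to do the job.)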
The idea is to create a new DataFrame from all the triples and then merge it with the original DataFrame:
L = [("A",1,"one"),
("B",2,"two"),
("C",3,"three")]
df1 = pd.DataFrame(L, columns=['letter','number','word'])
print (df1)
letter number word
0 A 1 one
1 B 2 two
2 C 3 three
df = DF.merge(df1)
print (df)
letter number word
0 A 1 one
1 B 2 two
2 C 3 three
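Note that merge returns a fresh 0..n-1 index, so the original row labels (0, 4, 8) are lost. If they matter, one possible sketch is to carry the index through the merge as an ordinary column. The names `cols` and `kept`, and the helper column `index` that reset_index() creates for an unnamed index, are choices made here for illustration and are not part of the approach above.

```python
import pandas as pd

DF = pd.DataFrame({
    'letter': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'number': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'word': ['one', 'two', 'three', 'three', 'two', 'one', 'two', 'one', 'three'],
})
L = [("A", 1, "one"), ("B", 2, "two"), ("C", 3, "three")]

cols = ['letter', 'number', 'word']           # illustrative name
df1 = pd.DataFrame(L, columns=cols)

# reset_index() turns the original labels into a column named 'index',
# which survives the inner merge and is then restored as the index.
kept = (DF.reset_index()
          .merge(df1, on=cols)
          .set_index('index')
          .sort_index())
print(kept)
```

Passing `on=cols` explicitly is a small safety measure: without it, merge joins on every column name the two frames share, which may be more than intended on a wider DataFrame.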
Another idea is to create a list of tuples, convert it to a Series, and then compare with isin:
s = pd.Series(list(map(tuple, DF[['letter','number','word']].values.tolist())),index=DF.index)
df1 = DF[s.isin(L)]
print (df1)
letter number word
0 A 1 one
4 B 2 two
8 C 3 three
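A variation on the same tuple comparison, offered only as a sketch: instead of building a Series of Python tuples by hand, the three columns can be viewed as a MultiIndex, whose isin method accepts an iterable of tuples. This assumes a pandas version that provides `pd.MultiIndex.from_frame` (0.24 or later); `cols`, `mask` and `result` are illustrative names. Whether it is faster than the Series approach on a very large DataFrame is not something claimed here and would need to be timed.

```python
import pandas as pd

DF = pd.DataFrame({
    'letter': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'number': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'word': ['one', 'two', 'three', 'three', 'two', 'one', 'two', 'one', 'three'],
})
L = [("A", 1, "one"), ("B", 2, "two"), ("C", 3, "three")]

cols = ['letter', 'number', 'word']           # illustrative name

# Each row of DF[cols] becomes one tuple-valued entry of the MultiIndex;
# isin() then tests whole rows against the tuples in L.
mask = pd.MultiIndex.from_frame(DF[cols]).isin(L)

# mask is a boolean NumPy array aligned with DF's rows,
# so the original index labels are preserved.
result = DF[mask]
print(result)
```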