How to filter pandas groupby object based on groupby.groups.keys()
I have two pandas DataFrames, df1 and df2:
df1:
City Pop Homes Other
0 City_1 100 1 0
1 City_1 100 2 6
2 City_1 100 2 2
3 City_1 100 3 9
4 City_1 200 1 6
5 City_1 200 2 6
6 City_1 200 3 7
7 City_1 300 1 0
df2:
City Pop Homes Other
0 City_1 100 1 0
1 City_1 100 2 6
2 City_1 100 2 2
3 City_1 100 8 9
4 City_1 200 1 6
5 City_1 200 2 6
6 City_1 800 3 7
7 City_1 800 8 0
I want to create df3, which has the same columns as df1 and df2 but contains only the rows of df1 whose (Pop, Homes) pair also appears in df2:
df3:
City Pop Homes Other
0 City_1 100 1 0
1 City_1 100 2 6
2 City_1 100 2 2
4 City_1 200 1 6
5 City_1 200 2 6
To get the (Pop, Homes) pairs in df1 and df2, I did:
df1_string = """
City_1 100 1 0
City_1 100 2 6
City_1 100 2 2
City_1 100 3 9
City_1 200 1 6
City_1 200 2 6
City_1 200 3 7
City_1 300 1 0"""
df2_string = """
City_1 100 1 0
City_1 100 2 6
City_1 100 2 2
City_1 100 8 9
City_1 200 1 6
City_1 200 2 6
City_1 800 3 7
City_1 800 8 0"""
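# strip() drops the newline right after the opening triple quote; without it,
# split('\n') yields an empty first string, which becomes an extra empty row
df1 = pd.DataFrame([x.split() for x in df1_string.strip().split('\n')], columns=['City', 'Pop', 'Homes', 'Other'])
df2 = pd.DataFrame([x.split() for x in df2_string.strip().split('\n')], columns=['City', 'Pop', 'Homes', 'Other'])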
df1_keys = [x for x in df1.groupby(['Pop', 'Homes']).groups.keys()]
df2_keys = [x for x in df2.groupby(['Pop', 'Homes']).groups.keys()]
print(df1_keys)
[('100', '1'), ('100', '2'), ('100', '3'), ('200', '1'), ('200', '2'), ('200', '3'), ('300', '1')]
print(df2_keys)
[('100', '1'), ('100', '2'), ('100', '8'), ('200', '1'), ('200', '2'), ('800', '3'), ('800', '8')]
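Once the keys are available as (Pop, Homes) tuples, the pairs common to both frames are simply a set intersection. A minimal sketch, assuming the same data built directly with integer columns (the string-parsing code above produces str values instead, which works identically as long as both frames use the same type); `common_pairs` is an illustrative name:

```python
import pandas as pd

# Same data as df1/df2 above, with integer Pop/Homes/Other columns (assumption).
df1 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 200, 300],
    "Homes": [1, 2, 2, 3, 1, 2, 3, 1],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})
df2 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 800, 800],
    "Homes": [1, 2, 2, 8, 1, 2, 3, 8],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})

# groups.keys() yields one (Pop, Homes) tuple per group, so the pairs present
# in both frames are the intersection of the two key sets.
df1_keys = set(df1.groupby(["Pop", "Homes"]).groups.keys())
df2_keys = set(df2.groupby(["Pop", "Homes"]).groups.keys())
common_pairs = sorted(df1_keys & df2_keys)
print(common_pairs)
```

With this data the intersection holds the four pairs (100, 1), (100, 2), (200, 1) and (200, 2), exactly the pairs that occur in the desired df3.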
But I don't know how to filter df1 from here. I thought it would be something like:
df1 = df1[df1.groupby(['Pop', 'Homes']).groups.keys().isin(df2.groupby(['Pop', 'Homes']).groups.keys())]
But this doesn't work.
I should also mention that df1 and df2 are not always the same length.
Solution
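The attempted line fails because `groups.keys()` returns a plain dict keys view, which has no `isin()` method; and even if it had one, the mask would contain one entry per group rather than one per row of df1. A sketch of a version that keeps the groupby keys but builds the mask per row of df1, assuming integer columns as before (`row_pairs` and `mask` are illustrative names):

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 200, 300],
    "Homes": [1, 2, 2, 3, 1, 2, 3, 1],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})
df2 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 800, 800],
    "Homes": [1, 2, 2, 8, 1, 2, 3, 8],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})

# Keys of df2's groupby: one (Pop, Homes) tuple per group, as a list.
df2_keys = list(df2.groupby(["Pop", "Homes"]).groups.keys())

# One (Pop, Homes) entry per row of df1, so the boolean mask has len(df1)
# entries no matter how long df2 is.
row_pairs = pd.MultiIndex.from_frame(df1[["Pop", "Homes"]])
mask = row_pairs.isin(df2_keys)

# Boolean indexing keeps df1's original row labels and column order.
df3 = df1[mask]
print(df3)
```

Because the mask is derived from df1's rows, this works when df1 and df2 have different lengths, and df3 keeps the original labels 0, 1, 2, 4, 5 shown in the desired output.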
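Setting the index to Pop and Homes labels each row with its (Pop, Homes) pair, and isin() then applies the desired filter. After reset_index(), Pop and Homes come back as the first two columns and the rows are renumbered from 0: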
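df1.set_index(['Pop', 'Homes'], inplace=True)
df2.set_index(['Pop', 'Homes'], inplace=True)
# Build the mask from df1's index so it has one entry per row of df1;
# df2.index.isin(df1.index) has len(df2) entries, lines up with df1 only by
# coincidence, and fails outright when the two lengths differ.
df1 = df1[df1.index.isin(df2.index)]
df1.reset_index(inplace=True)
print(df1)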
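Since the question is about filtering a groupby object, `DataFrameGroupBy.filter` is another option: it calls a function once per group and keeps the rows of every group for which the function returns True. A sketch under the same integer-column assumption; it relies on the documented `.name` attribute that pandas sets on each sub-frame passed to the function (with two grouping keys, a (Pop, Homes) tuple), and `df2_pairs` is an illustrative name:

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 200, 300],
    "Homes": [1, 2, 2, 3, 1, 2, 3, 1],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})
df2 = pd.DataFrame({
    "City": ["City_1"] * 8,
    "Pop": [100, 100, 100, 100, 200, 200, 800, 800],
    "Homes": [1, 2, 2, 8, 1, 2, 3, 8],
    "Other": [0, 6, 2, 9, 6, 6, 7, 0],
})

# Pairs present in df2, stored as a set for fast membership tests.
df2_pairs = set(df2.groupby(["Pop", "Homes"]).groups.keys())

# Keep a group of df1 only if its (Pop, Homes) key also occurs in df2.
# filter() returns the surviving rows in their original order and labels.
df3 = df1.groupby(["Pop", "Homes"]).filter(lambda g: g.name in df2_pairs)
print(df3)
```

This reads closest to the original intent, but it runs a Python function per group, so with many groups the vectorized isin() approach is likely to be faster.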