How to find the total length of a column value that has multiple values in different rows for another column, by location
This is part 2 of an earlier question.
Is there a way to find the IDs that have both Apple and Strawberry and count them, and likewise the IDs that have only Apple and the IDs that have only Strawberry, broken down by location?
df:
   ID   Fruit       Location
0  ABC  Apple       NY        <- ABC has Apple and Strawberry
1  ABC  Strawberry  NY        <- ABC has Apple and Strawberry
2  EFG  Apple       LA        <- EFG has Apple only
3  XYZ  Apple       HOUSTON   <- XYZ has Apple and Strawberry
4  XYZ  Strawberry  HOUSTON   <- XYZ has Apple and Strawberry
5  CDF  Strawberry  BOSTON    <- CDF has Strawberry
6  AAA  Apple       CHICAGO   <- AAA has Apple only
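For anyone who wants to reproduce this, the frame above can be built roughly as follows. This is a minimal sketch that assumes pandas is imported as pd; the column names and values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above: one row per (ID, Fruit) pair.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
    "Location": ["NY", "NY", "LA", "HOUSTON", "HOUSTON", "BOSTON", "CHICAGO"],
})
print(df)
```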
Expected output:
IDs that has Apple and Strawberry:
NY 1
HOUSTON 1
IDs that has Apple only:
LA 1
CHICAGO 1
IDs that has Strawberry only:
BOSTON 1
The previous code was:
v = ['Apple','Strawberry']
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print(out)
>>> 2
I tried the following, but it did not work; the result is the same:
v = ['Apple','Strawberry']
out = df.groupby('ID', 'LOCATION')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print(out)
>>> 2
Thanks!
An inefficient solution using groupby and apply:
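# Collapse each ID to one row: the tuple of its fruits and its first location
x = df.groupby('ID').agg({'Fruit': lambda s: tuple(s), 'Location': 'first'})
# Count IDs per (fruit combination, location)
y = x.groupby('Fruit')['Location'].value_counts()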
y:
Fruit                Location
(Apple,)             CHICAGO     1
                     LA          1
(Apple, Strawberry)  HOUSTON     1
                     NY          1
(Strawberry,)        BOSTON      1
Name: Location, dtype: int64
for index in set(y.index.get_level_values(0)):
    if len(index)==2:
        print(f"IDs that has {index[0]} and {index[1]}:")
        print(y.loc[index].to_string())
    else:
        print(f"IDs that has {index[0]} only:")
        print(y.loc[index].to_string())
IDs that has Apple only:
Location
CHICAGO 1
LA 1
IDs that has Apple and Strawberry:
Location
HOUSTON 1
NY 1
IDs that has Strawberry only:
Location
BOSTON 1
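A variant that avoids tuple-valued index labels may be easier to work with. The sketch below is not the approach above, just one possible alternative: it groups on a list of keys (grouping on several columns takes a list, and the column is named Location, not LOCATION), reduces each ID to a plain string label, and counts labels per location. The helper name label_fruits and the Group column are made up for illustration, and the code assumes each ID belongs to a single location.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
    "Location": ["NY", "NY", "LA", "HOUSTON", "HOUSTON", "BOSTON", "CHICAGO"],
})


def label_fruits(fruits):
    """Hypothetical helper: turn the fruits of one ID into a string label."""
    present = sorted(set(fruits))
    if len(present) == 1:
        return f"{present[0]} only"
    return " and ".join(present)


# One row per (ID, Location) with the label describing which fruits it has.
per_id = (
    df.groupby(["ID", "Location"])["Fruit"]
      .agg(label_fruits)
      .reset_index(name="Group")
)

# Number of IDs for each (label, location) pair.
counts = per_id.groupby(["Group", "Location"]).size()

# Print one block per label, in sorted label order.
for group, sub in counts.groupby(level="Group"):
    print(f"IDs that has {group}:")
    print(sub.droplevel("Group").to_string())
```

Because the labels are ordinary strings, counts can be indexed directly, for example counts["Apple only"] gives the per-location counts for IDs that only have Apple.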