How to create a function based on another dataframe column being True?
I have a dataframe that looks like this:
  Name      X      Y
0    A  False   True
1    B   True   True
2    C   True  False
I want to create a function such that, for example:
example_function("A") = "A is in Y"
example_function("B") = "B is in X and Y"
example_function("C") = "C is in X"
Here is my current code (it is incorrect and looks inefficient):
def example_function(name):
    for name in df['Name']:
        if df['X'][name] == True and df['Y'][name] == False:
            print(str(name) + "is in X")
        elif df['X'][name] == False and df['Y'][name] == True:
            print(str(name) + "is in Y")
        else:
            print(str(name) + "is in X and Y")
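I eventually want to add more boolean columns, so it needs to scale. How can I do this? Would it be better to create a dictionary rather than a dataframe?
Thanks!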
If you really want a function, you could do:
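def example_function(label):
    # Select the row for this name; its values are the boolean flags
    s = df.set_index('Name').loc[label]
    # Index the row by itself to keep only the columns that are True
    l = s[s].index.to_list()
    return f'{label} is in {" and ".join(l)}'

example_function('A')
# 'A is in Y'

example_function('B')
# 'B is in X and Y'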
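To check that this approach scales as more boolean columns are added, here is a self-contained sketch. The extra column `Z` and the explicit `df` parameter are assumptions added for illustration; they are not part of the question or the answer above.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical extra boolean column Z
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "X": [False, True, True],
    "Y": [True, True, False],
    "Z": [True, False, True],
})

def example_function(label, df):
    # Row for this name; its values are the boolean flags
    s = df.set_index("Name").loc[label]
    # Keep only the column labels whose flag is True
    cols = s[s].index.to_list()
    return f"{label} is in {' and '.join(cols)}"

print(example_function("A", df))  # A is in Y and Z
print(example_function("B", df))  # B is in X and Y
print(example_function("C", df))  # C is in X and Z
```

Nothing in the function names the columns, so adding another boolean column only changes the data, not the code.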
You can also compute all the results at once as a dictionary:
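# Replace False with NA; this relies on stack() dropping the NA entries,
# so only the (Name, column) pairs that are True remain
s = (df.set_index('Name').replace({False: pd.NA}).stack()
       .reset_index(level=0)['Name']
    )
# Group the remaining column labels by name
out = s.index.groupby(s)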
Output:
{'A': ['Y'], 'B': ['X', 'Y'], 'C': ['X']}
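If the `replace`/`stack` chain feels opaque, the same mapping can be sketched with a plain dictionary comprehension. This is an alternative of my own, not the method above, and it assumes every column other than `Name` is boolean.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "X": [False, True, True],
    "Y": [True, True, False],
})

flags = df.set_index("Name")

# For each name, list the columns whose value is True
out = {
    name: [col for col, flag in row.items() if flag]
    for name, row in flags.iterrows()
}

print(out)  # {'A': ['Y'], 'B': ['X', 'Y'], 'C': ['X']}
```

Once built, a lookup such as `out["B"]` is a plain dictionary access, which may be convenient if the function is called many times on data that does not change.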
I think you can keep using a DataFrame; a function like this gives much the same output:
def func(name, df):
    # some checks to verify that the name is actually in the df
    matches = df['Name'] == name
    occurrences_name = matches.sum()
    if occurrences_name == 0:
        raise ValueError('Name not found')
    elif occurrences_name > 1:
        raise ValueError('More than one name found')
    # get the index label corresponding to the name you're looking for
    # and select the corresponding row by label (.loc, not .iloc)
    index = df[matches].index[0]
    row = df.drop(['Name'], axis=1).loc[index]
    # collect the columns that are True, then join them; this avoids a
    # stray leading ', ' when the first column is False
    true_columns = [column for column, value in row.items() if value]
    return '{} is in {}'.format(name, ', '.join(true_columns))
Of course, you can adapt this to the specific shape of your df; I assumed that the column containing the names is actually called 'Name'.
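As one way to adapt it, the name column and the separator could be passed in as parameters. This is only a sketch: `describe_membership`, `name_col`, `sep`, and the `Label` column in the sample data are hypothetical names chosen for illustration.

```python
import pandas as pd

def describe_membership(name, df, name_col="Name", sep=", "):
    # Rows whose name column equals the requested name
    matches = df[df[name_col] == name]
    if len(matches) == 0:
        raise ValueError("Name not found")
    if len(matches) > 1:
        raise ValueError("More than one name found")
    # Single matching row, without the name column (assumed all boolean)
    row = matches.drop(columns=[name_col]).iloc[0]
    true_columns = [col for col, flag in row.items() if flag]
    return f"{name} is in {sep.join(true_columns)}"

# Same data as the question, but with the names stored under 'Label'
df = pd.DataFrame({
    "Label": ["A", "B", "C"],
    "X": [False, True, True],
    "Y": [True, True, False],
})

print(describe_membership("B", df, name_col="Label", sep=" and "))
# B is in X and Y
```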