Filtering a dataframe if the length of the word inside the series > 3

Hi community! Thank you so much for all the support I've received so far while learning Python!

I have the following dataframe:

import pandas as pd

d = {'name': ['john', 'mary', 'james'], 'area':[['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
df = pd.DataFrame(data=d)

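My goal is:

In other words, within each list in the 'area' column, keep only the words whose length is > 3 and remove the rest.

I'm trying something like this, but I'm really stuck

What is the best way to handle this?

Thanks again!!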

Use .map combined with a list comprehension:

df['area'] = df['area'].map(lambda x: [e for e in x if len(e)>3])

0         [Resources, Admin]
1    [Software, Programming]
2       [Teaching, Research]

Explanation:

x = ["Software", "ABC", "Programming"]
# return e for every element in x but only if length of element is larger than 3
[e for e in x if len(e)>3]
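To make the explanation concrete, here is a minimal self-contained sketch of the same idea applied to the dataframe from the question. The helper name `keep_long_words` and its `min_len` parameter are illustrative names of my own, not part of pandas; only `Series.map` is the real API here.

```python
import pandas as pd


def keep_long_words(words, min_len=3):
    """Hypothetical helper: keep the words strictly longer than min_len characters."""
    return [w for w in words if len(w) > min_len]


d = {
    "name": ["john", "mary", "james"],
    "area": [
        ["IT", "Resources", "Admin"],
        ["Software", "ITS", "Programming"],
        ["Teaching", "Research", "KS"],
    ],
}
df = pd.DataFrame(data=d)

# Apply the helper to every list in the 'area' column.
df["area"] = df["area"].map(keep_long_words)
print(df)
```

Pulling the comprehension into a named function is only a readability choice; the lambda version does the same thing.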

You can expand all your lists with explode, filter on string length, and then put the values back into lists by aggregating with list:

df = df.explode("area")
df = df[df["area"].str.len() > 3].groupby("name", as_index=False).agg(list)
#     name                     area
# 0  james     [Teaching, Research]
# 1   john       [Resources, Admin]
# 2   mary  [Software, Programming]

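Before building the dataframe.

A simple and efficient approach is to build a new list for the 'area' key that contains only the strings longer than 3 characters. For example: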

import pandas as pd

d = {'name': ['john', 'mary', 'james'], 'area':[['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}

# Retrieving the areas from d.
area_list = d['area']

# Building a new list of lists that keeps, within each inner list,
# only the strings whose length is larger than 3.
filtered_area_list = [[a for a in areas if len(a) > 3] for areas in area_list]

# Replacing the old list in the dictionary with the new one.
d['area'] = filtered_area_list

# Creating the dataframe.
df = pd.DataFrame(data=d)

After building the dataframe.

If your data is already in a dataframe, you can use the "map" function:

df['area'] = df['area'].map(lambda a: [e for e in a if len(e) > 3])