Filtering a dataframe if the length of the word inside the series > 3
Community! Thank you very much for all the support I have received so far while learning Python!
I have the following dataframe:
import pandas as pd
d = {'name': ['john', 'mary', 'james'], 'area':[['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
df = pd.DataFrame(data=d)
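My goal is:
In other words, each list in the 'area' column should keep only the words whose length is > 3; shorter words are removed.
I am trying something like this, but I am really stuck.
What is the best way to handle this situation?
Thanks again!!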
Combine .map with a list comprehension:
df['area'] = df['area'].map(lambda x: [e for e in x if len(e)>3])
0 [Resources, Admin]
1 [Software, Programming]
2 [Teaching, Research]
Explanation:
x = ["Software", "ABC", "Programming"]
# return e for every element in x but only if length of element is larger than 3
[e for e in x if len(e)>3]
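If the comprehension syntax is unfamiliar, it may help to see it written as an explicit loop. The sketch below is only an illustration of the same logic; the name `result` is introduced here and is not part of the original answer.

```python
x = ["Software", "ABC", "Programming"]

# Explicit loop equivalent of: [e for e in x if len(e) > 3]
result = []
for e in x:
    if len(e) > 3:  # "ABC" has length 3, so it is skipped
        result.append(e)

print(result)  # ['Software', 'Programming']
```

.map simply applies this filtering to the list stored in each row of the column.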
You can explode all your lists, filter on str length, and then put them back into lists by aggregating with list:
df = df.explode("area")
df = df[df["area"].str.len() > 3].groupby("name", as_index=False).agg(list)
# name area
# 0 james [Teaching, Research]
# 1 john [Resources, Admin]
# 2 mary [Software, Programming]
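One thing to keep in mind with this approach: besides reordering the rows by name (as the output above shows), a person whose words are all 3 characters or shorter is dropped entirely, because no rows are left for that name after filtering. The sketch below illustrates this with hypothetical data ('ann' is not in the original question) and compares it with the .map approach, which keeps the row with an empty list.

```python
import pandas as pd

# Hypothetical data: every word in ann's list is 3 characters or shorter.
df = pd.DataFrame({
    'name': ['john', 'ann'],
    'area': [['IT', 'Resources', 'Admin'], ['IT', 'KS']],
})

# explode / filter / regroup: ann has no surviving rows, so she disappears.
exploded = df.explode('area')
regrouped = (
    exploded[exploded['area'].str.len() > 3]
    .groupby('name', as_index=False)
    .agg(list)
)
print(regrouped['name'].tolist())  # ['john']

# map + list comprehension: ann is kept, with an empty list.
mapped = df['area'].map(lambda words: [w for w in words if len(w) > 3])
print(mapped.tolist())  # [['Resources', 'Admin'], []]
```

Which behaviour is preferable depends on whether rows with no remaining words should stay in the result.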
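Before building the dataframe.
A simple and efficient way is to create a new list for the key 'area' that contains only the strings whose length is greater than 3. For example: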
import pandas as pd
d = {'name': ['john', 'mary', 'james'], 'area': [['IT', 'Resources', 'Admin'], ['Software', 'ITS', 'Programming'], ['Teaching', 'Research', 'KS']]}
# Retrieving the lists of areas from d.
area_list = d['area']
# Copying all words whose length is larger than 3 into new lists, one per person.
filtered_area_list = [[a for a in areas if len(a) > 3] for areas in area_list]
# Replacing the old list in the dictionary with the new one.
d['area'] = filtered_area_list
# Creating the dataframe.
df = pd.DataFrame(data=d)
After building the dataframe.
If your data is already in a dataframe, you can use the "map" function:
df['area'] = df['area'].map(lambda a: [e for e in a if len(e) > 3])
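If the threshold may change, or some cells might not contain a list at all, the lambda can be replaced by a small named function. This is only a sketch: `keep_long_words` and its `min_len` parameter are hypothetical names introduced here, and turning non-list cells into an empty list is an assumption about the desired behaviour, not something stated in the question.

```python
import pandas as pd

def keep_long_words(words, min_len=3):
    """Hypothetical helper: keep strings longer than min_len.

    Cells that are not lists (for example None) become an empty list.
    """
    if not isinstance(words, list):
        return []
    return [w for w in words if len(w) > min_len]

# Hypothetical data: mary has no list of areas.
df = pd.DataFrame({
    'name': ['john', 'mary'],
    'area': [['IT', 'Resources', 'Admin'], None],
})

df['area'] = df['area'].map(keep_long_words)
print(df['area'].tolist())  # [['Resources', 'Admin'], []]
```

A different threshold can be passed with, for example, `df['area'].map(lambda a: keep_long_words(a, min_len=5))`.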