How to delete rows that contain only certain values?
I have a dataframe like this:
column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 \n
3 good
4 must buy!
5 \t
6
7 awesome product!
8 \n
I want to delete all the rows that contain only \n, \t, spaces, or combinations of these.
The output should look like this:
column_name
0 OnePlus phones never fail to meet my expectatiion.
1 received earlier than expected for local set.
2 good
3 must buy!
4 awesome product!
I tried the following:
df = df[df.column_name != '\n'].reset_index(drop=True)
df = df[df.column_name != ''].reset_index(drop=True)
df = df[df.column_name != ' '].reset_index(drop=True)
df = df[df.column_name != ' '].reset_index(drop=True)
df = df[df.column_name != ' \n '].reset_index(drop=True)
But is there a more elegant or Pythonic way to do this without repeating code for every value?
You can use Series.str.strip and compare against the empty string:
df1 = df[df.column_name.str.strip() != ''].reset_index(drop=True)
Or cast the stripped values to boolean (empty strings are falsy):
df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True)
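As a quick sanity check, here is a minimal sketch of the boolean-mask variant on made-up sample data (the values mirror the question, the frame itself is hypothetical):

```python
import pandas as pd

# Hypothetical sample mirroring the question's data.
df = pd.DataFrame({'column_name': ['good', '\n', 'must buy!', '\t', '', ' ']})

# str.strip() reduces whitespace-only entries to '', which is falsy,
# so astype(bool) keeps only rows with real content.
df1 = df[df.column_name.str.strip().astype(bool)].reset_index(drop=True)
print(df1.column_name.tolist())  # → ['good', 'must buy!']
```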
Or filter for rows that contain at least one word character; for me the strip is necessary here (with real data it could perhaps be removed):

df1 = df[df.column_name.str.strip().str.contains(r'\w', na=False)].reset_index(drop=True)
If you also need to drop missing values together with the whitespace-only strings, replace those values with NaNs and then use DataFrame.dropna:
df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
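End to end, the replace-then-dropna idea looks like this on a small hypothetical sample that mixes whitespace-only strings with a genuine NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: whitespace-only strings plus a real missing value.
df = pd.DataFrame({'column_name': ['good', '\n', np.nan, 'must buy!', '  ']})

# Whitespace-only strings become NaN, then dropna removes them all at once.
df.column_name = df.column_name.replace(r'^\s*$', np.nan, regex=True)
df1 = df.dropna(subset=['column_name']).reset_index(drop=True)
print(df1.column_name.tolist())  # → ['good', 'must buy!']
```

The advantage of this route is that one pass handles both the junk strings and any pre-existing NaNs.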
Another approach: delete the rows whose entries match one of the flagged elements:
df = df[~df['column_name'].isin(['\n','\t'])].dropna()
If the last row (or any other row) has extra whitespace around the value, you can strip first:
df['column_name'] = df['column_name'].str.strip()
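One caveat worth testing: isin matches exact values only, and stripping first turns every whitespace-only entry into '', so after a strip the empty string is what you need to exclude. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample; one entry has surrounding spaces.
df = pd.DataFrame({'column_name': ['good', '\n', 'must buy!', ' \t ']})

# Strip first, then exclude the exact values left over ('' covers
# everything that was whitespace-only before the strip).
df['column_name'] = df['column_name'].str.strip()
df1 = df[~df['column_name'].isin([''])].reset_index(drop=True)
print(df1.column_name.tolist())  # → ['good', 'must buy!']
```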
Use df.str.contains() to check whether there are lowercase letters after the backslash:

df[df['column_name'].str.contains(r'[\][a-z]+', case=True, na=False, regex=True)]
With data like yours:
df = pd.DataFrame({'A': ['OnePlus phones never fail to meet my expectatiion', 'received earlier than expected for local set.', '\n', 'good', '\t', np.nan, 'must buy!', '', 'awesome product!', '\n']})
print(df)
A
0 OnePlus phones never fail to meet my expectatiion
1 received earlier than expected for local set.
2 \n
3 good
4 \t
5 NaN
6 must buy!
7
8 awesome product!
9 \n
Solution:
print(df[df.A.str.contains(r'[\][a-z]+', case=True, na=False, regex=True)])
A
0 OnePlus phones never fail to meet my expectatiion
1 received earlier than expected for local set.
3 good
6 must buy!
8 awesome product!