Unable to insert clean unicode text back into a DataFrame in pandas
I am doing two things:
1) Filtering a DataFrame in pandas
2) Cleaning the unicode text in a specific column of the filtered DataFrame.
import pandas as pd
import probablepeople
from unidecode import unidecode
import re
# read the data (raw string so the backslash is not treated as an escape)
df1 = pd.read_csv(r"H:\data.csv")
# filter
df1 = df1[df1.gender == "female"]
# reset the index; otherwise the original DataFrame's index labels are kept
df1 = df1.reset_index()
Now I am trying to clean the unicode text in the address column:
# clean unicode text
for i in range(10):
    df1.loc[i][16] = re.sub(r"[^a-zA-Z.,' ]", r' ', df1.address[i])
However, I am unable to do so. Below is the message I get:
c:\python27\lib\site-packages\ipykernel\__main__.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
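I think you can use str.replace, which works on the whole column at once and avoids the chained assignment df1.loc[i][16] that triggers the warning: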
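df1 = df1[df1.gender == "female"]
# reset index with drop=True if you need a new monotonic index (0, 1, 2, ...)
df1 = df1.reset_index(drop=True)
# regex=True is needed on pandas 2.0+, where str.replace defaults to a literal match
df1.address = df1.address.str.replace(r"[^a-zA-Z.,' ]", " ", regex=True)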
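To try this without the original CSV, here is a minimal sketch on made-up data. The sample rows and the name `females` are assumptions for illustration only; the `gender` and `address` column names and the regex come from the question. It also shows that, if a single cell really has to be updated, doing it in one `.loc[row, column]` call avoids the chained indexing that raised the warning.

```python
import pandas as pd

# Hypothetical sample data standing in for H:\data.csv
df = pd.DataFrame({
    "gender": ["female", "male", "female"],
    "address": ["12 Main St. #4", "9 Elm Rd", "Caf\u00e9 Ln, Apt 7"],
})

# Filter, take an explicit copy, and build a fresh 0..n-1 index
females = df[df["gender"] == "female"].copy().reset_index(drop=True)

# Vectorized cleanup: every character outside the allowed set becomes a space
females["address"] = females["address"].str.replace(
    r"[^a-zA-Z.,' ]", " ", regex=True
)

# Single-cell update in one .loc call (row label, column label),
# instead of the chained df.loc[i][16] form
females.loc[0, "address"] = females.loc[0, "address"].strip()

print(females)
```

Note that the replacement keeps one space per removed character, so runs of spaces can remain in the cleaned text.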