TypeError: unhashable type: 'set' when extracting and selecting NaN values in a dataset
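I am trying to select the NaN values in a pandas DataFrame, by column or by row, then extract them and save them to a CSV file, but I get TypeError: unhashable type: 'set'. I would like to know how to fix it to get the result.
As the script below shows, after converting the inf values to NaN for counting, I selected them with isnull(), but in the end I cannot store the NaN values of my target column, 'C', in the CSV file because of TypeError: unhashable type: 'set'. Here is my script: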
import numpy as np
import pandas as pd
#extract the parameters and put them in lists based on id_set
df = pd.read_csv('D:\m22.TXT', header=None)
id_set = df[df.index % 4 == 0].astype('int').values
a = df[df.index % 4 == 1].values
b = df[df.index % 4 == 2].values
c = df[df.index % 4 == 3].values
data = {'A': a[:,0], 'B': b[:,0], 'C': c[:,0] }
main_data = pd.DataFrame(data, columns=['A','B','C'], index = id_set[:,0])
#Mark nan and inf by isnull() function
nan = np.array(main_data.isnull())
inf = np.array(main_data.isnull())
#Make sure to change inf values into nan
main_data = main_data.replace([np.inf, -np.inf], np.nan)
c = main_data.isnull().sum()
print(c)
percent_missing = main_data.isnull().sum() * 100 / len(main_data)
print(percent_missing)
#calculate nan values in percentage in desired column
m = len(main_data) - main_data['A'].count()
print(m)
#Monitor the data
print(main_data)
print (main_data.isnull())
print (main_data.isnull().any(axis=1))
#Select columns has nan(s)
print(main_data[main_data['C'].isnull()])
#Select rows has nan(s) based on id_set
nan_data = main_data[main_data.isnull().any(axis = {'C'})]
print (nan_data)
#write selected part in csv file by id_set
nan_data.to_csv('nan_data.csv', header=None, index=None)
My DataFrame looks like this:
A B C
0 -56.343656 nan -418.540483
10 -87.577880 -16.061497 inf
20 nan -15.337254 inf
30 -83.724143 -18.061570 -531.053979
40 -67.462841 nan -431.924830
50 -63.377158 -28.260790 inf
60 nan -22.996095 nan
70 -38.386860 -35.921773 -534.576631
The desired output for 'C' is:
'C'
10 inf/nan
20 inf/nan
50 inf/nan
60 nan
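Here is a sample of my dataset: dataset sample DL link
Note: the id_set values are not written out in full, e.g. 000 is shown as 0.
I hope someone has a good tip for fixing this.
I'm not sure this is exactly what you are after, but if you want to output all rows in which at least one entry is NaN or inf, you can try this: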
import pandas as pd
import numpy as np
df = pd.DataFrame(50*np.random.randn(8, 3), columns=['A', 'B', 'C'], index=np.arange(0, 80, 10).astype(int))
df.loc[0, 'A'] = np.nan
df.loc[10, 'C'] = np.inf
df.loc[20, 'B'] = np.nan
df.loc[20, 'C'] = np.inf
df.loc[50, 'C'] = np.inf
df.loc[60, 'C'] = np.nan
df[np.isinf(df)] = np.nan # convert inf to nan
df_nan = df[df.isnull().any(axis=1)] # extract sub data frame
df_nan.to_csv('nan_data.csv', header=None, index=None) # export
The input DataFrame (after converting inf to NaN) looks like this:
The output looks like this:
To write the index labels and 'NaN' to the CSV file, you can use:
df_nan.to_csv('nan_data.csv', na_rep='NaN')
This will output:
If you only want column 'C', you can use:
df_nan['C'].to_csv('nan_dataC.csv', na_rep='NaN')
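Note that df_nan keeps every row with a missing value in any column, so df_nan['C'] can still contain rows whose 'C' value is finite. To match the desired output in the question (ids 10, 20, 50, 60), it may be closer to filter on 'C' alone. A minimal sketch, assuming the sample values from the question (rounded here for readability) and the hypothetical name c_missing:

```python
import numpy as np
import pandas as pd

# Small frame mirroring the sample in the question (values rounded; an assumption)
main_data = pd.DataFrame(
    {
        "A": [-56.3, -87.6, np.nan, -83.7, -67.5, -63.4, np.nan, -38.4],
        "B": [np.nan, -16.1, -15.3, -18.1, np.nan, -28.3, -23.0, -35.9],
        "C": [-418.5, np.inf, np.inf, -531.1, -431.9, np.inf, np.nan, -534.6],
    },
    index=range(0, 80, 10),
)

# Treat +inf and -inf as missing, as in the question's script
main_data = main_data.replace([np.inf, -np.inf], np.nan)

# Keep only column 'C', and only the rows where 'C' itself is missing
c_missing = main_data.loc[main_data["C"].isnull(), "C"]
print(c_missing.index.tolist())  # [10, 20, 50, 60]

# Write the index labels and an explicit 'NaN' marker
c_missing.to_csv("nan_dataC.csv", na_rep="NaN")
```

Rows 0 and 40 are missing only in 'B', so they are dropped by this filter even though any(axis=1) would keep them.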
If you want leading zeros, you can do the following:
new_index = [str(x).zfill(3) for x in df_nan.index]
df_nan.index = new_index
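As for the error itself: in the question's script, any(axis = {'C'}) passes a Python set as the axis. The axis argument expects 0 / 'index' or 1 / 'columns', and a set is not hashable, which is most likely why pandas fails with TypeError: unhashable type: 'set' when it tries to resolve the axis (the exact code path may differ between pandas versions). A small sketch with a made-up two-row frame, showing the hashability problem and two valid ways to select rows:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical values, not the asker's data)
df = pd.DataFrame({"A": [1.0, np.nan], "C": [np.nan, 2.0]}, index=[0, 10])

# A set cannot be hashed, so it cannot serve as a lookup key such as an axis name
try:
    hash({"C"})
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# Valid: axis=1 checks across the columns of each row
rows_any = df[df.isnull().any(axis=1)]

# Valid: restrict the check to column 'C' by selecting the column first
rows_c = df[df["C"].isnull()]

print(set_is_hashable)
print(rows_any.index.tolist())
print(rows_c.index.tolist())
```

In other words, the column to test is chosen by indexing (df['C']), not through the axis parameter.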