How to convert numerical categorical data into sparse tensors in TensorFlow?
My dataset has the following format:
8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,4,0,1,3,2,1,2,2,2,1
8,2,1,2,0,0,3,2,1,2,2,2,1
8,2,1,3,0,0,2,2,6,2,2,2,2
8,2,1,12,0,0,5,2,2,2,2,2,1
3,1,1,2,0,0,3,2,1,2,2,2,1
It consists entirely of categorical data, where each feature is encoded as a number. I tried the following code:
monthly_income = tf.contrib.layers.sparse_column_with_keys("monthly_income", keys=['1','2','3','4','5','6'])
# Other columns are declared in the same way
m = tf.contrib.learn.LinearClassifier(feature_columns=[
    caste, religion, differently_abled, nature_of_activity, school, dropout, qualification,
    computer_literate, monthly_income, smoke, drink, tobacco, sex],
    model_dir=model_dir)
But I get the following error:
TypeError: Signature mismatch. Keys must be dtype <dtype: 'string'>, got <dtype: 'int64'>.
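I think the problem lies outside the code you've shown. My guess is that the features in the CSV file are being read as integers, whereas passing keys=['1', '2', ...] means you expect them to be strings.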
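To make the mismatch concrete, here is a minimal sketch in plain Python (no TensorFlow involved) that parses a few of the rows from the question twice: once keeping the values as strings and once converting them to integers. The column order in COLUMN_NAMES is an assumption taken from the order of the feature_columns list; the question does not say which CSV position holds which feature, and read_columns is a hypothetical helper.

```python
import csv
import io

# First three rows from the question.
SAMPLE_ROWS = """8,2,1,1,1,0,3,2,6,2,2,2,2
8,2,1,2,0,0,15,2,1,2,2,2,1
5,5,4,4,0,0,6,1,6,2,2,1,2
"""

# ASSUMPTION: CSV columns follow the order of the feature_columns list.
COLUMN_NAMES = [
    "caste", "religion", "differently_abled", "nature_of_activity", "school",
    "dropout", "qualification", "computer_literate", "monthly_income",
    "smoke", "drink", "tobacco", "sex",
]


def read_columns(text, as_type):
    """Parse CSV text into {column name: list of values cast with as_type}."""
    columns = {name: [] for name in COLUMN_NAMES}
    for row in csv.reader(io.StringIO(text)):
        for name, value in zip(COLUMN_NAMES, row):
            columns[name].append(as_type(value))
    return columns


as_str = read_columns(SAMPLE_ROWS, str)
as_int = read_columns(SAMPLE_ROWS, int)

keys = ['1', '2', '3', '4', '5', '6']
# String values can be looked up in a string vocabulary...
print(all(v in keys for v in as_str["monthly_income"]))
# ...but integer values never equal any of the string keys.
print(any(v in keys for v in as_int["monthly_income"]))
```

So either the input pipeline has to hand the column over as strings, or the column definition has to accept integers, which is what the suggestion below does.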
Nevertheless, in this case I would suggest using sparse_column_with_integerized_feature:
monthly_income = tf.contrib.layers.sparse_column_with_integerized_feature("monthly_income", bucket_size=7)
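As a rough illustration of what an integerized column with a given bucket_size represents, the sketch below builds a one-hot sparse encoding in coordinate form (indices, values, dense shape) using plain Python. This is only a conceptual picture, not how TensorFlow implements it; one_hot_coo and smallest_bucket_size are hypothetical helpers, and the sample values assume that monthly_income is the ninth CSV column and qualification the seventh.

```python
def smallest_bucket_size(ids):
    """Smallest bucket_size such that every id lies in [0, bucket_size)."""
    return max(ids) + 1


def one_hot_coo(ids, bucket_size):
    """Return (indices, values, dense_shape) for a one-hot encoding of ids."""
    for i in ids:
        if not 0 <= i < bucket_size:
            raise ValueError("id %d is outside [0, %d)" % (i, bucket_size))
    # One non-zero entry per row, located at column == id.
    indices = [[row, i] for row, i in enumerate(ids)]
    values = [1.0] * len(ids)
    dense_shape = [len(ids), bucket_size]
    return indices, values, dense_shape


# ASSUMPTION: ninth CSV column of the first five rows in the question.
monthly_income = [6, 1, 6, 6, 1]
indices, values, dense_shape = one_hot_coo(monthly_income, bucket_size=7)
print(indices)
print(dense_shape)

# ASSUMPTION: seventh CSV column; it contains 15, so it needs more buckets.
qualification = [3, 15, 6, 2, 3]
print(smallest_bucket_size(qualification))
```

Note that bucket_size has to be larger than the biggest id in the column, so a column that contains the value 15 needs a bucket_size of at least 16, while 7 is enough for values up to 6.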