Python pandas/numpy: encode unique values as integers

Suppose I have x=["apple","orange","orange","apple","pear"] and I want a categorical representation with integers, for example y=[1,2,2,1,3]. What is the best way to do this?

With pandas (assuming x is a pd.Series rather than a plain list): x.astype('category').cat.codes

You can use pd.Categorical:

import pandas as pd

x=["apple","orange","orange","apple","pear"]
s = pd.Series(x)

print(s)

0     apple
1    orange
2    orange
3     apple
4      pear

print(pd.Categorical(s).codes)

[0 1 1 0 2]

Or, without building a Series first:

import pandas as pd

x=["apple","orange","orange","apple","pear"]

print(pd.Categorical(x).codes)

#[0 1 1 0 2]
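
The codes from pd.Categorical are 0-based, while the question asks for 1-based values. A minimal sketch of shifting them and keeping a lookup back to the original labels (the names `cat`, `codes` and `mapping` are only illustrative):

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

cat = pd.Categorical(x)

# Codes are 0-based positions into cat.categories; add 1 for 1-based labels.
codes = cat.codes + 1

# Lookup from the 1-based code back to the original label.
mapping = dict(enumerate(cat.categories, start=1))

print(codes.tolist())   # [1, 2, 2, 1, 3]
print(mapping)          # {1: 'apple', 2: 'orange', 3: 'pear'}
```

Note that pd.Categorical sorts the categories by default, so the codes follow the sorted order of the labels rather than the order in which they first appear.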

You can use pd.factorize and take element 0 of the returned tuple (the codes are 0-based, so add 1 to get the output asked for):

In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))

In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
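
One thing to watch for: pd.factorize numbers the labels in order of first appearance, while pd.Categorical and np.unique number them in sorted order. For the list in the question the two orders happen to coincide. A small sketch with a different, made-up list (`x2` is just an illustrative name) shows the difference, and also covers the numpy-only route with np.unique:

```python
import numpy as np
import pandas as pd

# Hypothetical input where first-appearance order differs from sorted order.
x2 = ["pear", "apple", "pear", "orange"]

# pandas: codes follow the order of first appearance.
codes_seen, uniques_seen = pd.factorize(x2)

# pandas with sort=True: codes follow the sorted order of the labels.
codes_sorted, uniques_sorted = pd.factorize(x2, sort=True)

# numpy only: return_inverse gives each element's index into the sorted uniques.
np_uniques, np_codes = np.unique(x2, return_inverse=True)

print(codes_seen.tolist())    # [0, 1, 0, 2]
print(codes_sorted.tolist())  # [2, 0, 2, 1]
print(np_codes.tolist())      # [2, 0, 2, 1]
```

If the integers need to be stable regardless of input order, the sorted variants are probably the safer choice. If they should simply follow the data, plain pd.factorize does that.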