Python pandas/numpy: encode unique values as integers
Suppose I have x=["apple","orange","orange","apple","pear"]
and I want a categorical representation with integers, for example y=[1,2,2,1,3].
What is the best way to do this?
With pandas (x must be a Series, not a plain list; the codes are 0-based): pd.Series(x).astype('category').cat.codes
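To make the one-liner above concrete, here is a minimal sketch. It assumes the list from the question; the variable names codes and y are only illustrative.

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

# A plain list has no .astype, so wrap it in a Series first.
codes = pd.Series(x).astype("category").cat.codes

# Codes are 0-based; add 1 to get the 1-based labels from the question.
y = (codes + 1).tolist()
print(y)  # [1, 2, 2, 1, 3]
```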
You can use:
import pandas as pd
x=["apple","orange","orange","apple","pear"]
s = pd.Series(x)
print(s)
0 apple
1 orange
2 orange
3 apple
4 pear
print(pd.Categorical(s).codes)
[0 1 1 0 2]
Or:
import pandas as pd
x=["apple","orange","orange","apple","pear"]
print(pd.Categorical(x).codes)
#[0 1 1 0 2]
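If you also need to know which label each code stands for, the Categorical keeps that mapping in its categories attribute. A short sketch, with cat and decoded as illustrative names:

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]
cat = pd.Categorical(x)

# Categories are the sorted unique values; each code indexes into them.
print(list(cat.categories))  # ['apple', 'orange', 'pear']

# Look up each code in the categories to recover the original labels.
decoded = cat.categories.take(cat.codes).tolist()
print(decoded == x)  # True
```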
You can use pd.factorize
and take element 0 of the tuple it returns:
In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))
In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
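Since the title also mentions numpy, here is a sketch of a numpy-only route using np.unique with return_inverse=True, next to pd.factorize for comparison. The input list is deliberately unsorted (it is not the one from the question) to show one difference: np.unique numbers the values in sorted order, while pd.factorize numbers them by first appearance unless sort=True is passed. The list is converted to an array before calling pd.factorize because, as far as I know, recent pandas releases warn about plain-list input.

```python
import numpy as np
import pandas as pd

# Illustrative input whose first-appearance order differs from sorted order.
x = ["pear", "apple", "pear", "orange"]

# np.unique sorts the unique values; return_inverse gives, for each
# element of x, its index into that sorted array.
uniques, inverse = np.unique(x, return_inverse=True)
print(uniques.tolist())  # ['apple', 'orange', 'pear']
print(inverse.tolist())  # [2, 0, 2, 1]

# pd.factorize numbers values in order of first appearance by default.
arr = np.asarray(x, dtype=object)
codes, labels = pd.factorize(arr)
print(codes.tolist())  # [0, 1, 0, 2]

# With sort=True the codes follow the sorted labels, matching np.unique.
codes_sorted, labels_sorted = pd.factorize(arr, sort=True)
print(codes_sorted.tolist())  # [2, 0, 2, 1]
```

For the list in the question the two orders happen to coincide, which is why every approach in this thread prints the same codes.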