Replacing strings with numbers in Python for Data Analysis

Question

How can I dynamically assign a number to each unique value? I have searched, but I could only find one kind of answer:

# creating a mapping dict
gender = {'male': 1,'female': 2}
  
# traversing through dataframe
# Gender column and writing
# values where key matches
data.Gender = [gender[item] for item in data.Gender]
print(data)

But these answers use fixed numbers. What if I can't manually assign a number to every value in gender? How should I do it?

Answer 1

Well, I'll answer my own question!

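# Map each column name to a {value: integer code} dict
dictChangeValues = {}

for d in data.columns:
    # unique() returns the column's values in order of first appearance
    aux = data[d].unique()
    # Number the unique values 0, 1, 2, ...
    dictChangeValues[d] = {n: i for i, n in enumerate(aux)}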
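
The loop above only builds the lookup dicts; the strings in the DataFrame still have to be swapped for their codes. A minimal sketch of that last step, assuming pandas and a small made-up DataFrame standing in for the question's data (the column names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `data`
data = pd.DataFrame({
    "Gender": ["male", "female", "female", "male"],
    "City": ["Lima", "Quito", "Lima", "Bogota"],
})

dictChangeValues = {}
for d in data.columns:
    # Codes follow the order of first appearance in the column
    dictChangeValues[d] = {n: i for i, n in enumerate(data[d].unique())}
    # Replace each value in the column with its integer code
    data[d] = data[d].map(dictChangeValues[d])

print(data)
```

Keeping dictChangeValues around means the codes can be translated back to the original strings later.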

Answer 2

Since you are doing data analysis, I assume pulling in scikit-learn is not too much overhead for you.

>>> from sklearn.preprocessing import LabelEncoder
>>> LabelEncoder().fit_transform(["a", "b", "c", "a", "b"])
array([0, 1, 2, 0, 1])
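
If you also need to go back from numbers to the original strings, keep the fitted encoder instead of discarding it. A small sketch, assuming a made-up list of gender values (the data is hypothetical); note that LabelEncoder numbers the classes in sorted order, not in order of first appearance:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values
genders = ["male", "female", "female", "other"]

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

# Integer code assigned to each input value
print(codes.tolist())
# Sorted unique labels; the position in this array is the code
print(list(encoder.classes_))
# Map the codes back to the original strings
print(list(encoder.inverse_transform(codes)))
```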
