How to convert strings in a Pandas DataFrame to a list or an array of characters?
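I have a DataFrame called data in which one column contains strings. I want to extract the characters from the strings, because my goal is to one-hot encode them so they can be used for classification. The column containing the strings is stored in predictors as follows: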
predictors = pd.DataFrame(data, columns = ['Sequence']).to_numpy()
Printing it gives:
[['DKWL']
['FCHN']
['KDQP']
...
['SGHC']
['KIGT']
['PGPT']]
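whereas my goal is to get something like:
[['D', 'K', 'W', 'L']
...
['P', 'G', 'P', 'T']]
As I understand it, this form is better suited to one-hot encoding.
I have already tried the answers provided here, How do I convert string characters into a list?, and here, How to create a list with the characters of a string?, but without success.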
Specifically, I also tried this:
for row in predictors:
    row = list(row)
But the result has the same form as predictors, i.e.
[['DKWL']
['FCHN']
['KDQP']
...
['SGHC']
['KIGT']
['PGPT']]
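You can use list in a list comprehension to split each value into letters, and then convert the result to an array if necessary:
predictors = np.array([list(x) for x in data['Sequence']])
Or convert the column predictors['Sequence'] (this assumes predictors is still a DataFrame, i.e. before .to_numpy() is called):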
a = np.array([list(x) for x in predictors['Sequence']])
print(a)
[['D' 'K' 'W' 'L']
['F' 'C' 'H' 'N']
['K' 'D' 'Q' 'P']
['S' 'G' 'H' 'C']
['K' 'I' 'G' 'T']
['P' 'G' 'P' 'T']]
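If predictors is already the (n, 1) NumPy array from the question, each row is itself a one-element array, so list(row) only wraps the whole string again, and assigning to the loop variable does not change predictors anyway. A minimal sketch of one way around that, assuming the sample values from the question and equal-length strings (the names first_attempt and chars are made up for this illustration):

```python
import numpy as np

# Assumed sample data: the (n, 1) array of strings shown in the question.
predictors = np.array([["DKWL"], ["FCHN"], ["KDQP"], ["SGHC"], ["KIGT"], ["PGPT"]])

# list(row) on a one-element row yields a list holding the whole string,
# which is why the loop in the question appears to do nothing.
first_attempt = list(predictors[0])

# Take the single string out of each row first, then split it into characters.
chars = np.array([list(row[0]) for row in predictors])

print(len(first_attempt))
print(chars.shape)
print(chars[0])
```

The comprehension builds a new array instead of trying to modify predictors in place, so the result has to be assigned to a name.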
To get a Series of lists instead, use:
s = predictors['Sequence'].apply(list)
print(s)
0 [D, K, W, L]
1 [F, C, H, N]
2 [K, D, Q, P]
3 [S, G, H, C]
4 [K, I, G, T]
5 [P, G, P, T]
Name: Sequence, dtype: object
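Since the stated goal is one-hot encoding, here is a rough sketch of how the character array could be fed into pd.get_dummies. This is only one possible approach, not something taken from the answer above. It assumes the six sample sequences from the question, that all strings have the same length, and it uses made-up column names pos0, pos1, ... for the character positions:

```python
import numpy as np
import pandas as pd

# Assumed sample data: the six sequences from the question, all the same length.
data = pd.DataFrame({"Sequence": ["DKWL", "FCHN", "KDQP", "SGHC", "KIGT", "PGPT"]})

# Split every string into characters: one row per sequence, one column per position.
a = np.array([list(x) for x in data["Sequence"]])

# Hypothetical column names pos0..pos3, chosen here only for illustration.
positions = pd.DataFrame(a, columns=[f"pos{i}" for i in range(a.shape[1])])

# One indicator column per (position, letter) pair that occurs in the data.
one_hot = pd.get_dummies(positions)

print(one_hot.shape)
print(list(one_hot.columns[:5]))
```

Each row of one_hot has exactly one active indicator per position. Letters that never occur at a given position get no column, so new data would need to be encoded against the same set of columns.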