Pandas: Convert array column to numpy Matrix
I have data in the following format:
Col1 Col2 Col3
1, 1424549456, "3 4"
2, 1424549457, "2 3 4 5"
and have successfully read it into pandas.
How can I convert Col3 to a numpy matrix of the following form:
# each value needs to become a 1 in the column with that index
# i.e. in the above example, 3 is the 4th position, thus
# it is [0 0 0 1] (0-based indexing)
mtx = [0 0 0 1 1 0 # corresponds to first row
0 0 1 1 1 1]; # corresponds to second row
Thanks for any help!
If there isn't much data, you can do something like:
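import numpy as np

res = []
def f(v):
    # one row of 6 zeros, with a 1 at every index listed in the string
    r = np.zeros(6, dtype=int)  # np.int was removed in NumPy 1.24
    r[[int(x) for x in v.split()]] = 1  # Python 3's map() returns an iterator, which can't index an array
    res.append(r)
df.Col3.apply(f)
mat = np.array(res)
# if you really want it to be a matrix, you can do
# (NumPy no longer recommends np.matrix; a plain ndarray is usually preferable)
mat = np.matrix(res)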
See this link for more information.
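A variant of the same idea that avoids using `apply` only for its side effect (appending to an outer list) is sketched below. It is a minimal sketch that assumes Col3 holds space-separated non-negative integers and that the matrix width should be the largest index + 1 (6 in the question). The sample DataFrame is made up to mirror the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's Col3
df = pd.DataFrame({"Col3": ["3 4", "2 3 4 5"]})

# Parse each string into a list of integer column indices
indices = [[int(x) for x in v.split()] for v in df["Col3"]]

# Assumption: width = largest index seen + 1 (empty rows contribute nothing)
n_cols = max(max(row, default=-1) for row in indices) + 1

# Build parallel (row, column) coordinate arrays for every 1 in the result
rows = np.repeat(np.arange(len(indices)), [len(row) for row in indices])
cols = np.concatenate([np.asarray(row, dtype=int) for row in indices])

# Start from zeros and set all the 1s in one fancy-indexing assignment
mat = np.zeros((len(indices), n_cols), dtype=int)
mat[rows, cols] = 1
print(mat)
```

Only the string parsing runs per row in Python. The matrix itself is filled by a single vectorized assignment.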
Since pandas 0.13.1 there's str.get_dummies:
In [11]: s = pd.Series(["3 4", "2 3 4 5"])
In [12]: s.str.get_dummies(sep=" ")
Out[12]:
   2  3  4  5
0  0  1  1  0
1  1  1  1  1
You have to make sure the columns are integers (rather than strings), and then reindex:
In [13]: df = s.str.get_dummies(sep=" ")
In [14]: df.columns = df.columns.map(int)
In [15]: df.reindex(columns=np.arange(6), fill_value=0)
Out[15]:
   0  1  2  3  4  5
0  0  0  0  1  1  0
1  0  0  1  1  1  1
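You can check directly why the cast to integers matters. The sketch below reindexes the dummies twice, once without converting the labels and once after converting them. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(["3 4", "2 3 4 5"])
dummies = s.str.get_dummies(sep=" ")

# The dummy column labels are the strings '2', '3', '4', '5'.
# Integer labels 0..5 match none of them, so every column is filled with 0.
no_cast = dummies.reindex(columns=np.arange(6), fill_value=0)

# After converting the labels to int, the existing columns are kept
# and the missing indices (0 and 1) are added as all-zero columns.
dummies.columns = dummies.columns.map(int)
with_cast = dummies.reindex(columns=np.arange(6), fill_value=0)

print(no_cast.to_numpy().sum())  # 0
print(with_cast.to_numpy().tolist())
```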
To get the numpy values, use .values (in pandas 0.24+, .to_numpy() is preferred):
In [16]: df.reindex(columns=np.arange(6), fill_value=0).values
Out[16]:
array([[0, 0, 0, 1, 1, 0],
[0, 0, 1, 1, 1, 1]])
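The get_dummies steps can be combined into one reusable helper. The name `indicator_matrix` and its `n_cols` parameter are hypothetical and not part of pandas. The sketch assumes every token in the column is a non-negative integer.

```python
import numpy as np
import pandas as pd

def indicator_matrix(col: pd.Series, n_cols: int) -> np.ndarray:
    """Hypothetical helper: turn space-separated index strings into a 0/1 ndarray."""
    dummies = col.str.get_dummies(sep=" ")
    # get_dummies labels columns with strings; convert them to match 0..n_cols-1
    dummies.columns = dummies.columns.map(int)
    # Add missing indices as all-zero columns and put the columns in order
    full = dummies.reindex(columns=np.arange(n_cols), fill_value=0)
    return full.to_numpy()

# Sample frame mirroring the question's data
df = pd.DataFrame({"Col1": [1, 2],
                   "Col2": [1424549456, 1424549457],
                   "Col3": ["3 4", "2 3 4 5"]})
mtx = indicator_matrix(df["Col3"], n_cols=6)
print(mtx)
```

Note that `reindex` silently drops any label outside `0..n_cols-1`. Pick `n_cols` large enough for your data, or validate the tokens first.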