Transforming json into coo sparse matrix in Python
I'm trying to convert a JSON-shaped file:
{"1":
{"2": 0, "3": 0, "4": 0, "5": 1, "6": 0, "7": 1, "8": 0, "9": 0, "10": 0, "11": 1, "12": 1, "13": 0, "14": 1, "15": 1, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 1, "32": 0, "33": 0, "34": 1, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0, "41": 0, "42": 0, "43": 0, "44": 0, "45": 0},
"2":
{"2": 0, "3": 0, "4": 0, "5": 0, "6": 1, "7": 0, "8": 1, "9": 1, "10": 1, "11": 0, "12": 0, "13": 1, "14": 1, "15": 1, "16": 0, "17": 0, "18": 1, "19": 0, "20": 1, "21": 1, "22": 0, "23": 0, "24": 0, "25": 1, "26": 0, "27": 0, "28": 0, "29": 1, "30": 0, "31": 1, "32": 1, "33": 0, "34": 0, "35": 0, "36": 0, "37": 1, "38": 0, "39": 0, "40": 1, "41": 1, "42": 0, "43": 0, "44": 1, "45": 1},
"3":
{"2": 1, "3": 0, "4": 0, "5": 0, "6": 0, "7": 1, "8": 0, "9": 0, "10": 0, "11": 0, "12": 0, "13": 0, "14": 0, "15": 0, "16": 0, "17": 0, "18": 0, "19": 0, "20": 0, "21": 0, "22": 0, "23": 0, "24": 0, "25": 0, "26": 0, "27": 0, "28": 0, "29": 0, "30": 0, "31": 1, "32": 0, "33": 0, "34": 0, "35": 0, "36": 0, "37": 0, "38": 0, "39": 0, "40": 0, "41": 0, "42": 0, "43": 0, "44": 0, "45": 0},
"4":
{"2": 1, "3": 1, "4": 1, "5": 1, "6": 0, "7": 0, "8": 0, "9": 0, "10": 0, "11": 1, "12": 1, "13": 0, "14": 0, "15": 0, "16": 1, "17": 1, "18": 0, "19": 0, "20": 0, "21": 0, "22": 1, "23": 1, "24": 1, "25": 0, "26": 1, "27": 1, "28": 1, "29": 0, "30": 1, "31": 0, "32": 0, "33": 0, "34": 1, "35": 0, "36": 1, "37": 0, "38": 1, "39": 0, "40": 0, "41": 0, "42": 1, "43": 1, "44": 0, "45": 0}}
into a coo sparse matrix, where each entry shows a coordinate made of the first key and the second key, followed by the value, like this:
(1,2) 0
(1,3) 0
(1,4) 0
(1,5) 1
...
(4,44) 0
(4,45) 0
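For reference, the (row, column) value listing above maps directly onto the parsed JSON: each outer key is a row, each inner key a column. A minimal sketch that produces exactly that listing, assuming the JSON text has already been read into a string (the two-key excerpt below is taken from the data above; for a file you would use json.load on an open file handle instead):

```python
import json

# Small excerpt of the JSON above: outer keys "1" and "4", inner keys "2".."5".
json_text = '{"1": {"2": 0, "3": 0, "4": 0, "5": 1}, "4": {"2": 1, "3": 1, "4": 1, "5": 1}}'

def to_triplets(nested):
    """Flatten {row_key: {col_key: value}} into (row, col, value) tuples."""
    return [(int(r), int(c), v)
            for r, inner in nested.items()
            for c, v in inner.items()]

data = json.loads(json_text)   # json preserves key order (Python 3.7+)
triplets = to_triplets(data)
for r, c, v in triplets:
    print(f"({r},{c}) {v}")     # (1,2) 0 ... (4,5) 1
```

These triplets are exactly the three inputs a COO matrix needs: the row indices, the column indices and the values.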
I tried converting the JSON file into a pandas DataFrame, which looks like this:
in 1 2 3 4
2 0 0 1 1
3 0 0 0 1
4 0 0 0 1
5 1 0 0 1
6 0 1 0 0
7 1 0 1 0
8 0 1 0 0
9 0 1 0 0
10 0 1 0 0
11 1 0 0 1
12 1 0 0 1
13 0 1 0 0
14 1 1 0 0
15 1 1 0 0
16 0 0 0 1
17 0 0 0 1
18 0 1 0 0
19 0 0 0 0
20 0 1 0 0
21 0 1 0 0
22 0 0 0 1
23 0 0 0 1
24 0 0 0 1
25 0 1 0 0
26 0 0 0 1
27 0 0 0 1
28 0 0 0 1
29 0 1 0 0
30 0 0 0 1
31 1 1 1 0
32 0 1 0 0
33 0 0 0 0
34 1 0 0 1
35 0 0 0 0
36 0 0 0 1
37 0 1 0 0
38 0 0 0 1
39 0 0 0 0
40 0 1 0 0
41 0 1 0 0
42 0 0 0 1
43 0 0 0 1
44 0 1 0 0
45 0 1 0 0
But I can't get from there to a sparse matrix, and without one the approach won't hold up once the data is scaled up.
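Starting from a wide DataFrame like the one above, one possible route is to transpose it so each outer key is a row and hand the dense values to scipy.sparse.coo_matrix, which keeps only the nonzero entries. This sketch assumes the DataFrame was built with pd.DataFrame(data) (outer keys as columns, inner keys as the index); the question doesn't say how it was created. Because the COO matrix uses positional indices, the original key labels are mapped back explicitly.

```python
import pandas as pd
from scipy import sparse

# Small excerpt of the question's data; assumed construction of the wide DataFrame.
data = {"1": {"2": 0, "3": 0, "4": 0, "5": 1},
        "4": {"2": 1, "3": 1, "4": 1, "5": 1}}
df = pd.DataFrame(data)   # index: inner keys, columns: outer keys

# Transpose so each outer key becomes a row; coo_matrix(dense) stores only nonzeros.
wide = df.T
M = sparse.coo_matrix(wide.to_numpy())

# M.row / M.col are positions, so translate them back to the JSON key labels.
row_labels = wide.index.to_numpy()
col_labels = wide.columns.to_numpy()
for r, c, v in zip(M.row, M.col, M.data):
    print(f"({row_labels[r]},{col_labels[c]}) {v}")
```

Note that this still materialises the dense table first, so it only helps if the DataFrame itself fits in memory; the answers below avoid the dense step entirely.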
We can turn the dictionary data (the parsed JSON) into a MultiIndex DataFrame with a dict comprehension. For example:
pd.concat({k: pd.DataFrame.from_dict(v, orient='index') for k,v in data.items()})
For the given sample data, this produces a DataFrame with 176 rows and 1 column:
>>> pd.concat({k: pd.DataFrame.from_dict(v, orient='index') for k,v in data.items()})
0
1 2 0
3 0
4 0
5 1
6 0
7 1
8 0
9 0
10 0
11 1
12 1
13 0
14 1
15 1
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
30 0
31 1
... ..
4 16 1
17 1
18 0
19 0
20 0
21 0
22 1
23 1
24 1
25 0
26 1
27 1
28 1
29 0
30 1
31 0
32 0
33 0
34 1
35 0
36 1
37 0
38 1
39 0
40 0
41 0
42 1
43 1
44 0
45 0
[176 rows x 1 columns]
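The two levels of that MultiIndex are exactly the row and column keys, so they can be fed straight into the (data, (row, col)) form of scipy.sparse.coo_matrix. A minimal sketch on a small excerpt of the question's data; converting the string keys with int() assumes every key is a numeric string, as in the sample:

```python
import pandas as pd
from scipy import sparse

# Small excerpt of the question's data (same {outer: {inner: value}} shape).
data = {"1": {"2": 0, "3": 0, "4": 0, "5": 1},
        "4": {"2": 1, "3": 1, "4": 1, "5": 1}}

df = pd.concat({k: pd.DataFrame.from_dict(v, orient='index') for k, v in data.items()})

# Level 0 holds the outer keys, level 1 the inner keys; both are strings here.
rows = [int(k) for k in df.index.get_level_values(0)]
cols = [int(k) for k in df.index.get_level_values(1)]
values = df[0].to_numpy()   # from_dict(..., orient='index') names the single column 0

# Shape is inferred as (max row + 1, max col + 1), i.e. (5, 6) for this excerpt.
M = sparse.coo_matrix((values, (rows, cols)))
```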
When I copy-paste your JSON into an IPython session, I get a dictionary with 4 keys.
I can unpack it into a list of (row, column, value) tuples:
In [466]: alist = []
     ...: for k,v in adict.items():
     ...:     for k1,v1 in v.items():
     ...:         alist.append((int(k),int(k1),v1))
     ...:
and make an array from it:
In [467]: arr = np.array(alist)
In [468]: arr.shape
Out[468]: (176, 3)
and pass the array's 3 columns to sparse.coo_matrix (np is numpy, sparse is scipy.sparse):
In [469]: M = sparse.coo_matrix((arr[:,2],(arr[:,0],arr[:,1])))
In [470]: M
Out[470]:
<5x46 sparse matrix of type '<class 'numpy.int64'>'
with 176 stored elements in COOrdinate format>
In [471]: M.A
Out[471]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1,
0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0,
1, 1],
[0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0],
[0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0,
1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1,
0, 0]])
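The "176 stored elements" above include every zero from the JSON, because coo_matrix keeps whatever entries it is given. A minimal sketch, on a small excerpt of the question's data, of dropping those zeros with a boolean mask before building the matrix; passing shape explicitly keeps the dimensions from shrinking if a trailing row or column happens to be all zeros. It uses .toarray(), the explicit method for what .A does, since recent SciPy releases have deprecated the .A shortcut.

```python
import numpy as np
from scipy import sparse

# Small excerpt of the question's data.
data = {"1": {"2": 0, "3": 0, "4": 0, "5": 1},
        "4": {"2": 1, "3": 1, "4": 1, "5": 1}}

triplets = np.array([(int(k), int(k1), v1)
                     for k, v in data.items()
                     for k1, v1 in v.items()])

# Built as in the answer: explicit zeros are stored as well.
M = sparse.coo_matrix((triplets[:, 2], (triplets[:, 0], triplets[:, 1])))
print(M.nnz)      # 8

# Keep only nonzero values; reuse M.shape so the dimensions stay the same.
mask = triplets[:, 2] != 0
M_nz = sparse.coo_matrix((triplets[mask, 2], (triplets[mask, 0], triplets[mask, 1])),
                         shape=M.shape)
print(M_nz.nnz)   # 5

dense = M_nz.toarray()
```

For mostly-zero data like this, the memory saving grows with the fraction of zeros, which is the point of going sparse in the first place.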
A variant that builds the three lists directly:
In [472]: rows, cols, data = [],[],[]
     ...: for k,v in adict.items():
     ...:     for k1,v1 in v.items():
     ...:         rows.append(int(k))
     ...:         cols.append(int(k1))
     ...:         data.append(v1)
     ...:
In [473]: len(rows)
Out[473]: 176
In [474]: M = sparse.coo_matrix((data,(rows,cols)))
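Using the keys directly as indices gives the 5x46 shape seen above, with an empty row 0 and empty columns 0 and 1. If a compact matrix is preferred, one option (a design choice, not something the question asks for) is to map the keys to 0-based positions with numpy.unique(..., return_inverse=True) and keep the sorted key arrays to translate back. A sketch on a small excerpt of the data:

```python
import numpy as np
from scipy import sparse

# Small excerpt of the question's data.
data = {"1": {"2": 0, "3": 0, "4": 0, "5": 1},
        "4": {"2": 1, "3": 1, "4": 1, "5": 1}}

rows, cols, vals = [], [], []
for k, v in data.items():
    for k1, v1 in v.items():
        rows.append(int(k))
        cols.append(int(k1))
        vals.append(v1)

# np.unique returns the sorted distinct keys plus, for each entry, its position
# in that sorted array: a compact 0-based index with no empty rows or columns.
row_keys, row_idx = np.unique(rows, return_inverse=True)
col_keys, col_idx = np.unique(cols, return_inverse=True)

M = sparse.coo_matrix((vals, (row_idx, col_idx)),
                      shape=(len(row_keys), len(col_keys)))
# row_keys[i] and col_keys[j] give the JSON keys for matrix position (i, j).
```

The trade-off is that every lookup by key now needs the row_keys / col_keys arrays, whereas the key-as-index version lets you index with the original numbers at the cost of a few empty rows and columns.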