Counting identical values in two arrays for every unique value in one array
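I have two arrays, A and B. A holds multiple values (they can be strings, integers, or floats), and B holds only 0s and 1s. For each unique value in A, I need the number of positions where it coincides with a 1 in B and the number of positions where it coincides with a 0 in B. Both counts need to be stored as separate variables.
For example: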
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # input multivalue array; it has three unique values – 1,2,3
B = [0, 0, 0, 1, 1, 1, 0, 1, 0] # input binary array
#Desired result:
countA1_B1 = 1 #for unique value of '1' in A the count of places where there is '1' in B
countA1_B0 = 3 #for unique value of '1' in A the count of places where there is '0' in B
countAno1_B1 = 3 #for unique value of '1' in A the count of places where there is no '1' in A but there is '1' in B
countAno1_B0 = 2 #for unique value of '1' in A the count of places where there is no '1' in A and there is '0' in B
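I need this for every unique value in A. The A array/list will be a raster, so its unique values are not known in advance. The code therefore has to extract the unique values of A first and then do the remaining calculations.
My attempt at solving this (see post):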
import numpy as np
A = [1, 1, 3, 2, 2, 1, 1, 3, 3] # input array
B = [0, 0, 0, 1, 1, 1, 0, 1, 0] # input binary array
A_arr = np.array(A)
A_unq = np.unique(A_arr)
#code 1
A_masked_arrays = np.array((A_arr[None, :] == A_unq[:, None]).astype(int))
#code 2
# A_masked_arrays = [(A_arr == unique_val).astype(int) for unique_val in np.unique(A_arr)]
print(A_masked_arrays)
out = {val: arr for val, arr in zip(list(A_unq), list(A_arr))}
#zip() throws error
#TypeError: 'zip' object is not callable.
dict = {}
for i in A_unq:
    for j in A_masked_arrays:
        dict = i, j
        print(dict)
Results obtained:
# from code 1
[[1 1 0 0 0 1 1 0 0]
[0 0 0 1 1 0 0 0 0]
[0 0 1 0 0 0 0 1 1]]
# from code 2
[array([1, 1, 0, 0, 0, 1, 1, 0, 0]), array([0, 0, 0, 1, 1, 0, 0, 0, 0]),
array([0, 0, 1, 0, 0, 0, 0, 1, 1])]
Trying to create the dictionary gave me this result:
(1, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(1, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(1, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
(2, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(2, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(2, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
(3, array([1, 1, 0, 0, 0, 1, 1, 0, 0]))
(3, array([0, 0, 0, 1, 1, 0, 0, 0, 0]))
(3, array([0, 0, 1, 0, 0, 0, 0, 1, 1]))
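This is where I am stuck. From here, how do I get the final counts for each unique value in A, such as countA1_B1, countA1_B0, countAno1_B1, countAno1_B0, and so on? Any help is appreciated. Thanks in advance.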
This kind of groupby operation is much easier with pandas:
In [11]: import pandas as pd
In [12]: df = pd.DataFrame({"A": A, "B": B})
In [13]: df
Out[13]:
   A  B
0  1  0
1  1  0
2  3  0
3  2  1
4  2  1
5  1  1
6  1  0
7  3  1
8  3  0
Now you can use groupby:
In [14]: gb = df.groupby("A")["B"]
In [15]: gb.count() # number of As
Out[15]:
A
1 4
2 2
3 3
Name: B, dtype: int64
In [16]: gb.sum() # number of As where B == 1
Out[16]:
A
1 1
2 2
3 1
Name: B, dtype: int64
In [17]: gb.count() - gb.sum() # number of As where B == 0
Out[17]:
A
1 3
2 0
3 2
Name: B, dtype: int64
You can also be more explicit and more general (for example, if B is not just 0s and 1s) by using apply:
In [18]: gb.apply(lambda x: (x == 1).sum())
Out[18]:
A
1 1
2 2
3 1
Name: B, dtype: int64
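The groupby results above cover the positions where A equals a given value. The question also asks for the complementary counts (countAno1_B1 and countAno1_B0). One way to get both at once is a cross-tabulation; the following is a minimal sketch, not part of the original answer, and the names `in_counts`, `totals`, and `out_counts` are chosen here purely for illustration. It assumes both 0 and 1 actually occur in B, since `pd.crosstab` only creates columns for values that are present.

```python
import pandas as pd

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]
df = pd.DataFrame({"A": A, "B": B})

# Rows: unique values of A. Columns: values of B (0 and 1).
# Each cell counts the positions where A == row label and B == column label.
in_counts = pd.crosstab(df["A"], df["B"])

# Total number of 0s and 1s in B, indexed by the B value.
totals = in_counts.sum(axis=0)

# Positions where A != row label: column totals minus the per-value counts.
# The Series index is aligned on the DataFrame columns.
out_counts = totals - in_counts

countA1_B1 = in_counts.loc[1, 1]     # A == 1 and B == 1
countA1_B0 = in_counts.loc[1, 0]     # A == 1 and B == 0
countAno1_B1 = out_counts.loc[1, 1]  # A != 1 and B == 1
countAno1_B0 = out_counts.loc[1, 0]  # A != 1 and B == 0
```

For the example data this reproduces the desired result from the question (1, 3, 3, 2). Because the unique values are not known in advance, looking counts up with `.loc[value, b]` is probably more practical than creating one variable per value.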
Selective use of np.bincount should do the trick:
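import numpy as np
# Au: sorted unique values of A; Ai: index into Au for every element of A
Au, Ai = np.unique(A, return_inverse=True)
out = np.empty((2, Au.size))
out[0] = np.bincount(Ai, weights=1 - np.array(B), minlength=Au.size)  # counts where B == 0
out[1] = np.bincount(Ai, weights=np.array(B), minlength=Au.size)  # counts where B == 1
outdict = {}
for i in range(Au.size):
    for j in [0, 1]:
        outdict[(Au[i], j)] = out[j, i]  # key: (value of A, value of B)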
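The same idea can be extended to all four counts the question asks for, including the positions where A is not the given value. This is a minimal sketch under a few assumptions: the function name `count_by_value` and the tuple layout are hypothetical, B is assumed to contain only 0s and 1s, and the inputs are flattened so that a 2-D raster can be passed directly.

```python
import numpy as np

def count_by_value(A, B):
    """Map each unique value of A to (in_B1, in_B0, out_B1, out_B0).

    in_*  counts positions where A == value,
    out_* counts positions where A != value.
    """
    A = np.asarray(A).ravel()  # flatten so 2-D rasters work too
    B = np.asarray(B).ravel()

    # inverse[k] is the index of A[k] within the sorted unique values.
    values, inverse = np.unique(A, return_inverse=True)

    # Count occurrences of each unique value, restricted to B == 1 or B == 0.
    in_b1 = np.bincount(inverse[B == 1], minlength=values.size)
    in_b0 = np.bincount(inverse[B == 0], minlength=values.size)

    # Everything that is not "this value" is the total minus the value's count.
    out_b1 = in_b1.sum() - in_b1
    out_b0 = in_b0.sum() - in_b0

    return {
        v: (int(a), int(b), int(c), int(d))
        for v, a, b, c, d in zip(values.tolist(), in_b1, in_b0, out_b1, out_b0)
    }

A = [1, 1, 3, 2, 2, 1, 1, 3, 3]
B = [0, 0, 0, 1, 1, 1, 0, 1, 0]
counts = count_by_value(A, B)
countA1_B1, countA1_B0, countAno1_B1, countAno1_B0 = counts[1]
```

A dictionary keyed by the unique value is used here instead of separately named variables, since the values in the raster are not known before the code runs.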