Divide a group into n parts and add a block number for each group in Python
I have the following table:
ColumnA | ColumnB |
---|---|
A | 12 |
B | 32 |
C | 44 |
D | 76 |
E | 99 |
F | 123 |
G | 65 |
H | 87 |
I | 76 |
J | 231 |
k | 80 |
l | 55 |
m | 27 |
n | 67 |
I want to divide this table into 'n' groups (here n = 4) and add another column containing the group name. The output should look like this:
ColumnA | ColumnB | ColumnC |
---|---|---|
A | 12 | 1 |
B | 32 | 1 |
C | 44 | 1 |
D | 76 | 1 |
E | 99 | 2 |
F | 123 | 2 |
G | 65 | 2 |
H | 87 | 2 |
I | 76 | 3 |
J | 231 | 3 |
k | 80 | 3 |
l | 55 | 4 |
m | 27 | 4 |
n | 67 | 4 |
What I have tried so far:
TGn = 4
idx = set(df.index // TGn)
treatment_groups = [i for i in range(1, n+1)]
df['columnC'] = (df.index // TGn).map(dict(zip(idx, treatment_groups)))
This does not divide the groups correctly, and I don't know where I went wrong. How can I correct it?
Assuming your sample size is exactly divisible by n (i.e. sample_size % n is 0):
import numpy as np
groups = range(1, n + 1)
# Repeat each group label len(df) // n times
df['columnC'] = np.repeat(groups, len(df) // n)
If your sample size is not exactly divisible by n (i.e. sample_size % n is not 0):
# Assigning the remaining rows to random groups
# (high is exclusive, so n + 1 allows group n to be drawn as well)
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     np.random.randint(1, high=n + 1, size=len(df) % n, dtype=int)])
# Assigning the remaining rows to group 'm' (m is a group number you choose)
df['columnC'] = np.concatenate(
    [np.repeat(groups, len(df) // n),
     np.repeat([m], len(df) % n)])
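If the goal is the exact output shown in the question (14 rows, n = 4, contiguous groups of sizes 4, 4, 3, 3), one possible sketch is to give each of the first len(df) % n groups one extra row. The helper name block_labels is hypothetical and not part of pandas or NumPy. It is plain Python, so it makes no assumption about the DataFrame beyond its length.

```python
def block_labels(length, n):
    """Return `length` labels in 1..n, in contiguous blocks of near-equal size.

    Every group gets length // n rows; the first length % n groups
    get one extra row each.
    """
    base, extra = divmod(length, n)
    labels = []
    for group in range(1, n + 1):
        size = base + (1 if group <= extra else 0)
        labels.extend([group] * size)
    return labels


# 14 rows split into 4 groups, as in the question
print(block_labels(14, 4))
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```

With the DataFrame from the question, the column could then be assigned as df['ColumnC'] = block_labels(len(df), n), assuming the rows are already in the desired order.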