Frequency of a sequence of values in a column in dataframe
I am new to pandas and I have a dataset like the one shown below
id values exp
z1 s1 NaN
z1 s2 NaN
z1 s3 NaN
z1 s4 v
z1 s2 NaN
z1 s3 NaN
z1 s4 w
z2 s1 NaN
z2 s5 NaN
z2 s4 w
z3 s1 NaN
z3 s2 NaN
z3 s3 NaN
z3 s4 v
z4 s1 NaN
z4 s2 NaN
z4 s4 w
and I would like to get an output like this (Table2), where each row is a sequence of values terminated by the value in the exp column
id seq exp
z1 s1-s2-s3-s4 v
z1 s2-s3-s4 w
z2 s1-s5-s4 w
z3 s1-s2-s3-s4 v
z4 s1-s2-s4 w
The final result I want is shown below, but I can get there myself from Table2.
seq count
s1-s2-s3-s4 2
s2-s3-s4 1
s1-s5-s4 1
s1-s2-s4 1
Looking for guidance on solving this with pandas/python.
You can use bfill to replace the NaN values by back filling, then groupby by column id and by the Series created by bfill, and apply join to the values of each group. Finally, use value_counts:
print (df.exp.bfill())
0 v
1 v
2 v
3 v
4 w
5 w
6 w
7 w
8 w
9 w
10 v
11 v
12 v
13 v
14 w
15 w
16 w
Name: exp, dtype: object
df = df.groupby(['id', df.exp.bfill()])['values'].apply('-'.join).reset_index()
print (df)
id exp values
0 z1 v s1-s2-s3-s4
1 z1 w s2-s3-s4
2 z2 w s1-s5-s4
3 z3 v s1-s2-s3-s4
4 z4 w s1-s2-s4
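One caveat about the grouping step: because the group key is (id, back-filled exp), two separate runs inside the same id that end in the same exp value fall into one group and get joined together. This does not happen in the data from the question, but if it can happen in yours, a run counter may be a safer key. The following is only a sketch on a small hypothetical frame (the data and the names run, merged and table2 are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: id z1 holds two separate runs that both end in 'v'.
df = pd.DataFrame({
    'id': ['z1'] * 4,
    'values': ['s1', 's2', 's3', 's4'],
    'exp': [np.nan, 'v', np.nan, 'v'],
})

# Grouping on the back-filled label merges both runs into a single row.
merged = (
    df.groupby(['id', df['exp'].bfill()])['values']
      .apply('-'.join)
      .reset_index()
)

# Run counter: increases on the row that follows each non-null exp,
# so every run ends at the row that carries its terminating value.
run = df['exp'].notna().shift(fill_value=False).cumsum().rename('run')

table2 = (
    df.groupby(['id', run])
      .agg(seq=('values', lambda s: '-'.join(s)),  # join the run's values
           exp=('exp', 'last'))                    # last non-null exp in the run
      .reset_index()
      .drop(columns='run')
)

print(merged)
print(table2)
```

Here merged contains one row with s1-s2-s3-s4, while table2 keeps s1-s2 and s3-s4 as separate rows.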
df1 = df['values'].value_counts().reset_index()
df1.columns = ['seq','counts']
print (df1)
seq counts
0 s1-s2-s3-s4 2
1 s2-s3-s4 1
2 s1-s2-s4 1
3 s1-s5-s4 1
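For anyone who wants to run the steps above end to end, here is a self-contained sketch that rebuilds the sample data from the question and applies the same bfill, groupby and value_counts chain. The variable names (rows, filled, table2, counts) are my own, and the output column is named count to match the table asked for in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
rows = [
    ('z1', 's1', np.nan), ('z1', 's2', np.nan), ('z1', 's3', np.nan), ('z1', 's4', 'v'),
    ('z1', 's2', np.nan), ('z1', 's3', np.nan), ('z1', 's4', 'w'),
    ('z2', 's1', np.nan), ('z2', 's5', np.nan), ('z2', 's4', 'w'),
    ('z3', 's1', np.nan), ('z3', 's2', np.nan), ('z3', 's3', np.nan), ('z3', 's4', 'v'),
    ('z4', 's1', np.nan), ('z4', 's2', np.nan), ('z4', 's4', 'w'),
]
df = pd.DataFrame(rows, columns=['id', 'values', 'exp'])

# Back fill so every row carries the exp value that terminates its run.
filled = df['exp'].bfill()

# Table2: one row per (id, terminating exp) with the values joined by '-'.
table2 = (
    df.groupby(['id', filled])['values']
      .apply('-'.join)
      .reset_index()
      .rename(columns={'values': 'seq'})
)

# Final result: how often each sequence occurs.
counts = (
    table2['seq']
      .value_counts()
      .rename_axis('seq')
      .reset_index(name='count')
)

print(table2)
print(counts)
```

Note that df['values'] has to be written with brackets, since df.values is the DataFrame attribute that returns the underlying array, not the column.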