How to populate pandas dataframe given a parameter
I'm struggling with what seems to me like a simple problem.
I have a pandas DataFrame like this:
import pandas as pd
results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3]],
                       columns=['operation', 'executions', 'result'])
So, the input is:
In [1]: results
Out[1]:
           operation  executions  result
0  executing (i) run       2+(i)       3
1   sampling (i) run       3+(i)       3
What I want to do is populate the results DataFrame given a parameter and update the cell values accordingly. Say i = 4; the desired output is:
In [2]: results_populated
Out[2]:
           operation  executions  result
0  executing (0) run       2+(0)       3
1  executing (1) run       2+(1)       3
2  executing (2) run       2+(2)       3
3  executing (3) run       2+(3)       3
4   sampling (0) run       3+(0)       3
5   sampling (1) run       3+(1)       3
6   sampling (2) run       3+(2)       3
7   sampling (3) run       3+(3)       3
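I know I could iterate over each row in a for loop or use iterrows (or something similar), but that gets slow when the table has hundreds of different operations and "i" can be in the thousands.
For the expansion step, I found this works well: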
res_expanded = results.loc[results.index[results['operation'].str
                                         .contains("(i)", regex=False)]
                           .repeat(i)].reset_index(drop=True)
which returns:
           operation  executions  result
0  executing (i) run       2+(i)       3
1  executing (i) run       2+(i)       3
2  executing (i) run       2+(i)       3
3  executing (i) run       2+(i)       3
4   sampling (i) run       3+(i)       3
5   sampling (i) run       3+(i)       3
6   sampling (i) run       3+(i)       3
7   sampling (i) run       3+(i)       3
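The expansion step above can be sketched as a standalone snippet. One caveat: as a regular expression, `(i)` is a capture group that matches any letter "i", not the literal text "(i)", so a literal match (`regex=False`) is what is intended here. The third row (`'idle run'`) is a hypothetical addition to show the difference, and `n` stands in for the question's `i`.

```python
import pandas as pd

n = 4  # number of copies per templated row (the question's i)

results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3],
                        ['idle run', '1', 0]],  # hypothetical row without a placeholder
                       columns=['operation', 'executions', 'result'])

# Literal substring match. The regex r"(i)" would also select "idle run",
# because it only requires a letter "i" somewhere in the string.
mask = results['operation'].str.contains('(i)', regex=False)

# Index.repeat duplicates each selected label n times, keeping row order.
labels = results.index[mask.to_numpy()].repeat(n)
res_expanded = results.loc[labels].reset_index(drop=True)
print(res_expanded)
```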
What I can't find is the best (vectorized?) way to update each cell. Any help would be appreciated.
Many thanks.
Step 1:
import pandas as pd
df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])
# Repeat every row 4 times; the original index labels (0, 1) are kept
df = df.apply(lambda x: x.repeat(4))
df
           operation  executions  result
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
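Step 1 repeats each column separately through `apply`. A sketch of the same result with a single indexing call, `df.loc[df.index.repeat(n)]`, which also keeps the duplicated index labels; `n` is an illustrative name for the repeat count.

```python
import pandas as pd

df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])

n = 4
# Repeat the index labels (0, 0, 0, 0, 1, 1, 1, 1) and select those rows;
# one indexing call instead of repeating every column on its own.
repeated = df.loc[df.index.repeat(n)]
print(repeated)
```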
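Step 2:
# Number the copies 0..3; this assumes every 'operation' value appears
# in exactly one original row, so ngroups equals the number of original rows
df = df.assign(tag=[*range(4)] * df.groupby('operation').ngroups)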
df
           operation  executions  result  tag
0  executing (i) run       2+(i)       3    0
0  executing (i) run       2+(i)       3    1
0  executing (i) run       2+(i)       3    2
0  executing (i) run       2+(i)       3    3
1   sampling (i) run       3+(i)       3    0
1   sampling (i) run       3+(i)       3    1
1   sampling (i) run       3+(i)       3    2
1   sampling (i) run       3+(i)       3    3
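Step 2 sizes the counter by the number of distinct `operation` values, which only matches the row count when no two original rows share the same operation text. A sketch of a more robust counter, assuming the duplicated index from Step 1 is still in place: number the copies with `groupby(level=0).cumcount()`. The third input row is a hypothetical example with a repeated operation.

```python
import pandas as pd

df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3],
                   ['sampling (i) run', '5+(i)', 1]],  # hypothetical: repeated operation text
                  columns=['operation', 'executions', 'result'])
n = 4
df = df.loc[df.index.repeat(n)]

# Only 2 distinct operations but 3 original rows, so [*range(4)] * ngroups
# would give 8 values for 12 rows.
print(df.groupby('operation').ngroups)

# Group on the duplicated index label instead: each original row's copies
# are numbered 0..n-1 in order. to_numpy() assigns the values positionally.
df = df.assign(tag=df.groupby(level=0).cumcount().to_numpy())
print(df)
```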
Step 3:
# For each row, replace "(i)" in every string cell with that row's tag
df.apply(lambda ser: ser.map(lambda x: x.replace('(i)', f'({ser.tag})')
                             if isinstance(x, str) else x), axis=1)
           operation  executions  result  tag
0  executing (0) run       2+(0)       3    0
0  executing (1) run       2+(1)       3    1
0  executing (2) run       2+(2)       3    2
0  executing (3) run       2+(3)       3    3
1   sampling (0) run       3+(0)       3    0
1   sampling (1) run       3+(1)       3    1
1   sampling (2) run       3+(2)       3    2
1   sampling (3) run       3+(3)       3    3
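Step 3 uses `apply(axis=1)`, which builds a Series for every row and tends to be slow on large frames. A possibly faster sketch of the same replacement walks the two text columns with a list comprehension. It is not truly vectorized, and any speed-up is an expectation rather than a measured result. It assumes the placeholder only appears in `operation` and `executions`.

```python
import pandas as pd

# Rebuild the frame from Steps 1 and 2 so this snippet runs on its own.
df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])
df = df.loc[df.index.repeat(4)]
df = df.assign(tag=[*range(4)] * df.groupby('operation').ngroups)

# Only the text columns hold the "(i)" placeholder.
for col in ['operation', 'executions']:
    # Plain Python loop over two aligned columns; no per-row Series is built.
    # The list is assigned positionally, so the duplicated index is not an issue.
    df[col] = [s.replace('(i)', f'({t})') for s, t in zip(df[col], df['tag'])]
print(df)
```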
Done!
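The steps can also be wrapped into one function that returns the question's `results_populated` layout (fresh 0..N-1 index, no `tag` column), with the replacement done by whole-column string operations instead of a per-row function. A sketch under stated assumptions: `populate` is a hypothetical helper, the text columns are passed in explicitly, and the placeholder appears at most once per cell, because `str.partition` splits at the first occurrence only. Cells without the placeholder are left unchanged.

```python
import numpy as np
import pandas as pd


def populate(df, n, columns=('operation', 'executions'), placeholder='(i)'):
    """Hypothetical helper: repeat every row n times and replace the first
    `placeholder` in each of `columns` with (0), (1), ..., (n-1)."""
    # Each original row becomes n consecutive rows; reset to a 0..len-1 index.
    out = df.loc[df.index.repeat(n)].reset_index(drop=True)
    # Copy counter 0..n-1 for each original row in turn, rendered as "(k)".
    tag = pd.Series(np.tile(np.arange(n), len(df)), index=out.index)
    tag_str = '(' + tag.astype(str) + ')'
    for col in columns:
        # Literal split at the first placeholder: before / placeholder-or-'' / after.
        parts = out[col].str.partition(placeholder)
        # Swap in the counter only where the placeholder was actually found.
        middle = parts[1].where(parts[1] == '', tag_str)
        out[col] = parts[0] + middle + parts[2]
    return out


results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3]],
                       columns=['operation', 'executions', 'result'])
results_populated = populate(results, 4)
print(results_populated)
```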