Find percent difference between two columns, that share same root name, but differ in suffix

Question

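My question is somewhat similar to

I can't work out how to operate on columns that share the same root substring without a loop. Basically, I want to calculate the percentage change between each column ending in "_PY" and the column that has the same name apart from that suffix.

What would be a possible one-line solution, or one that doesn't involve a for loop?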

import pandas as pd

url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']

PY_cols = [col for col in df.columns if col.endswith("PY")]
reg_cols = [col.split("_PY")[0] for col in PY_cols]

for k,v in zip(reg_cols,PY_cols):
    df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)

df

Answer 1

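You have to narrow df down to the columns you need, that is, the "_PY" columns that actually have a matching base column. zip then pulls out the pairs you need for the percentage calculation.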

import pandas as pd

url = r'https://www2.arccorp.com/globalassets/forms/corpstats.csv?1653338666304'
df = pd.read_csv(url)
df = df[df['TYPE'] == 'M']

df_cols = list(df.columns)
PY_cols = [col for col in df.columns if col.endswith("PY")]
# find the matching column, where the names match without the suffix.
PY_use = [col for col in PY_cols if col.split("_PY")[0] in df_cols]
df_use = [col.split("_PY")[0] for col in PY_use]

# k is the current-year column, v is the matching prior-year (_PY) column
for k,v in zip(df_use,PY_use):
    df[f"{k}_YOY%"] = round((df[k] - df[v]) / df[v] * 100,2)
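
If the goal is mainly to avoid an explicit loop statement, the same pairing can probably be folded into a single DataFrame.assign call. This is only a minimal sketch on a made-up frame: the column names SALES, TRIPS and ORPHAN_PY are illustrative assumptions, not columns taken from the ARC file. It still iterates inside a dict comprehension, so it is shorter rather than faster.

```python
import pandas as pd

# Toy data standing in for the real CSV (column names are hypothetical).
df = pd.DataFrame({
    "SALES": [110.0, 90.0],
    "SALES_PY": [100.0, 120.0],
    "TRIPS": [55.0, 30.0],
    "TRIPS_PY": [50.0, 40.0],
    "ORPHAN_PY": [1.0, 2.0],  # no matching "ORPHAN" column, so it is skipped
})

suffix = "_PY"
# Map each base name to its *_PY column, keeping only pairs where both exist.
pairs = {
    c[:-len(suffix)]: c
    for c in df.columns
    if c.endswith(suffix) and c[:-len(suffix)] in df.columns
}

# One assign call adds every <base>_YOY% column at once.
df = df.assign(**{
    f"{base}_YOY%": ((df[base] - df[py]) / df[py] * 100).round(2)
    for base, py in pairs.items()
})
```

Slicing the suffix off the end, rather than splitting on "_PY", avoids surprises if "_PY" ever appears in the middle of a column name.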

Answer 2

You can use:

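# v: the prior-year (_PY) columns, renamed to their base names
v = (df[df.columns[df.columns.str.endswith('_PY')]]
       .rename(columns=lambda x: x.rsplit('_', maxsplit=1)[0]))
# k: the current-year columns with the same base names
k = df[v.columns]

# column-aligned arithmetic, then append the results to the original frame
out = pd.concat([df, k.sub(v).div(v).mul(100).round(2).add_suffix('_YOY%')], axis=1)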
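
To try this without downloading the CSV, the same idea can be run on a small hand-made frame. The sketch below assumes two hypothetical metric columns, SALES and TRIPS, each with a "_PY" counterpart; the real file's column names will differ.

```python
import pandas as pd

# Toy data standing in for the real CSV (column names are hypothetical).
df = pd.DataFrame({
    "TYPE": ["M", "M"],
    "SALES": [110.0, 90.0],
    "SALES_PY": [100.0, 120.0],
    "TRIPS": [55.0, 30.0],
    "TRIPS_PY": [50.0, 40.0],
})

# Prior-year columns, renamed so their labels match the current-year columns.
prev = (df.loc[:, df.columns.str.endswith("_PY")]
          .rename(columns=lambda c: c.rsplit("_", maxsplit=1)[0]))
curr = df[prev.columns]

# pandas aligns on column labels, so no loop over column pairs is needed.
yoy = curr.sub(prev).div(prev).mul(100).round(2).add_suffix("_YOY%")
out = pd.concat([df, yoy], axis=1)
```

Because df[prev.columns] raises a KeyError when a base column is missing, this version assumes every "_PY" column has a partner.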

Answer 3

You can make use of numpy:

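import numpy as np
# df_use and PY_use are the column lists built in Answer 1
py_df_array = (df[df_use].values, df[PY_use].values)
perc_dif = np.round((py_df_array[0] - py_df_array[1]) / py_df_array[1] * 100, 2)
# reuse df.index so the new columns line up with the filtered rows in concat
df_perc = pd.DataFrame(perc_dif, columns=[f"{col}_YOY%" for col in df_use], index=df.index)
df = pd.concat([df, df_perc], axis=1)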
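
One thing to watch with the NumPy route: a frame built from a raw array gets a fresh 0..n-1 index, so after rows have been filtered out it needs the original index passed in explicitly for pd.concat(axis=1) to line up. A minimal sketch on made-up data (the SALES columns and the "W" row are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Toy data; the middle row is dropped by the TYPE filter below.
df = pd.DataFrame({
    "TYPE": ["M", "W", "M"],
    "SALES": [110.0, 1.0, 90.0],
    "SALES_PY": [100.0, 1.0, 120.0],
})
df = df[df["TYPE"] == "M"]  # surviving index labels are 0 and 2

curr = df[["SALES"]].to_numpy()
prev = df[["SALES_PY"]].to_numpy()
perc = np.round((curr - prev) / prev * 100, 2)

# Passing index=df.index keeps the labels 0 and 2; the default 0, 1 would
# make concat align on mismatched labels and introduce NaN rows.
df_perc = pd.DataFrame(perc, columns=["SALES_YOY%"], index=df.index)
out = pd.concat([df, df_perc], axis=1)
```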

python

pandas