根据一些规则，如何扩展Pandas中的数据？

Question

请原谅我的英文。希望能说清楚

假设我们有这个数据：

>>> data = {'Span':[3,3.5], 'Low':[6.2,5.16], 'Medium':[4.93,4.1], 'High':[3.68,3.07], 'VeryHigh':[2.94,2.45], 'ExtraHigh':[2.48,2.06], '0.9':[4.9,3.61], '1.5':[3.23,2.38], '2':[2.51,1.85]}
>>> df = pd.DataFrame(data)
>>> df
   Span   Low  Medium  High  VeryHigh  ExtraHigh   0.9   1.5     2
0   3.0  6.20    4.93  3.68      2.94       2.48  4.90  3.23  2.51
1   3.5  5.16    4.10  3.07      2.45       2.06  3.61  2.38  1.85

我想得到这个数据：

    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

原则适用于df：

Span 通过 Wind 和 Snow 的组合扩展得到 MaxSpacing
Wind 和 Snow 是 mutually exclusive。当Wind为'Low', 'Medium', 'High', 'VeryHigh', 'ExtraHigh'之一时，Snow为零；当 Snow 是 0.9, 1.5, 2 之一时，Wind 是零。

请帮忙。谢谢。

Answer 1

使用 DataFrame.melt for unpivot and then sorting by indices, create Snow column by to_numeric and Series.fillna in DataFrame.insert 并为 Wind 列最后设置 0：

df = (df.melt('Span', ignore_index=False, var_name='Wind', value_name='MaxSpacing')
        .sort_index(ignore_index=True))

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

DataFrame.set_index and DataFrame.stack 的替代解决方案：

df = df.set_index('Span').rename_axis('Wind', axis=1).stack().reset_index(name='MaxSpacing')

s = pd.to_numeric(df['Wind'], errors='coerce')
df.insert(2, 'Snow', s.fillna(0))
df.loc[s.notna(), 'Wind'] = 0
print (df)
    Span       Wind  Snow  MaxSpacing
0    3.0        Low   0.0        6.20
1    3.0     Medium   0.0        4.93
2    3.0       High   0.0        3.68
3    3.0   VeryHigh   0.0        2.94
4    3.0  ExtraHigh   0.0        2.48
5    3.0          0   0.9        4.90
6    3.0          0   1.5        3.23
7    3.0          0   2.0        2.51
8    3.5        Low   0.0        5.16
9    3.5     Medium   0.0        4.10
10   3.5       High   0.0        3.07
11   3.5   VeryHigh   0.0        2.45
12   3.5  ExtraHigh   0.0        2.06
13   3.5          0   0.9        3.61
14   3.5          0   1.5        2.38
15   3.5          0   2.0        1.85

根据一些规则，如何扩展Pandas中的数据？

Based on some rules, how to expand data in Pandas?

serialization

dataframe

pandas

pandas-groupby