Parse all column names and create new columns

Question

date     red,heavy,new  blue,light,old
1-2-20   320             120
2-3-20   220             125

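I want to iterate over all rows and columns so that I can parse the column names and use their parts as values in new columns. I want to get the data in this format:

The dates should be repeated. The 'value' column comes from the original table.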

date     color   weight   condition   value
1-2-20   red     heavy    new         320
1-2-20   blue    light    old         120
2-3-20   red     heavy    new         220

I tried the following, and it works when I have only one column:

colName = df_retransform.columns[1]

lst = colName.split(",")
color = lst[0]
weight = lst[1]
condition = lst[2]


df_retransform.rename(columns={colName: 'value'}, inplace=True)
df_retransform['color'] = color
df_retransform['weight'] = weight
df_retransform['condition'] = condition

But I cannot work out how to modify it so that it handles all columns.

Answer 1

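Use DataFrame.melt with Series.str.split, and DataFrame.pop to use and remove the variable column in one step; finally, reorder the columns if necessary.

First, you can check that every column except date contains exactly two commas. The following lists the columns that do not: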

print ([col for col in df.columns if col.count(',') != 2])
['date'] 


df = df.melt('date')
df[['color', 'weight', 'condition']] = df.pop('variable').str.split(',', expand=True)

df = df[['date', 'color', 'weight', 'condition', 'value']]
print (df)
     date color weight condition  value
0  1-2-20   red  heavy       new    320
1  2-3-20   red  heavy       new    220
2  1-2-20  blue  light       old    120
3  2-3-20  blue  light       old    125
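
The check and the reshape above can be combined into one self-contained sketch. The helper name melt_split_columns and its parameters are made up for illustration, and the sample frame is rebuilt from the question's table; it is a sketch under those assumptions, not the only way to write it:

```python
import pandas as pd


def melt_split_columns(df, id_col="date",
                       names=("color", "weight", "condition"), sep=","):
    """Hypothetical helper: reshape wide to long and split each column name."""
    # Every value column must split into exactly len(names) parts.
    value_cols = [c for c in df.columns if c != id_col]
    bad = [c for c in value_cols if c.count(sep) != len(names) - 1]
    if bad:
        raise ValueError(f"unexpected column names: {bad}")

    # melt puts the old column names into a 'variable' column.
    long = df.melt(id_col)
    # pop removes 'variable' and returns it, so it can be split right away.
    long[list(names)] = long.pop("variable").str.split(sep, expand=True)
    return long[[id_col, *names, "value"]]


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

result = melt_split_columns(df)
print(result)
```

Raising on malformed names is a design choice of this sketch; without it, str.split with expand=True may produce a different number of columns than expected and the assignment would fail with a less obvious error.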

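Or use DataFrame.stack to get a Series with a MultiIndex, then split the column-name level and rebuild all levels for the new columns (the sample data here contains missing values):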

print (df)
    date  red,heavy,new  blue,light,old
0  1-2-20            320             NaN
1     NaN            220           125.0

s = df.set_index('date').stack(dropna=False)
s.index = pd.MultiIndex.from_tuples([(i, *j.split(',')) for i, j in s.index], 
                                    names=['date', 'color', 'weight', 'condition'])
df = s.reset_index(name='value')
print (df)

     date color weight condition  value
0  1-2-20   red  heavy       new  320.0
1  1-2-20  blue  light       old    NaN
2     NaN   red  heavy       new  220.0
3     NaN  blue  light       old  125.0
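
The list comprehension that rebuilds the index is the least obvious step, so here it is in isolation. This sketch assumes a hand-written list of (date, column name) pairs in place of the real stacked index; the variable names are illustrative only:

```python
import pandas as pd

# Stand-in for the (date, column name) pairs of a stacked index.
pairs = [("1-2-20", "red,heavy,new"), ("1-2-20", "blue,light,old")]

# *name.split(",") unpacks the three parts next to the date,
# turning each 2-tuple into a 4-tuple.
tuples = [(date, *name.split(",")) for date, name in pairs]

index = pd.MultiIndex.from_tuples(
    tuples, names=["date", "color", "weight", "condition"]
)

# reset_index turns every index level into a column of that name.
s = pd.Series([320, 120], index=index)
long = s.reset_index(name="value")
print(long)
```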

Answer 2

You can also use the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from GitHub:

# install the latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor

df.pivot_longer(index="date", 
                names_to=("color", "weight", "condition"), 
                names_sep=",")

    date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   2-3-20  red     heavy   new     220
2   1-2-20  blue    light   old     120
3   2-3-20  blue    light   old     125

You pass the names of the new columns to names_to, and specify the separator (,) in names_sep.

If you want the rows returned in order of appearance, pass True to the sort_by_appearance parameter:

df.pivot_longer(
    index="date",
    names_to=("color", "weight", "condition"),
    names_sep=",",
    sort_by_appearance=True,
)


    date    color   weight  condition   value
0   1-2-20  red     heavy   new     320
1   1-2-20  blue    light   old     120
2   2-3-20  red     heavy   new     220
3   2-3-20  blue    light   old     125
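
If pyjanitor is not an option, roughly the same row order can be reached with pandas alone. The sketch below is an assumption on my part, not how pivot_longer works internally: it tags each row with a temporary _row column (a made-up name), melts, and then does a stable sort on that tag so the columns keep their original order within each row:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

# "_row" is a temporary helper column recording the original row position.
long = df.assign(_row=list(range(len(df)))).melt(["_row", "date"])
long[["color", "weight", "condition"]] = (
    long.pop("variable").str.split(",", expand=True)
)

# mergesort is stable, so ties keep the melt order (left-to-right columns).
ordered = (
    long.sort_values("_row", kind="mergesort")
        .drop(columns="_row")
        .reset_index(drop=True)
)
ordered = ordered[["date", "color", "weight", "condition", "value"]]
print(ordered)
```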

