Parse all column names and create new columns
date red,heavy,new blue,light,old
1-2-20 320 120
2-3-20 220 125
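I want to iterate over all rows and columns so that I can parse the column names and use the parts as values in new columns. I want the dates repeated, and the 'value' column comes from the original table. This is the format I want: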
date color weight condition value
1-2-20 red heavy new 320
1-2-20 blue light old 120
2-3-20 red heavy new 220
I tried this, and it works when I only have one column:
colName = df_retransform.columns[1]
lst = colName.split(",")
color = lst[0]
weight = lst[1]
condition = lst[2]
df_retransform.rename(columns={colName: 'value'}, inplace=True)
df_retransform['color'] = color
df_retransform['weight'] = weight
df_retransform['condition'] = condition
But I can't figure out how to adapt it so that it works for all columns.
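One way to generalize the single-column attempt is to repeat the same split for every column except date and stack the pieces. A minimal sketch, assuming the sample table from the question with date stored as strings; the names pieces and result are illustrative only:

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: dates are plain strings)
df_retransform = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

pieces = []
# Apply the single-column logic to every column except "date"
for col_name in df_retransform.columns.drop("date"):
    color, weight, condition = col_name.split(",")
    # Scalars broadcast along the index of the "date" Series
    pieces.append(pd.DataFrame({
        "date": df_retransform["date"],
        "color": color,
        "weight": weight,
        "condition": condition,
        "value": df_retransform[col_name],
    }))

# Concatenate the per-column frames vertically with a fresh 0..n-1 index
result = pd.concat(pieces, ignore_index=True)
print(result)
```

This works, but building one frame per column is exactly what DataFrame.melt does in a single call, which is why the answers below prefer it.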
Use DataFrame.melt with Series.str.split, and DataFrame.pop to use and then drop the variable column; finally, change the order of the columns if necessary:
First, you can test whether all columns except date contain exactly two commas:
print ([col for col in df.columns if col.count(',') != 2])
['date']
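The same check can be packaged so it fails loudly before reshaping. A small sketch; find_malformed is a hypothetical helper name, and it assumes the id column is called date and every other name should split into three parts:

```python
import pandas as pd

def find_malformed(columns, id_col="date", parts=3):
    """Hypothetical helper: return names (other than id_col) that do not split into `parts` pieces."""
    return [c for c in columns if c != id_col and len(c.split(",")) != parts]

df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

bad = find_malformed(df.columns)
if bad:
    raise ValueError(f"Unexpected column names: {bad}")
```

An empty list means the melt-and-split step below will produce exactly three new columns.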
df = df.melt('date')
df[['color', 'weight', 'condition']] = df.pop('variable').str.split(',', expand=True)
df = df[['date', 'color', 'weight', 'condition', 'value']]
print (df)
date color weight condition value
0 1-2-20 red heavy new 320
1 2-3-20 red heavy new 220
2 1-2-20 blue light old 120
3 2-3-20 blue light old 125
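For a self-contained run, here is a variant of the same melt-and-split idea that joins the split parts instead of using pop. It is a sketch assuming the question's sample data; the intermediate names melted and parts are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

# Wide -> long: columns become rows in "variable", cells go to "value"
melted = df.melt(id_vars="date")
# Split each former column name into three parts, one column per part
parts = melted["variable"].str.split(",", expand=True)
parts.columns = ["color", "weight", "condition"]
# Join on the shared 0..n-1 index to assemble the final column order
result = melted[["date"]].join(parts).join(melted["value"])
print(result)
```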
Alternatively, use DataFrame.stack to get a Series with a MultiIndex, then split the column-name level and rebuild the index with all the new levels:
print (df)
date red,heavy,new blue,light,old
0 1-2-20 320 NaN
1 NaN 220 125.0
import pandas as pd
s = df.set_index('date').stack(dropna=False)
s.index = pd.MultiIndex.from_tuples([(i, *j.split(',')) for i, j in s.index],
names=['date', 'color', 'weight', 'condition'])
df = s.reset_index(name='value')
print (df)
date color weight condition value
0 1-2-20 red heavy new 320.0
1 1-2-20 blue light old NaN
2 NaN red heavy new 220.0
3 NaN blue light old 125.0
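The key step in the stack approach is the list comprehension that turns each (date, 'red,heavy,new') index tuple into a four-element tuple. A sketch of just that step in isolation, using a hand-built MultiIndex as a stand-in for the stacked one (an assumption for illustration):

```python
import pandas as pd

# Two-level index shaped like the one produced by set_index('date').stack()
idx = pd.MultiIndex.from_tuples(
    [("1-2-20", "red,heavy,new"), ("1-2-20", "blue,light,old")]
)
# (i, *j.split(',')) unpacks the split parts, so each 2-tuple becomes a 4-tuple
flat = pd.MultiIndex.from_tuples(
    [(i, *j.split(",")) for i, j in idx],
    names=["date", "color", "weight", "condition"],
)
print(flat.nlevels)  # 4
print(flat[1])       # ('1-2-20', 'blue', 'light', 'old')
```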
You can also use the pivot_longer function from pyjanitor; at the time of writing, you have to install the latest development version from GitHub:
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
df.pivot_longer(index="date",
names_to=("color", "weight", "condition"),
names_sep=",")
date color weight condition value
0 1-2-20 red heavy new 320
1 2-3-20 red heavy new 220
2 1-2-20 blue light old 120
3 2-3-20 blue light old 125
You pass the names of the new columns to names_to and specify the separator (,) in names_sep.
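To see what names_to and names_sep do without installing pyjanitor, here is a rough plain-pandas approximation. pivot_longer_sep is a hypothetical helper written for illustration, not part of pyjanitor, and it only covers the case shown here (a single index column, a literal separator):

```python
import pandas as pd

def pivot_longer_sep(df, index, names_to, names_sep):
    """Hypothetical helper: a plain-pandas approximation of pivot_longer(names_sep=...)."""
    # Melt every column except `index` into (variable, value) pairs
    long = df.melt(id_vars=index)
    # Split each former column name on names_sep; one new column per part
    parts = long.pop("variable").str.split(names_sep, expand=True)
    # Raises ValueError if the number of split columns differs from len(names_to)
    parts.columns = list(names_to)
    return pd.concat([long[[index]], parts, long[["value"]]], axis=1)

df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})
result = pivot_longer_sep(df, index="date",
                          names_to=("color", "weight", "condition"),
                          names_sep=",")
print(result)
```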
If you want the rows returned in order of appearance, pass True to the sort_by_appearance parameter:
df.pivot_longer(
index="date",
names_to=("color", "weight", "condition"),
names_sep=",",
sort_by_appearance=True,
)
date color weight condition value
0 1-2-20 red heavy new 320
1 1-2-20 blue light old 120
2 2-3-20 red heavy new 220
3 2-3-20 blue light old 125
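A similar row order can be had in plain pandas by keeping the original row labels during the melt and sorting on them stably. A sketch, assuming pandas 1.1 or later (for melt's ignore_index parameter) and the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["1-2-20", "2-3-20"],
    "red,heavy,new": [320, 220],
    "blue,light,old": [120, 125],
})

# ignore_index=False keeps each row's original label: 0, 1, 0, 1
long = df.melt(id_vars="date", ignore_index=False)
# A stable sort on that label groups rows by original row,
# keeping the original column order within each row
long = long.sort_index(kind="mergesort").reset_index(drop=True)
long[["color", "weight", "condition"]] = long.pop("variable").str.split(",", expand=True)
long = long[["date", "color", "weight", "condition", "value"]]
print(long)
```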