How to make each row in an indexed DataFrame have only one object
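I want to create a seaborn box plot with four categories ("Draughts", "Heating it sufficiently is too expensive", "Heating system in inadequate", "Poor building fabric") and temperature on the y-axis. The problem is that many respondents selected more than one option for this survey question. How can I separate the options in each row while keeping all the data?
Here is some of the data: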
CausesCold
Draughts 15.0
Draughts 19.0
Heating it sufficiently is too expensive 0.0
Draughts 10.0
Draughts 15.0
Draughts 20.0
Heating it sufficiently is too expensive,Heatin... 5.0
Heating it sufficiently is too expensive,Heatin... 18.0
Heating system in inadequate,Draughts 15.0
Heating system in inadequate,Poor building fabric 15.0
Heating it sufficiently is too expensive,Heatin... 21.0
Heating system in inadequate 21.0
Heating system in inadequate 21.0
Heating it sufficiently is too expensive 10.0
Draughts 0.0
Heating it sufficiently is too expensive,Poor b... 18.0
Heating system in inadequate 18.0
Poor building fabric,Draughts 19.0
Heating system in inadequate,Poor building fabr... 19.0
Heating system in inadequate 18.0
Heating system in inadequate 17.0
Heating it sufficiently is too expensive,Poor b... 18.0
Heating it sufficiently is too expensive,Heatin... 15.0
Heating it sufficiently is too expensive,Heatin... 15.0
Heating it sufficiently is too expensive,Poor b... 20.0
Heating it sufficiently is too expensive 17.0
Heating it sufficiently is too expensive 17.0
Heating system in inadequate 0.0
Heating it sufficiently is too expensive 10.0
Heating it sufficiently is too expensive,Heatin... 0.0
I would like it to look like this:
CurrentThermostatTemp
CausesCold
Poor building fabric 20.0
Poor building fabric 17.0
Poor building fabric 20.0
Poor building fabric 19.0
Poor building fabric 20.0
Poor building fabric 17.0
Poor building fabric 18.0
Poor building fabric 22.0
Poor building fabric 25.0
Poor building fabric 20.0
Poor building fabric 15.0
Poor building fabric 19.0
Poor building fabric 20.0
Poor building fabric 20.0
Poor building fabric 20.0
Poor building fabric 21.0
Poor building fabric 19.0
Poor building fabric 20.0
Poor building fabric 18.0
Poor building fabric 20.0
Poor building fabric 17.0
Poor building fabric 25.0
Poor building fabric 18.0
Poor building fabric 20.0
Poor building fabric 16.0
Poor building fabric 15.0
Poor building fabric 21.0
Poor building fabric 25.0
Poor building fabric 23.0
Poor building fabric 30.0
... ...
Draughts 20.0
Draughts 20.0
Draughts 17.0
Draughts 16.0
Draughts 25.0
Draughts 21.0
Draughts 21.0
Draughts 18.0
Draughts 20.0
Draughts 20.0
Draughts 18.0
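It isn't clear to me exactly how your data is formatted here. Are the thermostat readings already in their own column?
In any case, you probably want to use pandas.Series.str.split.
Something like:
temp = data['CausesCold'].str.split(',', n=1, expand=True)
Because n=1 splits at the first comma only, this creates a new data frame with two numbered columns, 0 and 1.
Assuming the thermostat values are already off in a separate column, I would then add them to this "temp" data frame. Something like:
temp['thermostat'] = data['thermostat']
Your temp df will look like: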
| 0        | 1        | thermostat |
| Reason 1 | Reason 2 | Number     |
| Reason 1 | Reason 2 | Number     |
| Reason 1 | null     | Number     |
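As a small, self-contained illustration of the two steps so far, here is a sketch on made-up rows. The sample values and the column name thermostat are assumptions for the example, not taken from the real data set.

```python
import pandas as pd

# Hypothetical sample; the real data may use different column names.
data = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Poor building fabric,Draughts",
    ],
    "thermostat": [15.0, 15.0, 19.0],
})

# Split at the first comma only (n=1), giving integer-labelled columns 0 and 1.
temp = data["CausesCold"].str.split(",", n=1, expand=True)

# Attach the readings; rows line up because temp keeps data's index.
temp["thermostat"] = data["thermostat"]

print(temp)
```

Rows with a single answer end up with a missing value in column 1, which is why the next step drops them before stacking.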
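You want columns 0 and 1 stacked on top of each other, each with its corresponding thermostat value.
So split the df. The column labels produced by str.split are the integers 0 and 1, not the strings '0' and '1', and both reason columns need a common name so that they line up when stacked:
df = temp[[0, 'thermostat']].rename(columns={0: 'CausesCold'})
df1 = temp[[1, 'thermostat']].rename(columns={1: 'CausesCold'})
Then append them. Some people may have given only one answer (i.e. column 1 is empty), so drop those rows first. DataFrame.append was removed in pandas 2.0, so use pd.concat instead:
df = pd.concat([df, df1.dropna(subset=['CausesCold'])], ignore_index=True)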
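Since a row may hold more than two options, a possibly simpler route is to split on every comma and use DataFrame.explode (available since pandas 0.25). This is a swapped-in alternative, not the two-column approach described above. The rows below are invented, and the column name CurrentThermostatTemp and the use of CausesCold as the index are assumptions read off the printout in the question.

```python
import pandas as pd

# Hypothetical sample with CausesCold as the index, as the question's printout suggests.
data = pd.DataFrame(
    {"CurrentThermostatTemp": [15.0, 5.0, 19.0]},
    index=pd.Index(
        [
            "Draughts",
            "Heating it sufficiently is too expensive,Heating system in inadequate,Draughts",
            "Poor building fabric,Draughts",
        ],
        name="CausesCold",
    ),
)

long = (
    data.reset_index()  # turn the index into an ordinary column
    .assign(CausesCold=lambda d: d["CausesCold"].str.split(","))  # string -> list
    .explode("CausesCold")  # one row per selected option
    .assign(CausesCold=lambda d: d["CausesCold"].str.strip())  # tidy stray spaces
    .set_index("CausesCold")
)

print(long)
```

Each temperature is repeated once per option the respondent selected, so no data is lost. For plotting, something along the lines of sns.boxplot(x="CausesCold", y="CurrentThermostatTemp", data=long.reset_index()) should then work.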
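If you are unlucky enough to have a raw data source where the reason and the thermostat reading are both in the same string, I would first use a regular expression to extract any numbers from that string and define the result as a new column called 'thermostat' or similar.
In any case, this should get you heading in the right direction. It isn't necessarily the most efficient way to get there, but it will get you there.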
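A minimal sketch of that regex step, assuming the number always sits at the end of the string. The column name raw and the sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical raw column where reason and reading share one string.
raw = pd.DataFrame({"raw": ["Draughts 15.0", "Poor building fabric,Draughts 19.0"]})

# Capture the trailing number (integer or decimal) and convert it to float.
number = r"(\d+(?:\.\d+)?)\s*$"
raw["thermostat"] = raw["raw"].str.extract(number, expand=False).astype(float)

# Whatever precedes the trailing number is the reason text.
raw["CausesCold"] = raw["raw"].str.replace(r"\s*\d+(?:\.\d+)?\s*$", "", regex=True)

print(raw[["CausesCold", "thermostat"]])
```

If the number can appear elsewhere in the string, the pattern would need adjusting.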