How to make each row in an indexed DataFrame contain only one object

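I want to create a seaborn box plot of 4 variables ("Draughts", "Heating it sufficiently is too expensive", "Heating system in inadequate", "Poor building fabric"), with temperature on the y-axis. The problem is that many people selected more than one option for each survey response. I would like to know how to separate the options in each row while keeping all of the data. Here is some of the data: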

CausesCold                                                               
Draughts                                                             15.0
Draughts                                                             19.0
Heating it sufficiently is too expensive                              0.0
Draughts                                                             10.0
Draughts                                                             15.0
Draughts                                                             20.0
Heating it sufficiently is too expensive,Heatin...                    5.0
Heating it sufficiently is too expensive,Heatin...                   18.0
Heating system in inadequate,Draughts                                15.0
Heating system in inadequate,Poor building fabric                    15.0
Heating it sufficiently is too expensive,Heatin...                   21.0
Heating system in inadequate                                         21.0
Heating system in inadequate                                         21.0
Heating it sufficiently is too expensive                             10.0
Draughts                                                              0.0
Heating it sufficiently is too expensive,Poor b...                   18.0
Heating system in inadequate                                         18.0
Poor building fabric,Draughts                                        19.0
Heating system in inadequate,Poor building fabr...                   19.0
Heating system in inadequate                                         18.0
Heating system in inadequate                                         17.0
Heating it sufficiently is too expensive,Poor b...                   18.0
Heating it sufficiently is too expensive,Heatin...                   15.0
Heating it sufficiently is too expensive,Heatin...                   15.0
Heating it sufficiently is too expensive,Poor b...                   20.0
Heating it sufficiently is too expensive                             17.0
Heating it sufficiently is too expensive                             17.0
Heating system in inadequate                                          0.0
Heating it sufficiently is too expensive                             10.0
Heating it sufficiently is too expensive,Heatin...                    0.0

I would like it to look like this:

                          CurrentThermostatTemp
CausesCold                                 
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   20.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   18.0
Poor building fabric                   22.0
Poor building fabric                   25.0
Poor building fabric                   20.0
Poor building fabric                   15.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   20.0
Poor building fabric                   20.0
Poor building fabric                   21.0
Poor building fabric                   19.0
Poor building fabric                   20.0
Poor building fabric                   18.0
Poor building fabric                   20.0
Poor building fabric                   17.0
Poor building fabric                   25.0
Poor building fabric                   18.0
Poor building fabric                   20.0
Poor building fabric                   16.0
Poor building fabric                   15.0
Poor building fabric                   21.0
Poor building fabric                   25.0
Poor building fabric                   23.0
Poor building fabric                   30.0
...                                     ...
Draughts                               20.0
Draughts                               20.0
Draughts                               17.0
Draughts                               16.0
Draughts                               25.0
Draughts                               21.0
Draughts                               21.0
Draughts                               18.0
Draughts                               20.0
Draughts                               20.0
Draughts                               18.0

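It's not clear to me exactly how your data is formatted here. Are the thermostat readings already in their own column?

Either way, you will probably want to use pandas.Series.str.split

Something like

temp = df['CausesCold'].str.split(',', n=1, expand=True)

This creates a new DataFrame with two columns, labelled with the integers 0 and 1. Because n=1, each string is split at its first comma only.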
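To make the shape of that result concrete, here is a minimal sketch on a few made-up rows. The frame name `sample` and its values are invented for illustration; only the column name CausesCold comes from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column name comes from the question.
sample = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Poor building fabric,Draughts",
    ]
})

# n=1 splits at the first comma only; expand=True returns a DataFrame
# whose columns are labelled with the integers 0 and 1.
temp = sample["CausesCold"].str.split(",", n=1, expand=True)
print(temp)
```

Rows with a single reason get a missing value in column 1. Note that with n=1 a row listing three or more options keeps everything after the first comma together in column 1; leaving n out splits at every comma and produces one column per option.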

Assuming the thermostat values are already in a separate column, I would then add them to this "temp" DataFrame. Something like:

temp['thermostat']=df['thermostat']

Your temp df will look like:

|********************************|
|0         |1        |thermostat |
|Reason 1. |Reason 2 |Number     |
|Reason 1. |Reason 2 |Number     |
|Reason 1. |null     |Number     |
|********************************|

You want columns 0 and 1 stacked on top of each other, each alongside its corresponding thermostat value.

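So split the df, giving both halves the same column name so they line up when stacked (the split columns are labelled with the integers 0 and 1, not the strings '0' and '1'):

import pandas as pd
df = temp[[0, 'thermostat']].rename(columns={0: 'CausesCold'})
df1 = temp[[1, 'thermostat']].rename(columns={1: 'CausesCold'})

Then append them. It may also be the case that some people gave only one answer (i.e. column 1 is empty), so handle that as well by dropping those rows first.

# DataFrame.append was removed in pandas 2.0, so use pd.concat
df = pd.concat([df, df1.dropna(subset=['CausesCold'])], ignore_index=True)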
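Put together, the steps above might look like the following sketch. The sample values are invented, and it assumes that both the reasons and the thermostat readings are ordinary columns named CausesCold and thermostat. It uses pd.concat for the stacking step, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical data: column names follow the answer, values are invented.
df = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Poor building fabric",
    ],
    "thermostat": [15.0, 19.0, 20.0],
})

# Split at the first comma into integer-labelled columns 0 and 1,
# then carry the thermostat reading along.
temp = df["CausesCold"].str.split(",", n=1, expand=True)
temp["thermostat"] = df["thermostat"]

# Give both halves the same column name so they stack cleanly.
first = temp[[0, "thermostat"]].rename(columns={0: "CausesCold"})
second = temp[[1, "thermostat"]].rename(columns={1: "CausesCold"})

# Drop rows that had no second reason before stacking.
stacked = pd.concat(
    [first, second.dropna(subset=["CausesCold"])],
    ignore_index=True,
)
print(stacked)
```

Each reading now appears once per reason it was reported with, so a response that named two reasons contributes two rows.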

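If you are unlucky enough to have a raw data source where the reasons and the thermostat reading are both in the same string, I would probably first do a regex extract on any digits in that string and then define the result as a new column named 'thermostat' or similar.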
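As a rough sketch of that regex step: the raw layout is not shown in the question, so the example below assumes a hypothetical column named `response` in which the reading follows the reasons after a space. Adjust the pattern to whatever the real strings look like.

```python
import pandas as pd

# Hypothetical raw column in which reasons and reading share one string.
raw = pd.DataFrame({
    "response": [
        "Draughts 15.0",
        "Poor building fabric,Draughts 19",
        "Heating system in inadequate 0.0",
    ]
})

number = r"\d+(?:\.\d+)?"  # an integer or a decimal number

# Capture the first number in each string as the thermostat reading.
raw["thermostat"] = (
    raw["response"].str.extract(f"({number})", expand=False).astype(float)
)

# Remove the number (and the whitespace before it) to leave the reasons.
raw["CausesCold"] = (
    raw["response"].str.replace(rf"\s*{number}", "", regex=True).str.strip()
)
print(raw[["CausesCold", "thermostat"]])
```

After this, CausesCold can be split and stacked in the same way as before.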

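Either way, this should get you headed in the right direction. It is not necessarily the most efficient way to get there, but it will get you there.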
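For a more compact route, one alternative (a different technique from the split-and-stack steps above, and it needs pandas 0.25 or later) is to split each string into a list and let DataFrame.explode give every list element its own row. The sample values below are invented, and the column names CausesCold and thermostat are assumed as before.

```python
import pandas as pd

# Hypothetical data: values are invented for illustration.
df = pd.DataFrame({
    "CausesCold": [
        "Draughts",
        "Heating system in inadequate,Draughts",
        "Poor building fabric,Draughts",
    ],
    "thermostat": [15.0, 19.0, 20.0],
})

# Split on every comma into a list, then expand each list element
# into its own row; the thermostat value is repeated for each one.
long = (
    df.assign(CausesCold=df["CausesCold"].str.split(","))
      .explode("CausesCold")
      .reset_index(drop=True)
)
print(long)
```

This handles any number of options per row. If CausesCold is the index rather than a column, as the printout in the question suggests, calling df.reset_index() first should turn it into a column. The long frame can then be handed to seaborn, for example sns.boxplot(data=long, x="CausesCold", y="thermostat").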