How to split a pandas dataframe that contains lists as fields into a multi-indexed dataframe?
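I have a pandas DataFrame that contains lists as elements (not numpy arrays), and I would like to break it up into a hierarchically indexed DataFrame.
Here is an example of what I am trying to achieve.
I have a DataFrame of this form: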
 | Model  | Company  | Url  | Criteria               | Rating
1| Model1 | Company1 | Url1 | [Criteria1, Criteria2] | [Rating1, Rating2]
2| Model2 | Company2 | Url2 | [Criteria4, Criteria5] | [Rating4, Rating5]
and I want to turn it into this:
|Model | Company | Url | Rating
----------------------------------
Criteria 1|Model1| Company1| Url1 |Rating1
Criteria 2|Model1| Company1| Url1 |Rating2
Criteria 3|Model1| Company1| Url1 |Rating3
Criteria 4|Model2| Company2| Url2 |Rating4
Criteria 5|Model2| Company2| Url2 |Rating5
Criteria 6|Model2| Company2| Url2 |Rating6
import sys  # you don't need StringIO if you're reading the data from a file
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
import pandas as pd

dfstr = StringIO(""" | Model | Company | Url | Criteria | Rating
1| Model1 | Company1 | Url1 |[Criteria1, Criteria2] | [Rating1 , Rating2]
2| Model2 | Company2 | Url2 |[Criteria4, Criteria5] | [Rating4, Rating5]""")
# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it
df = pd.read_csv(dfstr, sep='|', index_col=0)
df.columns = ['Model', 'Company', 'Url', 'Criteria', 'Rating']  # no whitespace

def slist(listish):  # list-without-quotes is kind of a pain
    # drop the brackets, split on commas, and trim stray spaces around each item
    return [item.strip() for item in listish.strip(' []').split(',')]

def row2df(row):  # make a little sub-DataFrame for each (index, row) pair
    tmp = pd.DataFrame(list(zip(slist(row[1].Criteria), slist(row[1].Rating))))
    for val in ['Model', 'Company', 'Url']:
        tmp[val] = row[1][val]
    return tmp

outdf = pd.concat([row2df(row) for row in df.iterrows()])  # and stick them together
outdf.index = outdf[0]  # index on the Criteria
outdf.drop(0, axis=1, inplace=True)
outdf.columns = ['Rating', 'Model', 'Company', 'Url']  # tidy up the names
print(outdf)
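
If the cells already hold real Python lists rather than strings, a shorter route may be `DataFrame.explode`. This is a different technique from the row-by-row approach above, sketched here under some assumptions: pandas 1.3 or newer (needed to explode several columns at once), lists of equal length within each row, and a hypothetical sample frame built by hand. The choice of `(Model, Criteria)` as the two index levels is also an assumption; pick whichever levels suit your data.

```python
import pandas as pd

# Hypothetical sample frame: two columns hold real Python lists.
df = pd.DataFrame({
    "Model": ["Model1", "Model2"],
    "Company": ["Company1", "Company2"],
    "Url": ["Url1", "Url2"],
    "Criteria": [["Criteria1", "Criteria2"], ["Criteria4", "Criteria5"]],
    "Rating": [["Rating1", "Rating2"], ["Rating4", "Rating5"]],
})

# Explode both list columns in lockstep, one output row per list element,
# then move Model and Criteria into a two-level (hierarchical) index.
outdf = (
    df.explode(["Criteria", "Rating"])
      .set_index(["Model", "Criteria"])
)
print(outdf)
```

For an index on Criteria alone, as in the desired output shown in the question, `set_index("Criteria")` should work in place of the two-level call.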