JSON column contains nested values
I have two columns of data, the restaurant name and the reviewer grades:
name grades
0 Honey'S Thai Pavilion [{u'date': 2014-08-12 00:00:00, u'grade'..
1 Siam Sqaure Thai Cuisine [{u'date': 2014-11-06 00:00:00, u'grade'...
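The problem is that one column is a list of multiple 'date', 'grade' and 'score' groups in JSON (technically BSON, since this comes from the MongoDB tutorial). I need to break up the grades column so that I get a resulting dataframe like this: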
name Date Grade Score
Honey'S Thai Pavilion 2014-08-12 00:00:00 A 6
Honey'S Thai Pavilion 2015-03-14 00:00:00 B 5
Honey'S Thai Pavilion 2013-07-15 00:00:00 C 6
Siam Sqaure Thai Cuisine 2014-11-06 00:00:00 A 3
Siam Sqaure Thai Cuisine 2015-06-06 00:00:00 B 2
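So I need to split out one column while keeping the restaurant name. The code below gets the grades column into a nice dataframe, but I can't figure out how to keep the restaurant name.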
from pymongo import MongoClient
import pymongo
import pandas as pd

client = MongoClient()
db = client.test

cursor2 = db.restaurants.find().sort([
    ("borough", pymongo.ASCENDING),
    ("cuisine", pymongo.DESCENDING)
])
#cursor.sort("cuisine",pymongo.ASCENDING)
data = pd.DataFrame(list(cursor2))[['name', 'grades']]
data_list = []
# Build one dataframe per restaurant from its list of grade dicts
for i in range(0, len(data.grades)):
    g_data = pd.DataFrame(data.grades[i])
    data_list.append(g_data)
result = pd.concat(data_list)
print(result.head(100))
I don't know pandas well, but you can flatten the mongo cursor's results with a generator expression and then feed the generator to a pandas dataframe, like this:
# Note: cursor2 must not have been consumed already (e.g. by list(cursor2))
flattened_data = (
    {
        'name': record['name'],
        'date': grade['date'],
        'grade': grade['grade'],
        'score': grade.get('score')  # None if a grade entry has no score
    }
    for record in cursor2
    for grade in record['grades']
)
result = pd.DataFrame(flattened_data)[['name', 'date', 'grade', 'score']]
print(result.head(100))
This way, you don't need to build the data_list list in a for loop.
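If you would rather let pandas do the flattening, something like pd.json_normalize (available as a top-level function since pandas 1.0) may also work. This is only a sketch: the records list below is hypothetical in-memory data shaped like the documents in the question, standing in for list(cursor2), and flatten_grades is a helper name made up for illustration. It assumes every document has a 'name' key and a 'grades' list.

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample documents shaped like the ones in the question;
# with MongoDB you would pass list(cursor2) instead.
records = [
    {
        "name": "Honey'S Thai Pavilion",
        "grades": [
            {"date": datetime(2014, 8, 12), "grade": "A", "score": 6},
            {"date": datetime(2015, 3, 14), "grade": "B", "score": 5},
        ],
    },
    {
        "name": "Siam Sqaure Thai Cuisine",
        "grades": [
            {"date": datetime(2014, 11, 6), "grade": "A", "score": 3},
        ],
    },
]


def flatten_grades(docs):
    """Return one row per grade entry, repeating the restaurant name."""
    # record_path picks the nested list to expand into rows;
    # meta lists the top-level fields to carry along with each row.
    flat = pd.json_normalize(docs, record_path="grades", meta=["name"])
    # Reorder the columns to match the layout asked for in the question.
    return flat[["name", "date", "grade", "score"]]


result = flatten_grades(records)
print(result.head(100))
```

Compared with the generator expression, this keeps the flattening inside pandas, but it needs the documents materialized as a list first, so the generator may be the lighter option for a large collection.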