JSON column contains nested values

I have two columns of data, restaurant name and reviewer grades:

   name                            grades
0  Honey'S Thai Pavilion           [{u'date': 2014-08-12 00:00:00, u'grade'..  
1  Siam Sqaure Thai Cuisine        [{u'date': 2014-11-06 00:00:00, u'grade'...

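The problem is that one column is a list of multiple 'date', 'grade', and 'score' entries in JSON (technically BSON, since this comes from the MongoDB tutorial). I need to break out the grades column so that I get a resulting DataFrame like this: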

name                       Date                   Grade         Score
Honey'S Thai Pavilion      2014-08-12 00:00:00    A             6
Honey'S Thai Pavilion      2015-03-14 00:00:00    B             5
Honey'S Thai Pavilion      2013-07-15 00:00:00    C             6
Siam Sqaure Thai Cuisine   2014-11-06 00:00:00    A             3
Siam Sqaure Thai Cuisine   2015-06-06 00:00:00    B             2

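So I need to split out one column while keeping the restaurant name. The code below gets the grades column into a nice DataFrame, but I can't figure out how to keep the restaurant name.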

    from pymongo import MongoClient
    import pymongo
    import pandas as pd

    client = MongoClient()

    db = client.test

    cursor2 = db.restaurants.find().sort([
        ("borough", pymongo.ASCENDING),
        ("cuisine", pymongo.DESCENDING)
    ])

    #cursor.sort("cuisine",pymongo.ASCENDING)
    data = pd.DataFrame(list(cursor2))[['name', 'grades']]

    data_list= []
    for i in range(0, len(data.grades)):
        g_data = pd.DataFrame(data.grades[i])
        data_list.append(g_data)

    result = pd.concat(data_list)
    print(result.head(100))

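I don't know pandas very well, but you can use a generator expression to flatten the results of the mongo cursor and then feed the generator to a pandas DataFrame, like this: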

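    # Emit one flat dict per (restaurant, grade) pair.
    # Note: iterate a fresh cursor2 here; a cursor already consumed
    # by list(cursor2) yields nothing more.
    flattened_data = (
        {
            'name': record['name'],
            'date': grade['date'],
            'grade': grade['grade'],
            'score': grade.get('score')  # None if a grade has no score
        }
        for record in cursor2
        for grade in record['grades']
    )
    # Select the columns explicitly to fix their order.
    result = pd.DataFrame(flattened_data)[['name', 'date', 'grade', 'score']]
    print(result.head(100))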

This way, you don't need to build up the data_list list in a for loop.
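If you would rather let pandas do the flattening, something along these lines might also work. This is only a sketch, assuming a pandas version that provides `pd.json_normalize` (1.0 or later). The `records` list below is hypothetical sample data shaped like the documents in the question, standing in for `list(cursor2)` so the snippet runs without a MongoDB server.

```python
import datetime as dt

import pandas as pd

# Hypothetical sample data shaped like the restaurant documents in the
# question; with a live database this would be list(cursor2) instead.
records = [
    {
        "name": "Honey'S Thai Pavilion",
        "grades": [
            {"date": dt.datetime(2014, 8, 12), "grade": "A", "score": 6},
            {"date": dt.datetime(2015, 3, 14), "grade": "B", "score": 5},
        ],
    },
    {
        "name": "Siam Sqaure Thai Cuisine",
        "grades": [
            {"date": dt.datetime(2014, 11, 6), "grade": "A", "score": 3},
        ],
    },
]

# record_path names the nested list to expand into rows;
# meta names the parent fields to repeat on every expanded row.
result = pd.json_normalize(records, record_path="grades", meta=["name"])

# Put the restaurant name first, as in the desired output.
result = result[["name", "date", "grade", "score"]]
print(result)
```

Each entry in a restaurant's `grades` list becomes its own row, and the restaurant's `name` is repeated on each of those rows.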