How to get MongoDB nested documents into a Python pandas DataFrame in table format

name age address
1 "Steve" 27 {"number": 4, "street": "Main Road", "city": "Oxford"}
2 "Adam" 32 {"number": 78, "street": "High St", "city": "Cambridge"}

However, the sub-document is only displayed as JSON inside the sub-document cell.

from pandas import DataFrame

# db is a pymongo Database object
df = DataFrame(list(db.collection_name.find({})))
print(df)

How can I get the table below using Python?

What processing is needed after this step?

name age address.number address.street address.city
1 Steve 27 4 "Main Road" "Oxford"
2 Adam 32 78 "High St" "Cambridge"

You can use pd.DataFrame to expand the JSON/dict in column address into a dataframe of the JSON/dict contents. Then, join it with the original dataframe using .join(), as follows:

Optional step: if your JSON/dicts are actually strings, first convert them to real dicts. Otherwise, skip this step.

import ast
df['address'] = df['address'].map(ast.literal_eval)
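`ast.literal_eval` parses Python literal syntax, so it handles strings like `{"number": 4}` but raises `ValueError` on JSON-only tokens such as `true`, `false` or `null`. If the strings are strict JSON, `json.loads` from the standard library is a drop-in alternative. A minimal sketch, assuming the column holds JSON text; the sample strings and the `verified` key are made up for illustration:

```python
import json

import pandas as pd

# Hypothetical sample: the address column holds JSON *strings*, not dicts
df = pd.DataFrame({
    'name': ['Steve', 'Adam'],
    'age': [27, 32],
    'address': [
        '{"number": 4, "street": "Main Road", "city": "Oxford", "verified": true}',
        '{"number": 78, "street": "High St", "city": "Cambridge", "verified": null}',
    ],
})

# json.loads understands true/false/null; ast.literal_eval would raise ValueError here
df['address'] = df['address'].map(json.loads)
print(df['address'].iloc[0])
```

After this conversion, the main code below works unchanged.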

Main code:

import pandas as pd

df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))

Result:

    name  age  address.number address.street address.city
1  Steve   27               4      Main Road       Oxford
2   Adam   32              78        High St    Cambridge
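pandas also provides `pd.json_normalize`, which flattens nested dicts into dotted column names in a single call (its `sep` parameter defaults to `'.'`). A sketch, assuming the documents have already been fetched into a list; here the list is hard-coded to mirror the sample collection, while with pymongo it would come from something like `list(db.collection_name.find({}, {'_id': 0}))`:

```python
import pandas as pd

# Stand-in for list(db.collection_name.find({}, {'_id': 0}))
docs = [
    {'name': 'Steve', 'age': 27,
     'address': {'number': 4, 'street': 'Main Road', 'city': 'Oxford'}},
    {'name': 'Adam', 'age': 32,
     'address': {'number': 78, 'street': 'High St', 'city': 'Cambridge'}},
]

# Every nested key becomes its own column, named parent.child
flat = pd.json_normalize(docs)
print(flat)
```

Unlike the join above, this does not require knowing the top-level column names in advance, and it also flattens deeper levels of nesting.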

Alternatively, if you only need to add a few columns from the JSON/dict, you can add them one at a time with the string accessor str[], as follows:

df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']
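When the list of keys is longer, the same accessor can be driven by a loop. A sketch, assuming a frame shaped like the sample data; `keys` is a hypothetical list you would adapt, and the second row deliberately lacks `city` to show what happens with a missing key. `.str.get(key)` is equivalent to `.str[key]`, and a key that is absent from a row's dict comes back as a missing value:

```python
import pandas as pd

df = pd.DataFrame({
    'name': {1: 'Steve', 2: 'Adam'},
    'age': {1: 27, 2: 32},
    'address': {1: {'number': 4, 'street': 'Main Road', 'city': 'Oxford'},
                2: {'number': 78, 'street': 'High St'}},  # city deliberately missing
})

keys = ['number', 'street', 'city']  # hypothetical: the sub-fields you want
for key in keys:
    # .str.get looks the key up in each row's dict
    df[f'address.{key}'] = df['address'].str.get(key)

# The original dict column is no longer needed
df = df.drop(columns='address')
print(df)
```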

Setup

import pandas as pd

data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)

Depending on the use case, it may make more sense to use an aggregation pipeline and $project the needed nested fields up to the top level:

df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise Sub-documents to top-level under new name
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))

df:

    name  age  address_number address_street address_city
0  Steve   27               4      Main Road       Oxford
1   Adam   32              78        High St    Cambridge
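If the list of sub-fields is long, the `$project` stage can be generated in Python instead of being written by hand. A sketch; `build_flatten_projection` is a hypothetical helper, and the field names are those of the sample collection. The returned list is what you would pass to `db.collection_name.aggregate(...)`:

```python
def build_flatten_projection(top_fields, parent, sub_fields):
    """Hypothetical helper: return a one-stage pipeline that keeps top_fields
    as-is and lifts parent.sub into a top-level field named parent_sub."""
    projection = {'_id': 0}
    for field in top_fields:
        projection[field] = f'${field}'
    for sub in sub_fields:
        # '$address.city' is a field path into the embedded document
        projection[f'{parent}_{sub}'] = f'${parent}.{sub}'
    return [{'$project': projection}]


pipeline = build_flatten_projection(['name', 'age'], 'address',
                                    ['number', 'street', 'city'])
print(pipeline)
# With pymongo: df = pd.DataFrame(db.collection_name.aggregate(pipeline))
```

This produces the same stage as the hand-written one above, so the resulting DataFrame is the same.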

Or, if there are too many fields to list by hand, we can use $replaceRoot and $mergeObjects instead:

df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))

df:

    name  age  number     street       city
0  Steve   27       4  Main Road     Oxford
1   Adam   32      78    High St  Cambridge
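Note that when merged documents share a field name, `$mergeObjects` keeps the value from the last one, so a sub-field with the same name as a top-level field would overwrite it. If the documents are already in Python, the same flattening can be done with dict unpacking, which has the same later-wins behavior. A sketch using the sample documents (the integer `_id` values stand in for real ObjectIds):

```python
import pandas as pd

# Stand-in for list(db.collection_name.find())
docs = [
    {'_id': 1, 'name': 'Steve', 'age': 27,
     'address': {'number': 4, 'street': 'Main Road', 'city': 'Oxford'}},
    {'_id': 2, 'name': 'Adam', 'age': 32,
     'address': {'number': 78, 'street': 'High St', 'city': 'Cambridge'}},
]

rows = []
for doc in docs:
    # Like {$mergeObjects: ["$$ROOT", "$address"]}: address keys win on clashes
    merged = {**doc, **doc['address']}
    # Like {$project: {_id: 0, address: 0}}
    merged.pop('_id', None)
    merged.pop('address', None)
    rows.append(merged)

df = pd.DataFrame(rows)
print(df)
```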

collection_name setup:

# Drop Collection if exists
db.collection_name.drop()
# Insert Sample Documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])