How to get MongoDB nested documents in Python Pandas DataFrame table format

Question

	name	age	address
1	"Steve"	27	{"number": 4, "street": "Main Road", "city": "Oxford"}
2	"Adam"	32	{"number": 78, "street": "High St", "city": "Cambridge"}

However, the sub-document just shows up as JSON in the sub-document cell:

from pandas import DataFrame

df = DataFrame(list(db.collection_name.find({})))
print(df)

How can I get the second table below using Python?

What processing is needed after the query to get there?

	name	age	address.number	address.street	address.city
1	Steve	27	4	"Main Road"	"Oxford"
2	Adam	32	78	"High St"	"Cambridge"

Answer 1

You can use pd.DataFrame to expand the JSON/dict in the address column into a dataframe of the JSON/dict contents. Then join it with the original dataframe using .join(), as follows:

Optional step: if your JSON/dict values are actually strings, convert them to proper JSON/dict objects first. Otherwise, skip this step.

import ast
df['address'] = df['address'].map(ast.literal_eval)
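
To make the optional step concrete, here is a minimal sketch. It assumes the address values arrived as the string form of Python dicts (for example after a round trip through a CSV file); the sample frame is made up for illustration.

```python
import ast

import pandas as pd

# Hypothetical sample: each dict was stored as its string representation.
df = pd.DataFrame({
    'name': ['Steve', 'Adam'],
    'address': [
        "{'number': 4, 'street': 'Main Road', 'city': 'Oxford'}",
        "{'number': 78, 'street': 'High St', 'city': 'Cambridge'}",
    ],
})

# ast.literal_eval parses Python literals only; it does not execute code.
df['address'] = df['address'].map(ast.literal_eval)

# Each cell is now a real dict and can be indexed by key.
first = df['address'].iloc[0]
print(type(first).__name__, first['city'])
```

If the strings are strict JSON instead (containing null, true or false), json.loads is the matching parser, since ast.literal_eval rejects those tokens.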

Main code:

import pandas as pd

df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))

Result:

    name  age  address.number address.street address.city
1  Steve   27               4      Main Road       Oxford
2   Adam   32              78        High St    Cambridge

Alternatively, if you only need to add a few columns from the JSON/dict, you can also add them one by one with the string accessor str[], as follows:

df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']

Setup

import pandas as pd

data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)
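
Putting the setup and the main code together, the join approach can be wrapped in a small helper. This is only a sketch: expand_dict_column is a hypothetical name, not a pandas function, and it assumes every cell in the column is a dict.

```python
import pandas as pd


def expand_dict_column(df, column, sep='.'):
    """Replace a column of dicts with one prefixed column per key."""
    # Build a frame from the dicts, keeping the original index for the join.
    expanded = pd.DataFrame(df[column].tolist(), index=df.index)
    expanded = expanded.add_prefix(f'{column}{sep}')
    # Drop the nested column and attach the flattened ones.
    return df.drop(columns=column).join(expanded)


# Same sample data as the setup above, built with an explicit index.
df = pd.DataFrame({
    'name': ['Steve', 'Adam'],
    'age': [27, 32],
    'address': [
        {"number": 4, "street": "Main Road", "city": "Oxford"},
        {"number": 78, "street": "High St", "city": "Cambridge"},
    ],
}, index=[1, 2])

flat = expand_dict_column(df, 'address')
print(flat)
```

Passing index=df.index matters when the frame does not use the default 0..n-1 index, as here; without it the join would align on mismatched labels.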

Answer 2

Depending on the use case, it may make more sense to set up an aggregation pipeline and $project the necessary nested documents to the top level:

df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise Sub-documents to top-level under new name
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))

df:

    name  age  address_number address_street address_city
0  Steve   27               4      Main Road       Oxford
1   Adam   32              78        High St    Cambridge
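
If the list of sub-fields is known but long, the $project stage above can be generated instead of typed out. The sketch below only builds the pipeline as a plain Python structure, so it runs without a database; build_flatten_project is a hypothetical helper name introduced for illustration.

```python
def build_flatten_project(top_level, nested, sub_fields, sep='_'):
    """Build a $project stage that lifts nested sub-fields to the top level."""
    stage = {'_id': 0}
    # Keep the requested top-level fields as they are.
    for field in top_level:
        stage[field] = f'${field}'
    # Expose each sub-field under a flat name such as address_city.
    for sub in sub_fields:
        stage[f'{nested}{sep}{sub}'] = f'${nested}.{sub}'
    return {'$project': stage}


pipeline = [
    build_flatten_project(['name', 'age'], 'address',
                          ['number', 'street', 'city'])
]
print(pipeline)
```

The resulting pipeline can be passed to db.collection_name.aggregate(...) exactly as in the answer. The underscore separator mirrors the answer's naming; a dotted output key would most likely be read by $project as a path into an embedded document rather than as a flat field name.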

Or, if there are too many fields to list by hand, we can also use $replaceRoot and $mergeObjects:

df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))

df:

    name  age  number     street       city
0  Steve   27       4  Main Road     Oxford
1   Adam   32      78    High St  Cambridge
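
To see what this pipeline does to each document, here is a rough client-side emulation in plain Python. It is not how the server implements it, and merge_root_with is a hypothetical helper; the sample documents, including their _id values, are made up.

```python
import pandas as pd

# Hypothetical documents, shaped like the ones in the collection.
docs = [
    {'_id': 1, 'name': 'Steve', 'age': 27,
     'address': {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {'_id': 2, 'name': 'Adam', 'age': 32,
     'address': {"number": 78, "street": "High St", "city": "Cambridge"}},
]


def merge_root_with(doc, field):
    """Mimic $mergeObjects on ["$$ROOT", "$<field>"], then drop _id and field."""
    # Later operands win: sub-document keys overwrite same-named root keys.
    merged = {**doc, **doc[field]}
    merged.pop('_id', None)
    merged.pop(field, None)
    return merged


df = pd.DataFrame([merge_root_with(d, 'address') for d in docs])
print(df)
```

One consequence worth keeping in mind: because the sub-document is merged last, a sub-field that shares a name with a top-level field (say, a name key inside address) would replace the top-level value. The explicit $project version avoids that by giving each lifted field its own name.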

collection_name setup:

# Drop Collection if exists
db.collection_name.drop()
# Insert Sample Documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])
