How to get MongoDB nested documents into a Python pandas DataFrame in table format
| | name | age | address |
|---|---|---|---|
| 1 | "Steve" | 27 | {"number": 4, "street": "Main Road", "city": "Oxford"} |
| 2 | "Adam" | 32 | {"number": 78, "street": "High St", "city": "Cambridge"} |
However, each subdocument just shows up as JSON in its own cell:
from pandas import DataFrame
# find({}) returns a cursor; materialise it as a list of dicts first
df = DataFrame(list(db.collection_name.find({})))
print(df)
How can I get the second table, shown below, using Python?
What processing is needed after the step above?
| | name | age | address.number | address.street | address.city |
|---|---|---|---|---|---|
| 1 | Steve | 27 | 4 | "Main Road" | "Oxford" |
| 2 | Adam | 32 | 78 | "High St" | "Cambridge" |
You can use pd.DataFrame to expand the JSON/dict in the address column into a dataframe of its contents, then join it with the original dataframe using .join(), as follows:
Optional step: if your JSON/dict values are actually strings, convert them to proper dicts first. Otherwise, skip this step.
import ast
df['address'] = df['address'].map(ast.literal_eval)
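As a minimal sketch of the optional step, the snippet below parses string-valued cells and leaves cells that are already dicts untouched. The sample strings and the `to_dict` helper are hypothetical and only stand in for whatever your collection actually returns.

```python
import ast

import pandas as pd

# Hypothetical sample: the address cells are strings, not dicts.
df = pd.DataFrame({
    "name": ["Steve", "Adam"],
    "address": [
        "{'number': 4, 'street': 'Main Road', 'city': 'Oxford'}",
        '{"number": 78, "street": "High St", "city": "Cambridge"}',
    ],
})


def to_dict(value):
    """Hypothetical helper: parse a cell only if it is still a string."""
    return ast.literal_eval(value) if isinstance(value, str) else value


df["address"] = df["address"].map(to_dict)
print(df["address"].map(type).tolist())
```

Note that `ast.literal_eval` parses Python literal syntax. If the strings are strict JSON containing `true`, `false` or `null`, `json.loads` is the safer choice.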
Main code:
import pandas as pd
df[['name', 'age']].join(pd.DataFrame(df['address'].tolist(), index=df.index).add_prefix('address.'))
Result:
name age address.number address.street address.city
1 Steve 27 4 Main Road Oxford
2 Adam 32 78 High St Cambridge
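A different route to the same dotted column names is `pd.json_normalize`, which flattens nested dicts directly. This is not the method used in the main code above, just an alternative sketch; the `records` list is an assumption that stands in for `list(db.collection_name.find({}))` so the snippet runs without a database.

```python
import pandas as pd

# Stand-in for list(db.collection_name.find({})); no database needed here.
records = [
    {"name": "Steve", "age": 27,
     "address": {"number": 4, "street": "Main Road", "city": "Oxford"}},
    {"name": "Adam", "age": 32,
     "address": {"number": 78, "street": "High St", "city": "Cambridge"}},
]

# json_normalize flattens nested dicts, joining the key path with `sep`.
flat = pd.json_normalize(records, sep=".")
print(flat)
```

With real MongoDB results the `_id` field would also appear as a column unless it is dropped or excluded in the query.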
Alternatively, if you only need to add a few columns from the JSON/dict, you can also use the string accessor str[] to add them one by one, as follows:
df['address.number'] = df['address'].str['number']
df['address.street'] = df['address'].str['street']
df['address.city'] = df['address'].str['city']
Setup:
import pandas as pd
data = {'name': {1: 'Steve', 2: 'Adam'},
        'age': {1: 27, 2: 32},
        'address': {1: {"number": 4, "street": "Main Road", "city": "Oxford"},
                    2: {"number": 78, "street": "High St", "city": "Cambridge"}}}
df = pd.DataFrame(data)
Depending on the use case, it may make more sense to set up an aggregation pipeline and $project the necessary nested fields to the top level:
df = pd.DataFrame(db.collection_name.aggregate([{
    '$project': {
        '_id': 0,
        'name': '$name',
        'age': '$age',
        # Raise sub-document fields to the top level under new names
        'address_number': '$address.number',
        'address_street': '$address.street',
        'address_city': '$address.city'
    }
}]))
df:
name age address_number address_street address_city
0 Steve 27 4 Main Road Oxford
1 Adam 32 78 High St Cambridge
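When there are more than a handful of nested fields, the $project stage above can be generated rather than typed out. The sketch below only builds the pipeline as a plain Python list; `build_project_stage` and its parameters are hypothetical names introduced for illustration, and nothing here talks to MongoDB.

```python
def build_project_stage(top_level, nested, parent, sep="_"):
    """Hypothetical helper: build a $project stage that lifts nested fields.

    top_level: field names to keep as they are
    nested:    field names inside the `parent` sub-document
    parent:    name of the sub-document field, e.g. "address"
    """
    project = {"_id": 0}
    for field in top_level:
        project[field] = f"${field}"
    for field in nested:
        # e.g. 'address_city': '$address.city'
        project[f"{parent}{sep}{field}"] = f"${parent}.{field}"
    return {"$project": project}


pipeline = [build_project_stage(["name", "age"],
                                ["number", "street", "city"],
                                parent="address")]
print(pipeline)
```

The resulting list can be passed to `db.collection_name.aggregate(pipeline)` in the same way as the hand-written stage.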
Or, if there are too many fields to list by hand, we can also use $replaceRoot and $mergeObjects:
df = pd.DataFrame(db.collection_name.aggregate([
    {'$replaceRoot': {'newRoot': {'$mergeObjects': ["$$ROOT", "$address"]}}},
    {'$project': {'_id': 0, 'address': 0}}
]))
df:
name age number street city
0 Steve 27 4 Main Road Oxford
1 Adam 32 78 High St Cambridge
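One thing to watch with this approach: $mergeObjects lets later documents overwrite same-named fields from earlier ones, so a key inside address that also exists at the top level would replace the top-level value. The snippet below is only a plain-Python analogy of that merge order, not MongoDB itself, and the extra "name" key inside address is a hypothetical example.

```python
# Hypothetical document whose sub-document reuses a top-level key ("name").
root = {
    "name": "Steve",
    "age": 27,
    "address": {"number": 4, "street": "Main Road",
                "city": "Oxford", "name": "Home"},
}

# Analogy for {'$mergeObjects': ["$$ROOT", "$address"]}:
# keys from the later dict win on collision.
merged = {**root, **root["address"]}

# Analogy for the follow-up {'$project': {'address': 0}} stage.
merged.pop("address")

print(merged["name"])
```

If such collisions are possible in your data, lifting the fields under prefixed names with $project avoids the overwrite.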
collection_name setup:
# Drop the collection if it exists
db.collection_name.drop()
# Insert sample documents
db.collection_name.insert_many([{
    'name': 'Steve', 'age': 27,
    'address': {"number": 4, "street": "Main Road", "city": "Oxford"}
}, {
    'name': 'Adam', 'age': 32,
    'address': {"number": 78, "street": "High St", "city": "Cambridge"}
}])