Unnest json dict to rows in pandas
I have the following dataset, which comes from a JSON file:
    import pandas as pd

    mydf = pd.DataFrame({
        'load': {
            0: {'id': '100', 'name': 'Joe'}, 1: {'id': '101', 'name': 'Ann'},
            2: {'id': '102', 'name': 'Sue'}, 3: {'id': '103', 'name': 'Leo'}},
        'order_date': {0: '2019-04-01', 1: '2019-04-01', 2: '2019-04-02', 3: '2019-04-03'},
        'detail': {
            0: [{'product_gross_total': 980, 'unitary_gross_price': 490,
                 'hierarchy_name': 'FOOD', 'payment': [{'amount': 980.0, 'id': 124}],
                 'product_id': '230', 'product_name': 'APPLE', 'quantity': 2}],
            1: [{'product_gross_total': 1900, 'unitary_gross_price': 1900,
                 'hierarchy_name': 'MISC', 'payment': [{'amount': 1900.0, 'id': 125}],
                 'product_id': '96', 'product_name': 'CIGAR', 'quantity': 1}],
            2: [{'product_gross_total': 600, 'unitary_gross_price': 200,
                 'hierarchy_name': 'FOOD', 'payment': [{'amount': 600.0, 'id': 126}],
                 'product_id': '240', 'product_name': 'GRAPE', 'quantity': 3}],
            3: [{'product_gross_total': 1400, 'unitary_gross_price': 700,
                 'hierarchy_name': 'MISC', 'payment': [{'amount': 1400.0, 'id': 132}],
                 'product_id': '78', 'product_name': 'QUMASK', 'quantity': 2},
                {'product_gross_total': 1800, 'unitary_gross_price': 900,
                 'hierarchy_name': 'MISC', 'payment': [{'amount': 1800.0, 'id': 132}],
                 'product_id': '71', 'product_name': 'CANDLE', 'quantity': 2}]
        }})
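I want to convert the dictionaries into columns, but for each element of the list in the 'detail' column I want one row per product. This is the expected result: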
       order_date   id  name  product_gross_total  unitary_gross_price  hierarchy_name  product_id  product_name  quantity
    0  2019-04-01  100   Joe                  980                  490            FOOD         230         APPLE         2
    1  2019-04-01  101   Ann                 1900                 1900            MISC          96         CIGAR         1
    2  2019-04-02  102   Sue                  600                  200            FOOD         240         GRAPE         3
    3  2019-04-03  103   Leo                 1400                  700            MISC          78        QUMASK         2
    4  2019-04-03  103   Leo                 1800                  900            MISC          71        CANDLE         2
This is what I tried. First, I used 'json_normalize' on the 'load' column, and it worked well:
    mydf = mydf.join(pd.json_normalize(mydf['load']))
    mydf = mydf.drop(['load'], axis=1)
    mydf

       order_date  detail                                               id  name
    0  2019-04-01  [{'product_gross_total': 980, 'unitary_gross_p...  100   Joe
    1  2019-04-01  [{'product_gross_total': 1900, 'unitary_gross_...  101   Ann
    2  2019-04-02  [{'product_gross_total': 600, 'unitary_gross_p...  102   Sue
    3  2019-04-03  [{'product_gross_total': 1400, 'unitary_gross_...  103   Leo
But when I try to do the same with the 'detail' column, I get this:
    mydf = mydf.join(pd.json_normalize(mydf['detail']))
    mydf = mydf.drop(['detail'], axis=1)
    mydf

       order_date   id  name  0                                                  1
    0  2019-04-01  100   Joe  {'product_gross_total': 980, 'unitary_gross_pr...  None
    1  2019-04-01  101   Ann  {'product_gross_total': 1900, 'unitary_gross_p...  None
    2  2019-04-02  102   Sue  {'product_gross_total': 600, 'unitary_gross_pr...  None
    3  2019-04-03  103   Leo  {'product_gross_total': 1400, 'unitary_gross_p...  {'product_gross_total': 1800, 'unitary_gross_p...
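I guess that one column gets added for each element of the list in 'detail', so if I had a transaction with 15 products, I would end up with 15 columns. I'm stuck on converting them to rows instead. Any help or guidance would be greatly appreciated.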
Try:
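    # One row per element of the 'detail' list (exploded rows keep their original index)
    mydf = mydf.explode("detail")

    # Expand the 'load' and 'detail' dicts into columns next to the remaining columns
    mydf = pd.concat(
        [
            mydf,
            mydf.pop("load").apply(pd.Series),
            mydf.pop("detail").apply(pd.Series),
        ],
        axis=1,
    )

    # The nested 'payment' list is not part of the expected result
    mydf = mydf.drop(columns="payment")

    # to_markdown() needs the optional 'tabulate' package
    print(mydf.to_markdown())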
Prints:
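| | order_date | id | name | product_gross_total | unitary_gross_price | hierarchy_name | product_id | product_name | quantity |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-04-01 | 100 | Joe | 980 | 490 | FOOD | 230 | APPLE | 2 |
| 1 | 2019-04-01 | 101 | Ann | 1900 | 1900 | MISC | 96 | CIGAR | 1 |
| 2 | 2019-04-02 | 102 | Sue | 600 | 200 | FOOD | 240 | GRAPE | 3 |
| 3 | 2019-04-03 | 103 | Leo | 1400 | 700 | MISC | 78 | QUMASK | 2 |
| 3 | 2019-04-03 | 103 | Leo | 1800 | 900 | MISC | 71 | CANDLE | 2 |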
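Note that the exploded rows keep their original index, so the last two rows are both labelled 3. As an alternative sketch (not part of the answer above), 'json_normalize' can do the unnesting in one call if it is given the rows as a list of dicts together with its `record_path` and `meta` parameters. The example below uses a reduced copy of the question's data; the names `records`, `flat` and `lead` are only illustrative.

```python
import pandas as pd

# Reduced copy of the question's data: one single-product order
# and one order with two products.
mydf = pd.DataFrame({
    "load": {0: {"id": "100", "name": "Joe"}, 1: {"id": "103", "name": "Leo"}},
    "order_date": {0: "2019-04-01", 1: "2019-04-03"},
    "detail": {
        0: [{"product_gross_total": 980, "unitary_gross_price": 490,
             "hierarchy_name": "FOOD", "payment": [{"amount": 980.0, "id": 124}],
             "product_id": "230", "product_name": "APPLE", "quantity": 2}],
        1: [{"product_gross_total": 1400, "unitary_gross_price": 700,
             "hierarchy_name": "MISC", "payment": [{"amount": 1400.0, "id": 132}],
             "product_id": "78", "product_name": "QUMASK", "quantity": 2},
            {"product_gross_total": 1800, "unitary_gross_price": 900,
             "hierarchy_name": "MISC", "payment": [{"amount": 1800.0, "id": 132}],
             "product_id": "71", "product_name": "CANDLE", "quantity": 2}],
    },
})

# json_normalize expects a list of dicts, so turn each row back into a record.
records = mydf.to_dict("records")

flat = pd.json_normalize(
    records,
    record_path="detail",  # one output row per element of the 'detail' list
    meta=["order_date", ["load", "id"], ["load", "name"]],  # repeated on every product row
)

# Nested meta paths come back as 'load.id' / 'load.name'; rename them and
# drop the 'payment' list, which is not part of the expected result.
flat = flat.drop(columns="payment").rename(
    columns={"load.id": "id", "load.name": "name"}
)

# Put the order-level columns first, as in the expected result.
lead = ["order_date", "id", "name"]
flat = flat[lead + [c for c in flat.columns if c not in lead]]

print(flat)
```

The result gets a fresh 0..n-1 index, which matches the numbering in the expected result. With the explode approach, the same numbering can be obtained by calling `reset_index(drop=True)` on the final frame.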