Convert a table with subsets to JSON format
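I'm trying to process financial information for listed companies. I've downloaded the data, and now I'm trying to convert it to JSON format.
The table has subsections. Four `~` characters mark one level of indentation and eight mark two levels, as follows:
- a single indent means one level down
- a double indent means two levels down
For example, `Cost of Goods Sold (COGS) incl. D&A` is a section header, and `COGS Growth` should be captured as a child element of `Cost of Goods Sold (COGS) incl. D&A`.
Could you help me find a way to convert this dataframe to a JSON file?
The table as a dataframe: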
| Item Item| 2016| 2017 | 2018 | 2019 | 2020 | 5-year trend|
| :---------: | :----:| :----: | :----: | :----: | :----: |:------------:|
| Sales/Revenue |- |- |- | - |615.82K | NaN |
| ~~~~Sales Growth |- |- |- | - |- | NaN |
| Cost of Goods Sold (COGS) incl. D&A |684 |5.44K |3.14K | 32.5K |- | NaN |
| ~~~~COGS Growth |- |694.59% |-42.19% | 934.31% |- | NaN |
| ~~~~COGS excluding D&A |- |- |- | - |- | NaN |
| ~~~~Depreciation & Amortization Expense |684 |5.44K |3.14K | 32.5K |41.83K | NaN |
| ~~~~~~~~Depreciation |684 |5.44K |3.14K | 32.5K |41.83K | NaN |
| ~~~~~~~~Amortization of Intangibles |- |- |- | - |- | NaN |
| Gross Income |(684) |(5.44K) |(3.14K) | (32.5K) |- | NaN |
| ~~~~Gross Income Growth |- |-694.59% |42.19% | -934.31% |- | NaN |
| ~~~~Gross Profit Margin |- |- |- | - |- | NaN |
| SG&A Expense |1.91M |4.79M |5.88M | 9.5M |9.63M | NaN |
| ~~~~SGA Growth |- |151.12% |22.61% | 61.51% |1.37% | NaN |
| ~~~~Research & Development |- |- |- | - |- | NaN |
| ~~~~Other SG&A |1.91M |4.79M |5.88M | 9.5M |9.63M | NaN |
| ~~~~Other Operating Expense |- |- |- | - |- | NaN |
| Unusual Expense |- |- |- | - |- | NaN |
| EBIT after Unusual Expense |- |- |- | - |- | NaN |
| Non Operating Income/Expense |- |- |(52.76K) | 60.09K |(2.2K) | NaN |
| Non-Operating Interest Income |8.9K |170.93K |59.8K | 50.79K |19.15K | NaN |
| Equity in Affiliates (Pretax) |- |- |- | - |- | NaN |
| Interest Expense |- |- |- | - |115.55K | NaN |
| ~~~~Interest Expense Growth |- |- |- | - |- | NaN |
| ~~~~Gross Interest Expense |- |- |- | - |115.55K | NaN |
| ~~~~Interest Capitalized |- |- |- | - |- | NaN |
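To make the indentation rule concrete, here is a minimal sketch of how the `~` prefix could be turned into nesting. It is only an illustration: the helper name `nest_rows` and the output keys `item`, `data`, and `children` are made up for this example, and it assumes each row arrives as a label plus a dict of year values.

```python
import json

def nest_rows(rows, indent_char="~", indent_width=4):
    """Build a nested list of dicts from (label, values) rows.

    A row's depth is its number of leading indent characters divided by
    indent_width (4 tildes = 1 level down, 8 tildes = 2 levels down).
    """
    root = {"children": []}
    stack = [root]  # stack[-1] is the most recently added node on the current path
    for label, values in rows:
        label = label.strip()
        stripped = label.lstrip(indent_char)
        depth = (len(label) - len(stripped)) // indent_width
        node = {"item": stripped.strip(), "data": values, "children": []}
        # Cut the path back to this row's parent, then attach the node.
        del stack[depth + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]

# A few rows copied from the table above (hypothetical in-memory form).
years = ["2016", "2017", "2018", "2019", "2020"]
sample = [
    ("Cost of Goods Sold (COGS) incl. D&A", ["684", "5.44K", "3.14K", "32.5K", "-"]),
    ("~~~~COGS Growth", ["-", "694.59%", "-42.19%", "934.31%", "-"]),
    ("~~~~Depreciation & Amortization Expense", ["684", "5.44K", "3.14K", "32.5K", "41.83K"]),
    ("~~~~~~~~Depreciation", ["684", "5.44K", "3.14K", "32.5K", "41.83K"]),
    ("Gross Income", ["(684)", "(5.44K)", "(3.14K)", "(32.5K)", "-"]),
]
rows = [(label, dict(zip(years, vals))) for label, vals in sample]
tree = nest_rows(rows)
print(json.dumps(tree, indent=2))
```

With a real dataframe, the rows could presumably be produced by zipping the `Item Item` column with the year columns converted via `to_dict("records")`. The stack keeps only the current path from the root, so each row attaches to the nearest preceding row that is one level shallower.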
The table organized by subsection:
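| Item Item | Subsection1 | Subsection2 | 2016 | 2017 | 2018 | 2019 | 2020 | 5-year trend |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Sales/Revenue | | | - | - | - | - | 615.82K | NaN |
| | Sales Growth | | - | - | - | - | - | NaN |
| Cost of Goods Sold (COGS) incl. D&A | | | 684 | 5.44K | 3.14K | 32.5K | - | NaN |
| | COGS Growth | | - | 694.59% | -42.19% | 934.31% | - | NaN |
| | COGS excluding D&A | | - | - | - | - | - | NaN |
| | Depreciation & Amortization Expense | | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Depreciation | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Amortization of Intangibles | - | - | - | - | - | NaN |
| Gross Income | | | (684) | (5.44K) | (3.14K) | (32.5K) | - | NaN |
| | Gross Income Growth | | - | -694.59% | 42.19% | -934.31% | - | NaN |
| | Gross Profit Mar | | - | - | - | - | - | NaN |
| SG&A Expense | | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | SGA Growth | | - | 151.12% | 22.61% | 61.51% | 1.37% | NaN |
| | Research & Development | | - | - | - | - | - | NaN |
| | Other SG&A | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | Other Operating Expense | | - | - | - | - | - | NaN |
| Unusual Expense | | | - | - | - | - | - | NaN |
| EBIT after Unusual Expense | | | - | - | - | - | - | NaN |
| Non Operating Income/Expense | | | - | - | (52.76K) | 60.09K | (2.2K) | NaN |
| Non-Operating Interest Income | | | 8.9K | 170.93K | 59.8K | 50.79K | 19.15K | NaN |
| Equity in Affiliates (Pretax) | | | - | - | - | - | - | NaN |
| Interest Expense | | | - | - | - | - | 115.55K | NaN |
| | Interest Expense Growth | | - | - | - | - | - | NaN |
| | Gross Interest Expense | | - | - | - | - | 115.55K | NaN |
| | Interest Capitalized | | - | - | - | - | - | NaN |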
I can solve this by adding values to the missing cells and then grouping on the 3 columns, as in the code below.
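# dframe is the subsection table above, with columns named as in the groupby keys
year_cols = ['2016', '2017', '2018', '2019', '2020']
d = (
    dframe.fillna("-")
    # innermost level: a list of year records per (item, subsection1, subsection2)
    .groupby(['Item Item', 'ItemSubsection1', 'ItemSubsection2'])[year_cols]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    # middle level: a list of {ItemSubsection2, data} records per (item, subsection1)
    .groupby(['Item Item', 'ItemSubsection1'])[['ItemSubsection2', 'data']]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    # top level: a dict keyed by ItemSubsection1 for each item
    .groupby('Item Item')[['ItemSubsection1', 'data']]
    .apply(lambda x: x.set_index('ItemSubsection1')['data'].to_dict())
    .to_json()
)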