Convert a table with subsections to JSON format

I am trying to process financial information for listed companies. I have downloaded the data and am now trying to convert it to JSON format.

The table contains subsections: a prefix of 4 ~ characters marks one level of indentation and 8 marks two levels, as shown below.

For example, Cost of Goods Sold (COGS) incl. D&A is a section header, and COGS Growth should be captured as a child element of Cost of Goods Sold (COGS) incl. D&A.
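The indentation convention described above can be decoded mechanically. The following is a minimal sketch, assuming every level is written as exactly four ~ characters; `split_indent` is a hypothetical helper name introduced for illustration, not part of any library.

```python
def split_indent(label, marker="~", width=4):
    """Return (depth, name) for a label such as '~~~~COGS Growth'."""
    label = label.strip()
    name = label.lstrip(marker)
    n_markers = len(label) - len(name)
    # Assumption: the marker count is always a multiple of `width`.
    return n_markers // width, name.strip()


for raw in ["Cost of Goods Sold (COGS) incl. D&A",
            "~~~~COGS Growth",
            "~~~~~~~~Depreciation"]:
    print(split_indent(raw))
```

A depth of 0 would mark a section header, 1 a subsection, and 2 a sub-subsection.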

Could you please help me find a way to convert this dataframe to a JSON file?

Table representing the dataframe

|                                       Item  Item|      2016|     2017 |    2018 |    2019   |     2020 |  5-year trend|
|                                     :---------: |    :----:|   :----: |  :----: |  :----:   |   :----: |:------------:|
| Sales/Revenue                                   |-         |-         |-        | -         |615.82K   | NaN          |
| ~~~~Sales Growth                                |-         |-         |-        | -         |-         | NaN          |
| Cost of Goods Sold (COGS) incl. D&A             |684       |5.44K     |3.14K    | 32.5K     |-         | NaN          |
| ~~~~COGS Growth                                 |-         |694.59%   |-42.19%  | 934.31%   |-         | NaN          |
| ~~~~COGS excluding D&A                          |-         |-         |-        | -         |-         | NaN          |
| ~~~~Depreciation & Amortization Expense         |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Depreciation                            |684       |5.44K     |3.14K    | 32.5K     |41.83K    | NaN          |
| ~~~~~~~~Amortization of Intangibles             |-         |-         |-        | -         |-         | NaN          |
| Gross Income                                    |(684)     |(5.44K)   |(3.14K)  | (32.5K)   |-         | NaN          |
| ~~~~Gross Income Growth                         |-         |-694.59%  |42.19%   | -934.31%  |-         | NaN          |
| ~~~~Gross Profit Margin                         |-         |-         |-        | -         |-         | NaN          |
| SG&A Expense                                    |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~SGA Growth                                  |-         |151.12%   |22.61%   | 61.51%    |1.37%     | NaN          |
| ~~~~Research & Development                      |-         |-         |-        | -         |-         | NaN          |
| ~~~~Other SG&A                                  |1.91M     |4.79M     |5.88M    | 9.5M      |9.63M     | NaN          |
| ~~~~Other Operating Expense                     |-         |-         |-        | -         |-         | NaN          |
| Unusual Expense                                 |-         |-         |-        | -         |-         | NaN          |
| EBIT after Unusual Expense                      |-         |-         |-        | -         |-         | NaN          |
| Non Operating Income/Expense                    |-         |-         |(52.76K) | 60.09K    |(2.2K)    | NaN          |
| Non-Operating Interest Income                   |8.9K      |170.93K   |59.8K    | 50.79K    |19.15K    | NaN          |
| Equity in Affiliates (Pretax)                   |-         |-         |-        | -         |-         | NaN          |
| Interest Expense                                |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Expense Growth                     |-         |-         |-        | -         |-         | NaN          |
| ~~~~Gross Interest Expense                      |-         |-         |-        | -         |115.55K   | NaN          |
| ~~~~Interest Capitalized                        |-         |-         |-        | -         |-         | NaN          |
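One possible way to turn rows like these into nested JSON is to track the current chain of parents on a stack while reading the rows in order. This is only a sketch under stated assumptions: `build_tree` and the `name` / `data` / `children` keys are hypothetical choices, the rows are assumed to arrive in table order, and each level is assumed to be four ~ characters. Only the 2016 and 2017 values of a few rows are used to keep the example short.

```python
import json


def build_tree(rows, marker="~", width=4):
    """Nest (label, values) pairs according to their ~ prefix."""
    root = {"children": []}
    # stack[0] is the root; stack[d + 1] is the latest node seen at depth d.
    stack = [root]
    for label, values in rows:
        label = label.strip()
        name = label.lstrip(marker)
        depth = (len(label) - len(name)) // width
        node = {"name": name.strip(), "data": values, "children": []}
        # Drop nodes at this depth or deeper so the parent is on top.
        del stack[depth + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


rows = [
    ("Cost of Goods Sold (COGS) incl. D&A", {"2016": "684", "2017": "5.44K"}),
    ("~~~~COGS Growth", {"2016": "-", "2017": "694.59%"}),
    ("~~~~Depreciation & Amortization Expense", {"2016": "684", "2017": "5.44K"}),
    ("~~~~~~~~Depreciation", {"2016": "684", "2017": "5.44K"}),
    ("Gross Income", {"2016": "(684)", "2017": "(5.44K)"}),
]
tree = build_tree(rows)
print(json.dumps(tree, indent=2))
```

With a dataframe, the `rows` list could be produced by iterating over its rows, for example with `DataFrame.iterrows()`, and pairing the label column with a dict of the year columns.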

Table organized into subsections

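| Item  Item | Subsection1 | Subsection2 | 2016 | 2017 | 2018 | 2019 | 2020 | 5-year trend |
| :--------- | :---------- | :---------- | :----: | :----: | :----: | :----: | :----: | :----: |
| Sales/Revenue | | | - | - | - | - | 615.82K | NaN |
| | Sales Growth | | - | - | - | - | - | NaN |
| Cost of Goods Sold (COGS) incl. D&A | | | 684 | 5.44K | 3.14K | 32.5K | - | NaN |
| | COGS Growth | | - | 694.59% | -42.19% | 934.31% | - | NaN |
| | COGS excluding D&A | | - | - | - | - | - | NaN |
| | Depreciation & Amortization Expense | | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Depreciation | 684 | 5.44K | 3.14K | 32.5K | 41.83K | NaN |
| | | Amortization of Intangibles | - | - | - | - | - | NaN |
| Gross Income | | | (684) | (5.44K) | (3.14K) | (32.5K) | - | NaN |
| | Gross Income Growth | | - | -694.59% | 42.19% | -934.31% | - | NaN |
| | Gross Profit Margin | | - | - | - | - | - | NaN |
| SG&A Expense | | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | SGA Growth | | - | 151.12% | 22.61% | 61.51% | 1.37% | NaN |
| | Research & Development | | - | - | - | - | - | NaN |
| | Other SG&A | | 1.91M | 4.79M | 5.88M | 9.5M | 9.63M | NaN |
| | Other Operating Expense | | - | - | - | - | - | NaN |
| Unusual Expense | | | - | - | - | - | - | NaN |
| EBIT after Unusual Expense | | | - | - | - | - | - | NaN |
| Non Operating Income/Expense | | | - | - | (52.76K) | 60.09K | (2.2K) | NaN |
| Non-Operating Interest Income | | | 8.9K | 170.93K | 59.8K | 50.79K | 19.15K | NaN |
| Equity in Affiliates (Pretax) | | | - | - | - | - | - | NaN |
| Interest Expense | | | - | - | - | - | 115.55K | NaN |
| | Interest Expense Growth | | - | - | - | - | - | NaN |
| | Gross Interest Expense | | - | - | - | - | 115.55K | NaN |
| | Interest Capitalized | | - | - | - | - | - | NaN |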

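I can work around this by filling the missing cells with a value and then grouping on the three columns, as in the code shown below. I built this code from an existing example.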

d = (
    dframe.fillna("-")
    # innermost level: a list of year records per (item, subsection1, subsection2)
    .groupby(['Item  Item', 'ItemSubsection1', 'ItemSubsection2'])[['2016', '2017', '2018', '2019', '2020']]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    # middle level: collect the subsection2 entries under each subsection1
    .groupby(['Item  Item', 'ItemSubsection1'])[['ItemSubsection2', 'data']]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    # top level: one dict per item, keyed by subsection1
    .groupby('Item  Item')[['ItemSubsection1', 'data']]
    .apply(lambda x: x.set_index('ItemSubsection1')['data'].to_dict())
    .to_json()
)