Using a Python comprehension with a dictionary to assign values to a pandas DataFrame

Suppose I am trying to build a DataFrame that prints as a table like this, so I can check the sectors:

                SectorDescription   SectorCode
0       State Energy Data Systems         SEDS
1                       Coal Data         COAL
2                  Petroleum Data          PET
3                Natural Gas Data           NG
4                Electricity Data         ELEC
5          Petroleum Imports Data  PET_IMPORTS
6  Short-Term Energy Outlook Data         STEO
7       International Energy Data         INTL
8      Annual Energy Outlook Data          AEO

Right now I have:

QuandlEIASector = {"State Energy Data Systems":"SEDS",
                  "Coal Data":"COAL",
                  "Petroleum Data":"PET",
                  "Natural Gas Data":"NG",
                  "Electricity Data":"ELEC",
                  "Petroleum Imports Data":"PET_IMPORTS",
                  "Short-Term Energy Outlook Data":"STEO",
                  "International Energy Data":"INTL",
                  "Annual Energy Outlook Data":"AEO"}

What I did is:

import pandas as pd

QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList

But is there a faster way, for example a Python comprehension, to assign the column values to the pandas DataFrame?

Create a Series from the dictionary, then convert it to a DataFrame:

QuandlEIASectorList = (pd.Series(QuandlEIASector)
                         .rename_axis('SectorDescription')
                         .reset_index(name='SectorCode'))

Similarly, you can set the name in the Series constructor:

QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                         .rename_axis('SectorDescription')
                         .reset_index())
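To make the chained calls above easier to follow, here is a minimal sketch that runs the same steps one at a time. The two-entry dictionary `sectors` and the intermediate names are illustrative, not part of the original code.

```python
import pandas as pd

# Hypothetical, shortened version of the sector dictionary.
sectors = {"Coal Data": "COAL", "Petroleum Data": "PET"}

# Step 1: dict keys become the Series index, dict values become the data.
s = pd.Series(sectors)

# Step 2: give the index a name; it becomes the column name after reset_index.
s = s.rename_axis("SectorDescription")

# Step 3: move the index into a regular column and name the values column.
df = s.reset_index(name="SectorCode")

print(df)
```

The result has a default integer index and the two columns `SectorDescription` and `SectorCode`, in that order.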

Your code can also be rewritten to use the DataFrame constructor directly:

QuandlEIASectorList = pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})

Or:

QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()), 
                                   columns=['SectorDescription','SectorCode'])
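Since the question asks about a comprehension specifically, here is a hedged sketch of what that could look like: a list comprehension that builds one record per key/value pair. The shortened `sectors` dictionary and the variable names are illustrative. This variant was not part of the timings in this answer, and because it creates an intermediate dict per row it is unlikely to beat the constructor calls above.

```python
import pandas as pd

# Hypothetical, shortened version of the sector dictionary.
sectors = {"Coal Data": "COAL",
           "Petroleum Data": "PET",
           "Natural Gas Data": "NG"}

# List comprehension: one {column: value} record per dictionary item.
records = [{"SectorDescription": desc, "SectorCode": code}
           for desc, code in sectors.items()]
by_comprehension = pd.DataFrame(records)

# The items()-based constructor call from the answer, for comparison.
by_items = pd.DataFrame(list(sectors.items()),
                        columns=["SectorDescription", "SectorCode"])

# Both approaches should give the same frame.
same = by_comprehension.equals(by_items)
print(same)
```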

Performance with 10k keys:

import numpy as np

QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)],
                           [f'{x} keys' for x in np.arange(10000)]))

In [73]: %%timeit
    ...: QuandlEIASectorList = pd.DataFrame()
    ...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
    ...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
    ...: 
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [74]: %%timeit
    ...: (pd.Series(QuandlEIASector)
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index(name='SectorCode'))
    ...:                          
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [75]: %%timeit
    ...: (pd.Series(QuandlEIASector, name='SectorCode')
    ...:    .rename_axis('SectorDescription')
    ...:    .reset_index())
    ...:                          
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [76]: %%timeit
    ...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
    ...:               'SectorCode': list(QuandlEIASector.values())})
    ...:                                    
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [77]: %%timeit
    ...: pd.DataFrame(list(QuandlEIASector.items()), 
    ...:              columns=['SectorDescription','SectorCode'])
    ...:                                    
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
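The timings above come from IPython's `%%timeit`. If you want to repeat the comparison in a plain script, the standard-library `timeit` module can be used instead. The following is a rough sketch: the function names, the repeat counts, and the choice of reporting the best run are my own assumptions, and the numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Same shape of test data as above: 10k string keys and values.
data = {f"{i} data": f"{i} keys" for i in range(10_000)}


def from_two_lists():
    return pd.DataFrame({"SectorDescription": list(data.keys()),
                         "SectorCode": list(data.values())})


def from_items():
    return pd.DataFrame(list(data.items()),
                        columns=["SectorDescription", "SectorCode"])


def from_series():
    return (pd.Series(data)
              .rename_axis("SectorDescription")
              .reset_index(name="SectorCode"))


calls_per_run = 20
timings = {}
for fn in (from_two_lists, from_items, from_series):
    # Best of 3 runs, converted to seconds per single call.
    best = min(timeit.repeat(fn, number=calls_per_run, repeat=3))
    timings[fn.__name__] = best / calls_per_run

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```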