How to select the most recent value pulled using the wb API

Question

I currently have this:

                           industry  population
country       date
Australia     2017-01-01        NaN         NaN
              2016-01-01  24.327571   18.898304
              2015-01-01  25.396251   18.835267
              2014-01-01  27.277007   18.834835
United States 2017-01-01        NaN         NaN
              2016-01-01        NaN   19.028231
              2015-01-01  20.027274   19.212860
              2014-01-01  20.867359   19.379071

and would like to select the most recent value for each country and column, so that the latest non-null value is returned:

                industry  population
Australia      24.327571   18.898304
United States  20.027274   19.028231

I know I can group by the country index level, which is part of a MultiIndex containing country and date, but I am not sure how to proceed after that.
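For anyone who wants to reproduce the problem, the frame above can be rebuilt roughly as follows. This is only a sketch: the values are copied from the printout, and the index level names `country` and `date` are taken from its header row. The `group_sizes` variable is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame: a (country, date) MultiIndex with dates newest first.
dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"])
index = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [np.nan, 24.327571, 25.396251, 27.277007,
                     np.nan, np.nan, 20.027274, 20.867359],
        "population": [np.nan, 18.898304, 18.835267, 18.834835,
                       np.nan, 19.028231, 19.212860, 19.379071],
    },
    index=index,
)

# Grouping on the country level yields one group of four rows per country.
group_sizes = df.groupby(level="country").size()
print(group_sizes)
```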

Answer 1

One solution is to use a custom function with bfill and iloc to select the first row of each group:

# Within each country, back-fill NaNs from the older rows below, then take the first (newest) row.
df = df.groupby(level=0).apply(lambda x: x.bfill().iloc[0])
print(df)
                industry  population
country                             
Australia      24.327571   18.898304
United States  20.027274   19.028231
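To see why this works, it may help to look at a single group. bfill replaces each NaN with the next valid value below it, so after back-filling, the first row of a group holds the most recent non-null value of every column, provided the rows are sorted newest first. The sketch below rebuilds the sample data and adds an explicit sort_index call. The sort is an assumption about what the data needs in general and is not part of the answer above. The names `australia`, `filled` and `latest` are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"])
index = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [np.nan, 24.327571, 25.396251, 27.277007,
                     np.nan, np.nan, 20.027274, 20.867359],
        "population": [np.nan, 18.898304, 18.835267, 18.834835,
                       np.nan, 19.028231, 19.212860, 19.379071],
    },
    index=index,
)

# Assumption: make "newest first" explicit by sorting dates descending per country.
df = df.sort_index(level=["country", "date"], ascending=[True, False])

# One group in isolation: back-fill pulls older values up into the NaN rows.
australia = df.xs("Australia", level="country")
filled = australia.bfill()
print(filled.iloc[0])  # first row now holds the latest non-null value per column

# The same idea applied to every country at once.
latest = df.groupby(level="country").apply(lambda g: g.bfill().iloc[0])
print(latest)
```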

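Another solution is groupby + first, which returns the first non-null value in each column and so skips the leading NaNs automatically. This was regarded as a bug when this answer was written and could have changed in later releases, so check the behaviour of your pandas version: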

# first() returns the first non-null value per column within each country.
df = df.groupby(level=0).first()
print(df)
                industry  population
country                             
Australia      24.327571   18.898304
United States  20.027274   19.028231
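If you would rather not depend on how first treats NaN, one option is to drop the nulls explicitly, column by column, before taking the first row of each group. This is a sketch of a swapped-in variant, not the method from the answer above. It assumes the rows are already sorted newest first within each country, and `latest` is an illustrative name.

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"])
index = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [np.nan, 24.327571, 25.396251, 27.277007,
                     np.nan, np.nan, 20.027274, 20.867359],
        "population": [np.nan, 18.898304, 18.835267, 18.834835,
                       np.nan, 19.028231, 19.212860, 19.379071],
    },
    index=index,
)

# For each column: remove NaNs first, then take the first remaining row per country.
# Building the frame from a dict of Series aligns the results on the country index.
latest = pd.DataFrame(
    {col: df[col].dropna().groupby(level="country").first() for col in df.columns}
)
print(latest)
```

A country whose column is entirely NaN would simply be missing from that column's Series and would come out as NaN after alignment.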

indexing

dataframe

pandas

pandas-groupby