How to select the most recent value pulled using the wb API
I currently have this:
                           industry  population
country       date
Australia     2017-01-01        NaN         NaN
              2016-01-01  24.327571   18.898304
              2015-01-01  25.396251   18.835267
              2014-01-01  27.277007   18.834835
United States 2017-01-01        NaN         NaN
              2016-01-01        NaN   19.028231
              2015-01-01  20.027274   19.212860
              2014-01-01  20.867359   19.379071
and I would like to select the most recent value for each country and column, so that the latest non-null value is returned:
                industry  population
Australia      24.327571   18.898304
United States  20.027274   19.028231
I know I can group by the country index level, which is part of a MultiIndex made up of country and date, but I am not sure how to proceed from there.
One solution is to use a custom function with bfill and iloc to select the first row of each group:
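# Rows are newest-first within each country, so bfill pulls the most recent
# non-null value of every column up into the first row of the group.
df = df.groupby(level=0).apply(lambda x: x.bfill().iloc[0])
print(df)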
                industry  population
country
Australia      24.327571   18.898304
United States  20.027274   19.028231
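For anyone who wants to try this without calling the API, here is a minimal self-contained sketch. It rebuilds the frame by hand from the values shown in the question; the name `latest` and the explicit `sort_index` call are my additions, made on the assumption that the newest-first row order should not be taken for granted.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question (values copied from the post).
dates = pd.to_datetime(["2017-01-01", "2016-01-01", "2015-01-01", "2014-01-01"])
index = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [np.nan, 24.327571, 25.396251, 27.277007,
                     np.nan, np.nan, 20.027274, 20.867359],
        "population": [np.nan, 18.898304, 18.835267, 18.834835,
                       np.nan, 19.028231, 19.212860, 19.379071],
    },
    index=index,
)

# Assumption: make "newest date first" explicit instead of relying on input order.
df = df.sort_index(level=["country", "date"], ascending=[True, False])

# Within each country, back-fill NaNs from older rows, then keep the first row.
latest = df.groupby(level="country").apply(lambda g: g.bfill().iloc[0])
print(latest)
```

The bfill step only gives the latest value when the rows are ordered newest-first, which is why the sort comes before the groupby.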
Another solution is groupby + first, which automatically skips the leading NaNs in each column. Note that this behaviour may change in a future version, since it is considered a bug:
df = df.groupby(level=0).first()
print(df)
                industry  population
country
Australia      24.327571   18.898304
United States  20.027274   19.028231
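Like the bfill approach, first depends on row order. The sketch below is an illustration under assumptions, not part of the original answer: it stores the question's values oldest-first, then shows two ways to get the latest non-null value, either sorting newest-first before calling first, or calling last on the oldest-first data. The names `newest_first`, `via_first` and `via_last` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same values as in the question, but stored oldest-first to show the order dependence.
dates = pd.to_datetime(["2014-01-01", "2015-01-01", "2016-01-01", "2017-01-01"])
index = pd.MultiIndex.from_product(
    [["Australia", "United States"], dates], names=["country", "date"]
)
df = pd.DataFrame(
    {
        "industry": [27.277007, 25.396251, 24.327571, np.nan,
                     20.867359, 20.027274, np.nan, np.nan],
        "population": [18.834835, 18.835267, 18.898304, np.nan,
                       19.379071, 19.212860, 19.028231, np.nan],
    },
    index=index,
)

# Option 1: sort newest-first, then take the first non-null value per column.
newest_first = df.sort_index(level=["country", "date"], ascending=[True, False])
via_first = newest_first.groupby(level="country").first()

# Option 2: keep the oldest-first order and take the last non-null value per column.
via_last = df.groupby(level="country").last()

print(via_first)
print(via_last)
```

Both options should give the same frame here. Which one reads better probably depends on how the data arrives from the API.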