Keep only the last value of a single index level in a MultiIndex in Pandas (drop_duplicates)
How can I get the row with the last date for each product in a MultiIndex DataFrame?
I have simplified my df to the following:
import numpy as np
import pandas as pd

Dates = ['01/10/2017', '28/10/2018', '20/10/2019', '27/10/2019', '30/10/2019']
cols = ['Date', 'P1', 'P2', 'P3']
ProductIDs = [1, 1, 1, 3, 5]
df = pd.DataFrame(index=ProductIDs, columns=cols)
df.index.name = 'ProductIDs'
df.Date = Dates
# Move Date into the index to build a two-level MultiIndex
df = df.reset_index().set_index(['ProductIDs', 'Date'])
# Fill P1..P3 with random integers in [0, 20)
df[:] = np.random.randint(0, 20, size=(5, 3))
df
                       P1  P2  P3
ProductIDs Date
1          01/10/2017   3   2   2
           28/10/2018   1   4   9
           20/10/2019   3  14   3
3          27/10/2019   3   1   7
5          30/10/2019   2  13   4
df.groupby(level=[0]).last()
gives me the values I want, but without the Date level. How can I keep the dates as well?
Desired output:
                       P1  P2  P3
ProductIDs Date
1          20/10/2019   3  14   3
3          27/10/2019   3   1   7
5          30/10/2019   2  13   4
First extract the values of the first index level with get_level_values, flag the duplicates with duplicated(keep='last'), and invert the condition with ~. Finally, filter the rows by boolean indexing:
df1 = df[~df.index.get_level_values(0).duplicated(keep='last')]
print (df1)
                       P1  P2  P3
ProductIDs Date
1          20/10/2019   3  14   3
3          27/10/2019   3   1   7
5          30/10/2019   2  13   4
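For a reproducible check, the same filter can be run on fixed values instead of random ones. The sketch below hard-codes the numbers from the question's printed frame (an assumption made for illustration, since the original data was random) and builds the index with pd.MultiIndex.from_arrays, which is just one convenient way to set it up. The name is_repeated is introduced here and does not appear in the answer.

```python
import pandas as pd

# Values copied from the question's printed frame (fixed here for reproducibility).
index = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 3, 5],
        ["01/10/2017", "28/10/2018", "20/10/2019", "27/10/2019", "30/10/2019"],
    ],
    names=["ProductIDs", "Date"],
)
df = pd.DataFrame(
    {"P1": [3, 1, 3, 3, 2], "P2": [2, 4, 14, 1, 13], "P3": [2, 9, 3, 7, 4]},
    index=index,
)

# True for every row whose ProductIDs value appears again further down.
is_repeated = df.index.get_level_values(0).duplicated(keep="last")

# Keep only the last row of each ProductIDs; the Date level is preserved.
df1 = df[~is_repeated]
print(df1)
```

Note that "last" here means last in row order, not latest by calendar date. The sample happens to be sorted by date within each product, so the two coincide.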
Details:
print (df.index.get_level_values(0))
Int64Index([1, 1, 1, 3, 5], dtype='int64', name='ProductIDs')
print (df.index.get_level_values(0).duplicated(keep='last'))
[ True  True False False False]
print (~df.index.get_level_values(0).duplicated(keep='last'))
[False False  True  True  True]
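An alternative that may be worth knowing, and which is not part of the answer above: groupby(...).tail(1) returns the original rows rather than an aggregate, so the full MultiIndex, including Date, survives. A minimal sketch, again assuming hard-coded values taken from the question's printout; the name last_rows is hypothetical.

```python
import pandas as pd

# Same fixed sample as the question's printed frame (assumed values).
index = pd.MultiIndex.from_arrays(
    [
        [1, 1, 1, 3, 5],
        ["01/10/2017", "28/10/2018", "20/10/2019", "27/10/2019", "30/10/2019"],
    ],
    names=["ProductIDs", "Date"],
)
df = pd.DataFrame(
    {"P1": [3, 1, 3, 3, 2], "P2": [2, 4, 14, 1, 13], "P3": [2, 9, 3, 7, 4]},
    index=index,
)

# tail(1) selects the last row of each ProductIDs group and keeps
# the original two-level index, unlike last(), which aggregates.
last_rows = df.groupby(level=0).tail(1)
print(last_rows)
```

Like the duplicated approach, this picks the last row of each group in its existing order, so the frame should already be ordered by date within each product.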