Why is the 'Date' column getting replaced by the last working day?

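I am working with a DataFrame that contains a date column, and I need to find the last working day of each month. The code I am using works, but I don't understand why it works.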

The DataFrame 'apple' originally has 6 columns, but I am mainly concerned with the 'Date' column, which covers every month from 1980 to 2014. Sample data:

         Date   Open   High    Low  Close    Volume  Adj Close
0  2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1  2014-07-07  94.14  95.99  94.10  95.97  56305400      95.97
2  2014-07-03  93.67  94.10  93.20  94.03  22891800      94.03
3  2014-07-02  93.87  94.06  93.09  93.48  28420900      93.48
4  2014-07-01  93.52  94.07  93.13  93.52  38170200      93.52

from pandas.tseries.offsets import MonthEnd
apple['Last_Day']=pd.to_datetime(apple['Date'],format="%Y-%m")+MonthEnd(0)
banana=apple.loc[-apple.Last_Day.duplicated()]

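I expected the newly created 'Last_Day' column to hold the last day of each month, and it does. What surprised me is that the 'Date' column now holds the last working day of each month. I never assigned anything to 'Date', so I don't understand how all of its values were replaced by the last working day. Output: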

           Date    Open    High     Low   Close     Volume  Adj Close    Last_Day
0    2014-07-08   96.27   96.80   93.92   95.35   65130000      95.35  2014-07-31
5    2014-06-30   92.10   93.73   92.09   92.93   49482300      92.93  2014-06-30
26   2014-05-30  637.98  644.17  628.90  633.00  141005200      90.43  2014-05-31
47   2014-04-30  592.64  599.43  589.80  590.09  114160200      83.83  2014-04-30
68   2014-03-31  539.23  540.81  535.93  536.74   42167300      76.25  2014-03-31
89   2014-02-28  529.08  532.75  522.12  526.24   92992200      74.76  2014-02-28
108  2014-01-31  495.18  501.53  493.55  500.60  116199300      70.69  2014-01-31

No, my question is why the Date column is getting replaced by the last working day. I do want the last working day, but I don't understand how the Date column came to be replaced by it.

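Nothing is replaced. duplicated() marks every row after the first occurrence of each Last_Day value, so filtering those rows out keeps only the first row of each month. Which Date that row carries depends on the order of the data in the Date column.

Your data is sorted from newest to oldest, so the first row of each month is the last date present in the data for that month. That is why Date matches Last_Day except where the month end is not a trading day (2014-05-31 is a Saturday, so 2014-05-30 is kept) or the month is incomplete: for July 2014 the data ends at 2014-07-08.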
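
To see the keep-first behaviour in isolation, here is a minimal sketch on a made-up four-row frame. The frame `df` and its values are illustrative assumptions, not the asker's data. It uses `~`, the idiomatic boolean negation, and also shows the `keep="last"` option of `duplicated()`.

```python
import pandas as pd

# Hypothetical toy frame: two rows per month, newest first,
# with the month-end label already filled in.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2014-07-08", "2014-07-07", "2014-06-30", "2014-06-27"]
    ),
    "Last_Day": pd.to_datetime(
        ["2014-07-31", "2014-07-31", "2014-06-30", "2014-06-30"]
    ),
})

# duplicated() flags every repeat after the first occurrence
# (keep="first" is the default).
mask = df["Last_Day"].duplicated()
first_rows = df.loc[~mask]

# keep="last" flags every occurrence except the last one instead.
last_rows = df.loc[~df["Last_Day"].duplicated(keep="last")]

print(first_rows)
print(last_rows)
```

With newest-first data, `first_rows` holds the latest date of each month and `last_rows` the earliest; the Date column itself is never modified, only the set of surviving rows changes.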

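To make this easier to see, I changed the sample data and sorted it. Depending on the sort order, you then get either the first or the last value of each month: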

print(apple)
         Date   Open   High    Low  Close    Volume  Adj Close
0  2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1  2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
2  2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
3  2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48
4  2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52

from pandas.tseries.offsets import MonthEnd

# parse the strings into datetimes, then sort from oldest to newest
apple['Date'] = pd.to_datetime(apple['Date'])
apple = apple.sort_values('Date')
print(apple)
        Date   Open   High    Low  Close    Volume  Adj Close
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52

# roll each date forward to its month end, then keep the first row per month
apple['Last_Day'] = apple['Date'] + MonthEnd(0)
banana = apple.loc[~apple.Last_Day.duplicated()]
print(banana)
        Date   Open   High    Low  Close    Volume  Adj Close   Last_Day
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48 2014-05-31
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03 2014-06-30
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35 2014-07-31
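
As a side note on the Last_Day column, here is a small sketch of what adding `MonthEnd(0)` does: it rolls a date forward to the end of its month and leaves a date that is already a month end unchanged, whereas `MonthEnd(1)` moves a month-end date on to the next month end. The variable names are illustrative; the dates are taken from the sample above.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

dates = pd.Series(pd.to_datetime(["2014-05-31", "2014-06-03", "2014-07-08"]))

# MonthEnd(0): roll forward to the month end, keeping dates already on it.
rolled = dates + MonthEnd(0)

# MonthEnd(1): a date already on a month end jumps to the next month end.
next_end = pd.Timestamp("2014-05-31") + MonthEnd(1)

print(rolled)
print(next_end)
```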

# starting again from the original sample data, sort from newest to oldest
apple['Date'] = pd.to_datetime(apple['Date'])
apple1 = apple.sort_values('Date', ascending=False)
print(apple1)
        Date   Open   High    Low  Close    Volume  Adj Close
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52
0 2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97
2 2014-06-03  93.67  94.10  93.20  94.03  22891800      94.03
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48

# same month-end label; the first row per month is now the latest date
apple1['Last_Day'] = apple1['Date'] + MonthEnd(0)
banana1 = apple1.loc[~apple1.Last_Day.duplicated()]
print(banana1)
        Date   Open   High    Low  Close    Volume  Adj Close   Last_Day
4 2014-07-31  93.52  94.07  93.13  93.52  38170200      93.52 2014-07-31
1 2014-06-07  94.14  95.99  94.10  95.97  56305400      95.97 2014-06-30
3 2014-05-31  93.87  94.06  93.09  93.48  28420900      93.48 2014-05-31
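
If the goal is simply the last available date per month regardless of how the rows are sorted, one possible alternative (not the approach used above) is to group by month and pick the row with the largest Date. A minimal sketch, assuming the modified sample data reduced to the Date and Close columns; `month`, `last_idx` and `last_per_month` are names introduced here for illustration:

```python
import pandas as pd

# Modified sample data from the answer, reduced to two columns.
apple = pd.DataFrame({
    "Date": ["2014-07-08", "2014-06-07", "2014-06-03", "2014-05-31", "2014-07-31"],
    "Close": [95.35, 95.97, 94.03, 93.48, 93.52],
})
apple["Date"] = pd.to_datetime(apple["Date"])

# Monthly period label for every row (e.g. 2014-07).
month = apple["Date"].dt.to_period("M")

# idxmax returns, per month, the index label of the row with the latest Date.
last_idx = apple.groupby(month)["Date"].idxmax()
last_per_month = apple.loc[last_idx]

print(last_per_month)
```

Because the selection is based on the maximum Date within each group, the result does not depend on the row order, unlike the `duplicated()` approach.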