Get a subset of a DataFrame based on the index of a label

Question

I have a DataFrame from Yahoo Finance:

import pandas as pd
import yfinance
ticker = yfinance.Ticker("INFY.NS")
df = ticker.history(period = '1y')
print(df)

This gives me df as,

If I specify,

date = "2021-04-23"

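I need a subset of df containing:
the row whose index label is "2021-04-23"
the rows for the 2 days before that date
the row for the 1 day after that date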

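The important point is that we cannot find the neighbouring rows by adding or subtracting days from the date string, because df may be missing some dates. The rows have to be selected by index position instead (the 2 rows before and the 1 row after the matching index entry). For example, df may not contain "2021-04-21" but does contain "2021-04-20".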

How can we implement this?

Answer 1

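If you need the rows before and after by position (assuming date always exists in the DatetimeIndex), use DataFrame.iloc with the position returned by Index.get_loc. Wrapping the slice bounds in max and min keeps the selection correct when there are fewer than 2 rows before the date or no row after it, as in this sample data: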

df = pd.DataFrame({'a':[1,2,3]}, 
                   index=pd.to_datetime(['2021-04-21','2021-04-23','2021-04-25']))

date = "2021-04-23"
pos = df.index.get_loc(date)
df = df.iloc[max(0, pos-2):min(len(df), pos+2)]
print (df)
            a
2021-04-21  1
2021-04-23  2
2021-04-25  3

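Note: max and min are there so the selection still works when the date is the first label (no previous rows exist), the second label (only one previous row exists), or the last label (no following row exists). Without max, a negative start would make iloc count from the end of the frame and return the wrong rows. min is not strictly required, because a slice that runs past the end is clipped, but it makes the intent explicit.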
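The same idea can be wrapped in a small helper so the window size is configurable. This is a minimal sketch, not part of the original answer: the function name rows_around and the parameters before and after are hypothetical, and it assumes the date appears exactly once in the index, so that Index.get_loc returns a single integer.

```python
import pandas as pd

def rows_around(df, date, before=2, after=1):
    """Return the row labelled `date` plus `before` rows above and `after` rows below it."""
    # Integer position of the label (assumption: the label exists exactly once)
    pos = df.index.get_loc(date)
    # Clamp both ends so the slice stays inside the frame
    start = max(0, pos - before)
    stop = min(len(df), pos + after + 1)
    return df.iloc[start:stop]

# Sample data with gaps: 2021-04-21 is missing, as are the weekend dates
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5, 6]},
    index=pd.to_datetime([
        "2021-04-19", "2021-04-20", "2021-04-22",
        "2021-04-23", "2021-04-26", "2021-04-27",
    ]),
)

# Rows labelled 2021-04-20, 2021-04-22, 2021-04-23 and 2021-04-26
subset = rows_around(df, "2021-04-23")
print(subset)
```

Because the selection is positional, the two rows before "2021-04-23" are "2021-04-20" and "2021-04-22", regardless of which calendar days are missing.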

Answer 2

You can use integer-based indexing. First find the integer position of the desired date, then use iloc to get the desired subset:

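import numpy as np

def get_subset(df, date):
    # get the integer positions of the rows whose index label matches the date
    matching_dates_inds, = np.nonzero(df.index == date)

    # and take the first one (works in case of duplicates);
    # this raises IndexError if the date is not in the index
    first_matching_date_ind = matching_dates_inds[0]

    # take the 4-row subset: 2 rows before, the matching row, 1 row after;
    # max(0, ...) stops a negative start from wrapping around to the end
    start = max(0, first_matching_date_ind - 2)
    desired_subset = df.iloc[start: first_matching_date_ind + 2]

    return desired_subset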
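To illustrate the duplicate handling mentioned in the comments, here is a small sketch of what the lookup returns when a date is repeated in the index. The sample frame is made up for illustration, and the start of the slice is clamped with max(0, ...) as an added safeguard.

```python
import numpy as np
import pandas as pd

# Made-up sample data in which 2021-04-23 appears twice
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5]},
    index=pd.to_datetime([
        "2021-04-20", "2021-04-21", "2021-04-23",
        "2021-04-23", "2021-04-26",
    ]),
)

# Comparing the index with the date string gives a boolean mask;
# np.nonzero turns it into the integer positions of every match
matches, = np.nonzero(df.index == "2021-04-23")
print(matches)

# Use the first match as the anchor and slice by position
first = matches[0]
subset = df.iloc[max(0, first - 2): first + 2]
print(subset)
```

Note that with a duplicated date the "1 row after" is the second copy of that date, since the window is counted in rows rather than in distinct labels.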
