How to extract data by date in Python pandas

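I'm trying to extract data for a specific date range (e.g. 06/20/2021 - 06/30/2021). Right now my script reads the CSV file, sorts the data by date, and finds any duplicates. The next step is to extract all rows that fall within a date range, and I'm wondering how to do that. Any help is much appreciated :). Here is what I have so far: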

import pandas as pd
from datetime import date, timedelta

#df = pd.read_excel(r"/Users/britevoxops2/Desktop/sample_date.xlsx") #reading Excel File

df = pd.read_csv(r"/Users/filename/Desktop/sample_date.csv")
print(df) #print original data
df.head()

Final_result = df.sort_values('Joining Date') #sorting date
print(Final_result)

duplicate = df[df['Name'].duplicated() == True] #finding duplicate name
print('Here are the Duplicates: \n',duplicate) 


 
df.loc['2021-06-20' : '2021-06-30']

You can convert the column to pandas datetime format and then extract the dates you need with:

df_date = df[(df['Joining Date'] < '23-03-21') & (df['Joining Date'] > '03-03-21')]
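For completeness, here is a minimal end-to-end sketch of that idea applied to the range from the question. The sample rows and the `Name` values are made up for illustration, and it assumes the `Joining Date` column is stored as MM/DD/YYYY text. ISO-style bounds (YYYY-MM-DD) are used because they cannot be misread as day-first or month-first.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's CSV file.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cara", "Dan"],
    "Joining Date": ["06/15/2021", "06/20/2021", "06/25/2021", "07/01/2021"],
})

# Convert the text column to datetime64 so comparisons are chronological,
# not alphabetical. The format string is an assumption about the file.
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")

# Keep rows from 2021-06-20 through 2021-06-30, both ends included.
start, end = pd.Timestamp("2021-06-20"), pd.Timestamp("2021-06-30")
df_date = df[(df["Joining Date"] >= start) & (df["Joining Date"] <= end)]
print(df_date)
```

Switching `>=`/`<=` to `>`/`<` excludes the boundary dates, as in the one-liner above.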

The following should work for you.

Sample DataFrame:

>>> df
          Date
0   06/10/2021
1   06/11/2021
2   06/12/2021
3   06/13/2021
4   06/14/2021
5   06/15/2021
6   06/16/2021
7   06/17/2021
8   06/18/2021
9   06/19/2021
10  06/20/2021
11  06/21/2021
12  06/22/2021
13  06/23/2021
14  06/24/2021
15  06/25/2021
16  06/26/2021
17  06/27/2021
18  06/28/2021
19  06/29/2021
20  06/30/2021

Convert the Date column to datetime format:

>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df
         Date
0  2021-06-10
1  2021-06-11
2  2021-06-12
3  2021-06-13
4  2021-06-14
5  2021-06-15
6  2021-06-16
7  2021-06-17
8  2021-06-18
9  2021-06-19
10 2021-06-20
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
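If the file might contain day-first dates or malformed entries, it can help to state the format explicitly rather than let pandas infer it. The sketch below assumes MM/DD/YYYY strings; the "not a date" entry is invented to show what `errors="coerce"` does.

```python
import pandas as pd

# Two valid MM/DD/YYYY strings and one deliberately invalid entry.
dates = pd.Series(["06/10/2021", "06/11/2021", "not a date"])

# An explicit format avoids month/day ambiguity; errors="coerce" turns
# anything that does not match into NaT instead of raising.
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")
print(parsed)

# Count the rows that failed to parse so they can be inspected.
print(parsed.isna().sum())
```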

Now select the date range you want:

>>> df[(df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')]
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30

Another approach is to build a boolean mask and then use df.loc[mask]:

>>> mask = (df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')

>>> print(df.loc[mask])
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30

A third approach:

Use pandas.Series.between

>>> df[df.Date.between("06/21/2021", "06/30/2021")]
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
# df[df['Date'].between("06/21/2021", "06/30/2021")]
# df.loc[df['Date'].between('06/21/2021', '06/30/2021', inclusive='both')] <-- In pandas 1.3+, `inclusive` takes 'both', 'neither', 'left', or 'right'; older versions took True or False.
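To make the effect of `inclusive` concrete, here is a small sketch on a five-day frame built just for this illustration. It assumes pandas 1.3 or later, where `inclusive` accepts string values.

```python
import pandas as pd

# Five consecutive days: 2021-06-19 through 2021-06-23.
df = pd.DataFrame({"Date": pd.date_range("2021-06-19", "2021-06-23")})

# Default behaviour: both endpoints are kept (20th, 21st, 22nd).
both = df[df["Date"].between("2021-06-20", "2021-06-22")]

# Keep the left endpoint only (20th, 21st).
left = df[df["Date"].between("2021-06-20", "2021-06-22", inclusive="left")]

# Drop both endpoints (21st only).
neither = df[df["Date"].between("2021-06-20", "2021-06-22", inclusive="neither")]

print(len(both), len(left), len(neither))
```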

With df.query, you can reference variables from the surrounding environment by prefixing them with the "@" character, as shown below.

>>> start_date, end_date = "06/21/2021", "06/30/2021"

>>> print(df.query('Date >= @start_date and Date <= @end_date'))
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
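As a side note, the `df.loc['2021-06-20' : '2021-06-30']` line from the question can be made to work as well, provided the date column is first turned into a sorted DatetimeIndex. This is a sketch with made-up rows, not something taken from the answers above.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame({
    "Joining Date": pd.to_datetime(
        ["2021-06-25", "2021-06-15", "2021-07-01", "2021-06-20"]
    ),
    "Name": ["Cara", "Ann", "Dan", "Bob"],
})

# Label-based date slicing needs the dates in the index, and sorting the
# index keeps the slice well defined.
indexed = df.set_index("Joining Date").sort_index()

# Slicing a DatetimeIndex with date strings includes both endpoints.
subset = indexed.loc["2021-06-20":"2021-06-30"]
print(subset)
```

Without `set_index`, the same slice is applied to the default integer index, which is why the original attempt does not select by date.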