Using date range where clause in pandas SQL query returns an empty dataframe

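I have a table in a database containing millions of rows of transactions spanning more than 10 years. Since importing all of them would clearly be wasteful, I am trying to import only a subset of the data limited to a specific range of months. When I use the code below as a connection test and import the first 1000 rows, it works fine, but when I specify a date range in the WHERE clause, it returns an empty dataframe.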

I would appreciate any help correcting this. Thanks in advance.

import pyodbc
import pandas as pd
conn = pyodbc.connect('Driver={SQL Server};'\
'Server=NAME;'\
'Database=DBNAME;'\
'Trusted_Connection=yes;')

tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];"""
df = pd.read_sql_query(tquery, conn)
df.dtypes

Output:

DW_Id                            int64
Company                         object
Campaign Initiative             object
Closing Entry                     bool
Department Code                 object
Description                     object
Document No                     object
Document Type                    int64
Entry No                         int64
Expense Type                    object
GL Account No                   object
Incremental Field       datetime64[ns]
Posting Date            datetime64[ns]
Strategic Initiative            object
Vendor No                       object
Vendor Name                     object
Amount                         float64
GBP Amount                     float64
Actual per CWT                  object
DW_Batch                         int64
DW_SourceCode                   object
DW_TimeStamp            datetime64[ns]
dtype: object

df.head()

DW_Id   Company Campaign Initiative Closing Entry   Department Code Description Document No Document Type   Entry No    Expense Type    ... Posting Date    Strategic Initiative    Vendor No   Vendor Name Amount  GBP Amount  Actual per CWT  DW_Batch    DW_SourceCode   DW_TimeStamp
0   1   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   1   None    ... 2007-02-27  None    None    None    -125.25 0.0 None    13726   Nav 2020-05-11 08:50:37.437
1   2   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   2   None    ... 2007-02-27  None    AGD01   AGDATA, INC.    125.25  0.0 None    13726   Nav 2020-05-11 08:50:37.437
2   3   ABC Co.,LLC None    False       AGDATA INC. PMJ10000    1   3   None    ... 2007-02-27  None    AGD01   AGDATA, INC.    125.25  0.0 None    13726   Nav 2020-05-11 08:50:37.437

However, when I use the following code to filter for the date range 04-01-2020 to 04-30-2020, it gives me an empty dataframe:

df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn)

df1.dtypes

DW_Id                   object
Company                 object
Campaign Initiative     object
Closing Entry           object
Department Code         object
Description             object
Document No             object
Document Type           object
Entry No                object
Expense Type            object
GL Account No           object
Incremental Field       object
Posting Date            object
Strategic Initiative    object
Vendor No               object
Vendor Name             object
Amount                  object
GBP Amount              object
Actual per CWT          object
DW_Batch                object
DW_SourceCode           object
DW_TimeStamp            object
dtype: object

I believe the date range in the WHERE clause is what causes this, but I have not been able to find a solution. Any input would be greatly appreciated. Thanks!

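Consider parameterization, the industry best practice when passing values to an SQL query, which is supported by pyodbc and pandas.read_sql_query. Doing so avoids escaping quotes and concatenating or interpolating literal values or variables into the query string.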
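The quoting problem can be seen without a database. The sketch below only builds and prints strings; it assumes nothing beyond the query text from the question. In a single-quoted Python string, each `''` closes one literal and opens the next, and Python joins adjacent literals, so the quotes around the dates never reach the server. SQL Server would then most likely read the bounds as integer subtraction, and a BETWEEN whose lower bound is larger than its upper bound matches no rows.

```python
# The query exactly as written in the question. Python joins adjacent string
# literals, so each '' pair ends one literal and starts the next.
query = 'SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';'
print(query)  # the single quotes around both dates are gone

# Unquoted, the two bounds read as integer subtraction rather than dates.
lower = 2020 - 4 - 1
upper = 2020 - 4 - 30
print(lower, upper)  # lower is larger than upper

# Wrapping the Python string in double quotes keeps the SQL quotes intact.
fixed = "SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN '2020-04-01' AND '2020-04-30';"
print(fixed)
```

Binding the two dates as parameters sidesteps the quoting question entirely: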

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE [Posting Date] BETWEEN ? AND ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])

Or by date parts:

sql = '''SELECT * FROM [SALES Transactions_V] 
         WHERE YEAR([Posting Date]) = ? 
           AND MONTH([Posting Date]) = ?;
      '''

df1 = pd.read_sql_query(sql, conn, params=[2020, 4])
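
A third variation, offered as a sketch rather than a tested recipe: a half-open range (on or after the first day of the month, before the first day of the next month) keeps the filter on the bare column, which may let the server use an index on [Posting Date], and it still catches rows on the last day if the column ever carries a time component. `month_bounds` and `load_month` are hypothetical helper names, not part of pandas or pyodbc, and `conn` is assumed to be the pyodbc connection from the question.

```python
import datetime as dt


def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = dt.date(year, month, 1)
    # December rolls over to January of the next year.
    end = dt.date(year + 1, 1, 1) if month == 12 else dt.date(year, month + 1, 1)
    return start, end


def load_month(conn, year, month):
    """Hypothetical helper: read one month of rows with a parameterized query."""
    import pandas as pd  # imported here so month_bounds is usable without pandas

    sql = '''SELECT * FROM [SALES Transactions_V]
             WHERE [Posting Date] >= ? AND [Posting Date] < ?;
          '''
    start, end = month_bounds(year, month)
    # ISO-formatted strings, the same form the parameterized example passes.
    return pd.read_sql_query(sql, conn, params=[start.isoformat(), end.isoformat()])
```

Calling `load_month(conn, 2020, 4)` would then bind '2020-04-01' and '2020-05-01' as the two parameters.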