Using date range where clause in pandas SQL query returns an empty dataframe
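I have a table in a database containing millions of rows of transactions spanning more than 10 years. Since importing all of them would obviously be wasteful, I am trying to import only a subset limited to a specific range of months. When I use the code below as a connection test and import the first 1000 rows, it works fine, but when I specify a date range in the WHERE clause, it returns an empty dataframe.
I would appreciate any help in fixing this. Thanks in advance.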
import pyodbc
import pandas as pd
conn = pyodbc.connect('Driver={SQL Server};'\
'Server=NAME;'\
'Database=DBNAME;'\
'Trusted_Connection=yes;')
tquery = """SELECT TOP (1000) * FROM [SALES Transactions_V];"""
df = pd.read_sql_query(tquery, conn)
df.dtypes
Output:
DW_Id int64
Company object
Campaign Initiative object
Closing Entry bool
Department Code object
Description object
Document No object
Document Type int64
Entry No int64
Expense Type object
GL Account No object
Incremental Field datetime64[ns]
Posting Date datetime64[ns]
Strategic Initiative object
Vendor No object
Vendor Name object
Amount float64
GBP Amount float64
Actual per CWT object
DW_Batch int64
DW_SourceCode object
DW_TimeStamp datetime64[ns]
dtype: object
df.head()
DW_Id Company Campaign Initiative Closing Entry Department Code Description Document No Document Type Entry No Expense Type ... Posting Date Strategic Initiative Vendor No Vendor Name Amount GBP Amount Actual per CWT DW_Batch DW_SourceCode DW_TimeStamp
0 1 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 1 None ... 2007-02-27 None None None -125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
1 2 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 2 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
2 3 ABC Co.,LLC None False AGDATA INC. PMJ10000 1 3 None ... 2007-02-27 None AGD01 AGDATA, INC. 125.25 0.0 None 13726 Nav 2020-05-11 08:50:37.437
However, when I use the following code to filter for the date range between 04-01-2020 and 04-30-2020, it gives me an empty dataframe:
df1 = pd.read_sql_query('SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';', conn)
df1.dtypes
DW_Id object
Company object
Campaign Initiative object
Closing Entry object
Department Code object
Description object
Document No object
Document Type object
Entry No object
Expense Type object
GL Account No object
Incremental Field object
Posting Date object
Strategic Initiative object
Vendor No object
Vendor Name object
Amount object
GBP Amount object
Actual per CWT object
DW_Batch object
DW_SourceCode object
DW_TimeStamp object
dtype: object
I believe the date range in the WHERE clause is causing the problem, but I have not been able to find a fix. Any input would be much appreciated. Thanks!
Consider parameterization, the industry best practice when passing values to an SQL query, which is supported by pyodbc and pandas.read_sql_query. Doing so avoids the need to escape quotes and to concatenate or interpolate literal values or variables.
sql = '''SELECT * FROM [SALES Transactions_V]
WHERE [Posting Date] BETWEEN ? AND ?;
'''
df1 = pd.read_sql_query(sql, conn, params=['2020-04-01', '2020-04-30'])
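As for why the original attempt came back empty, it can help to print the string Python actually builds. The sketch below reuses the literal from the question and needs no database connection. The `lower` and `upper` names are only illustrative, and the reading of the unquoted bounds as integer arithmetic is my assumption about how SQL Server would most likely interpret them.

```python
# The exact string literal from the question. Python joins adjacent string
# literals, so each '' ends one literal and starts the next one instead of
# producing a quote character inside the SQL text.
query = 'SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN ''2020-04-01'' AND ''2020-04-30'';'
print(query)
# SELECT * FROM [SALES Transactions_V] WHERE [Posting Date] BETWEEN 2020-04-01 AND 2020-04-30;

# With the quotes gone, the bounds read as integer subtraction
# (assumption: the server evaluates them this way), not as dates.
lower = 2020 - 4 - 1
upper = 2020 - 4 - 30
print(lower, upper)  # 2015 1986
```

Since the lower bound ends up larger than the upper bound, no row can fall between them, which would be consistent with the empty dataframe. With `?` placeholders the dates never pass through Python string quoting at all.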
Or by date parts:
sql = '''SELECT * FROM [SALES Transactions_V]
WHERE YEAR([Posting Date]) = ?
AND MONTH([Posting Date]) = ?;
'''
df1 = pd.read_sql_query(sql, conn, params=[2020, 4])
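A possible third variant, in case [Posting Date] ever carries a time component or has an index you would like the server to use: pass `datetime.date` objects and filter on a half-open range (start inclusive, end exclusive). This is only a sketch. `month_bounds` is a hypothetical helper, not part of pandas or pyodbc, and the final call is left commented out because it needs the `conn` from the question.

```python
from datetime import date

def month_bounds(year, month):
    """Return (first day of the month, first day of the following month)."""
    start = date(year, month, 1)
    # Roll over to January of the next year after December.
    if month == 12:
        end = date(year + 1, 1, 1)
    else:
        end = date(year, month + 1, 1)
    return start, end

# Half-open range: includes every moment in the month and nothing from the
# next one, without wrapping the column in YEAR() or MONTH().
sql = '''SELECT * FROM [SALES Transactions_V]
         WHERE [Posting Date] >= ? AND [Posting Date] < ?;
      '''
start, end = month_bounds(2020, 4)
params = [start, end]

# Assumes the pyodbc connection `conn` created in the question:
# df1 = pd.read_sql_query(sql, conn, params=params)
```

The design choice here is to leave the column untouched on the left side of each comparison, which may let the server use an index on [Posting Date] where the YEAR()/MONTH() form might not.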