pandasql::sqldf not capturing looping variable

I am trying to loop over a list with pandasql::sqldf, but sqldf does not seem to capture the loop variable. Here is a stylized version of my problem:

import pandas as pd
from pandasql import sqldf
from datetime import datetime

FreqGamePlay = pd.DataFrame({
    'CONTACT_WID': [1, 2, 3, 1, 4],
    'TITLE_NOMIN_DT': pd.to_datetime(['20130102', '20140103', '20120518',
                                      '20140317', '20111123']),
    'FreqGamePlay': [12, 9, 22, 4, 5],
})
FreqGamePlay = FreqGamePlay[['CONTACT_WID', 'TITLE_NOMIN_DT', 'FreqGamePlay']]

periodsList = ['2012-12-26', '2012-02-28']
for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;", globals())
    print(temp)

The program above fails with the following error:

PandaSQLException: (sqlite3.OperationalError) no such column: i [SQL: 'select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > i group by CONTACT_WID;']

But if I hard-code the date manually, it works fine:

for i in periodsList:
    temp = sqldf("select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '2012-12-26' group by CONTACT_WID;", globals())
    print(temp)

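However, hard-coding is not practical, because the list of dates in my actual program is much larger. Any suggestions are appreciated, thanks.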

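This happens because the variable i sits inside the SQL string literal, so Python treats it as ordinary text and never evaluates it (notice that in the error message, i has not been replaced by its value). SQLite then reads i as a column name, which is why it reports "no such column: i". I suggest you read up on working with strings and variables in Python. Until then, try this: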

for i in periodsList:
    # Substitute the current date into the quoted placeholder before running the query
    query = "select CONTACT_WID, sum(FreqGamePlay) as FGP from FreqGamePlay where TITLE_NOMIN_DT > '{}' group by CONTACT_WID;".format(i)
    temp = sqldf(query, globals())
    print(temp)

The curly braces act as a placeholder for the variable, and the format() method replaces the placeholder with the variable's value.
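
To see the substitution on its own, without running any SQL, here is a minimal sketch that only builds the query strings. The names QUERY_TEMPLATE and build_query are hypothetical, made up for this illustration, and the date check is an extra assumption on my part (it is not needed for the fix): it just rejects values that do not parse as a YYYY-MM-DD date before they are pasted into the SQL text.

```python
from datetime import datetime

# Hypothetical template; {cutoff} is a named placeholder filled in by str.format().
QUERY_TEMPLATE = (
    "select CONTACT_WID, sum(FreqGamePlay) as FGP "
    "from FreqGamePlay "
    "where TITLE_NOMIN_DT > '{cutoff}' "
    "group by CONTACT_WID;"
)


def build_query(cutoff):
    """Return the SQL text for one cutoff date (hypothetical helper)."""
    # Assumption: cutoffs are date strings. strptime raises ValueError for
    # anything that does not parse with this format, so stray text such as
    # a bare variable name never ends up inside the SQL string.
    datetime.strptime(cutoff, "%Y-%m-%d")
    return QUERY_TEMPLATE.format(cutoff=cutoff)


periodsList = ['2012-12-26', '2012-02-28']
queries = [build_query(p) for p in periodsList]

# Print each finished query to confirm the date replaced the placeholder.
for query in queries:
    print(query)
```

Each string in queries can then be passed to sqldf(query, globals()) exactly as in the loop above. Printing the query first is a quick way to check that the value, not the variable name, made it into the SQL.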