Executing SQL on Pandas Dataframe and storing results in same Dataframe
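I have a DataFrame like the one in the image above. What I want to do is loop through the SQL statements in the SQL_SCRIPT column, execute them, and store each result in the next column, which will be called 'RESULTS'. When I execute the statements without storing the results anywhere, it runs fine, but when I try to store the results in a new DataFrame column, it fails with: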
ValueError: cannot set a row with mismatched columns
Here is the code:
def run_tests(self):
    s = self.connection()
    df = self.retrieve_sql()
    df_type = df.loc[df['STEP_TYPE'] == 'T']
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']]
    print(df_to_list)
    for sql_script in df_to_list['SQL_SCRIPT']:
        # .loc['RESULTS'] targets a ROW labelled 'RESULTS', not a column;
        # this is the assignment that fails with the ValueError
        df_to_list.loc['RESULTS'] = pd.read_sql(sql_script, s)
    print(df_to_list)
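A minimal sketch of why this attempt fails, assuming an in-memory SQLite database (sqlite3) as a stand-in for the Teradata session; the table t, its data, and the two-column toy frame are made up for illustration. df.loc['RESULTS'] = value is a row assignment: pandas tries to append a new row labelled 'RESULTS', and because the 1-row frame returned by pd.read_sql does not have one entry per column, it raises a ValueError. Collecting one scalar per script and assigning the list to df['RESULTS'] creates a column instead.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the Teradata session: an in-memory SQLite DB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

df = pd.DataFrame({
    "TABLE_NM": ["t", "t"],
    "SQL_SCRIPT": ["SELECT COUNT(*) FROM t", "SELECT SUM(x) FROM t"],
})

# Row assignment: pandas tries to add a row labelled 'RESULTS'. The value is a
# 1-row DataFrame, so its length (1) does not match the 2 columns -> ValueError.
try:
    df.loc["RESULTS"] = pd.read_sql(df["SQL_SCRIPT"].iloc[0], conn)
    row_error = None
except ValueError as exc:
    row_error = exc
print("row assignment failed with:", row_error)

# Column assignment: collect one scalar per script, then assign the whole
# list at once (its length equals the number of rows in df).
results = []
for sql in df["SQL_SCRIPT"]:
    result_frame = pd.read_sql(sql, conn)
    results.append(result_frame.iat[0, 0])  # first column of the first row
df["RESULTS"] = results
print(df)
```

Taking .iat[0, 0] assumes every script returns a single value, as aggregate test queries usually do; a script that can return no rows would need an explicit empty check first.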
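I have also tried executing the statements through the session instead of read_sql. That also works, but I'm not sure how to store the results in the DataFrame that way: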
def run_tests(self):
    s = self.connection()
    df = self.retrieve_sql()
    df_type = df.loc[df['STEP_TYPE'] == 'T']
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']]
    print(df_to_list)
    for sql_script in df_to_list['SQL_SCRIPT']:
        s.execute(sql_script)
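One way to keep the results on this execute-based path is to fetch from a cursor after each statement, append the value to a list, and assign the list as a column once the loop is done. The sketch below assumes a DB-API style cursor and uses sqlite3 in place of the Teradata session; the table and queries are illustrative only.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the Teradata session used in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

df_to_list = pd.DataFrame({
    "TABLE_NM": ["t", "t", "t"],
    "SQL_SCRIPT": [
        "SELECT COUNT(*) FROM t",
        "SELECT SUM(x) FROM t",
        "SELECT x FROM t WHERE x > 100",  # returns no rows
    ],
})

cursor = conn.cursor()
results = []
for sql_script in df_to_list["SQL_SCRIPT"]:
    cursor.execute(sql_script)
    row = cursor.fetchone()  # None when the query returned no rows
    results.append(row[0] if row is not None else None)
cursor.close()

# One result per row of df_to_list, so the list lines up with the frame.
df_to_list["RESULTS"] = results
print(df_to_list)
```

Building the list first and assigning it in one step avoids growing the DataFrame inside the loop; pandas stores the missing result as NaN.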
Here is the connection function, in case it is needed:
def connection(self):
    con = self.load_json_file()
    cfg_dsn = con['config']['dsn']
    cfg_usr = con['config']['username']
    cfg_pwd = con['config']['password']
    udaExec = teradata.UdaExec(appName="DataAnalysis", version="1.0", logConsole=False)
    session = udaExec.connect(method="odbc", dsn=cfg_dsn, username=cfg_usr, password=cfg_pwd)
    return session
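The question does not show load_json_file or the config file itself. The sketch below only illustrates the JSON layout implied by the lookups in connection() (con['config']['dsn'], ['username'], ['password']); the file contents and the helper connect_kwargs are hypothetical.

```python
import json

# Hypothetical file contents; only the key layout is inferred from the
# lookups in connection().
EXAMPLE_CONFIG = """
{
  "config": {
    "dsn": "MY_TERADATA_DSN",
    "username": "my_user",
    "password": "my_password"
  }
}
"""


def connect_kwargs(raw_json: str) -> dict:
    """Hypothetical helper: build the keyword arguments connection() passes to connect."""
    cfg = json.loads(raw_json)["config"]
    return {
        "method": "odbc",
        "dsn": cfg["dsn"],
        "username": cfg["username"],
        "password": cfg["password"],
    }


kwargs = connect_kwargs(EXAMPLE_CONFIG)
# Print everything except the password.
print({k: v for k, v in kwargs.items() if k != "password"})
# With the teradata module installed, udaExec.connect(**kwargs) would be
# equivalent to the explicit call in connection().
```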
Consider running Series.apply on the column of SQL strings:
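import numpy as np

def run_tests(self):
    s = self.connection()
    c = s.cursor()  # open a cursor on the session
    df = self.retrieve_sql()
    df_type = df.loc[df['STEP_TYPE'] == 'T']
    # .copy() so that adding the RESULTS column does not write into a slice of df
    df_to_list = df_type[['TABLE_NM', 'TEST_TABLE_NM', 'SQL_SCRIPT']].copy()
    print(df_to_list)

    # NEW METHOD TO RUN QUERY: execute one SQL string, return its first value
    def sql_run(x):
        c.execute(x)
        if c.rowcount > 0:
            res = c.fetchone()[0]
        else:
            res = np.nan
        return res

    # apply() calls sql_run once per row, so the results align with df_to_list
    df_to_list['RESULTS'] = df_to_list['SQL_SCRIPT'].apply(sql_run)
    print(df_to_list)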
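A self-contained sketch of the same Series.apply approach, runnable without Teradata. It assumes sqlite3 as the database and uses a made-up table t. One deliberate change from sql_run above: it checks the fetched row instead of cursor.rowcount, because sqlite3 reports rowcount as -1 for SELECT statements, and PEP 249 allows any driver to return -1 when it cannot determine the count. Whether the Teradata driver fills in rowcount for SELECTs is not something this sketch verifies.

```python
import sqlite3

import numpy as np
import pandas as pd

# Stand-in database (assumption): the question uses a Teradata session instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
cursor = conn.cursor()

df_to_list = pd.DataFrame({
    "STEP_TYPE": ["T", "T", "X"],
    "SQL_SCRIPT": [
        "SELECT COUNT(*) FROM t",
        "SELECT x FROM t WHERE x > 100",  # returns no rows
        "SELECT 1",
    ],
})
# Keep only test steps; .copy() so the new column is added to an
# independent frame rather than a slice of the original.
df_to_list = df_to_list.loc[df_to_list["STEP_TYPE"] == "T"].copy()


def sql_run(sql: str):
    """Run one query and return the first column of its first row, or NaN."""
    cursor.execute(sql)
    row = cursor.fetchone()
    # sqlite3 sets rowcount to -1 for SELECT, so test the fetched row instead.
    return row[0] if row is not None else np.nan


df_to_list["RESULTS"] = df_to_list["SQL_SCRIPT"].apply(sql_run)
print(df_to_list)
```

Because apply returns a Series indexed like df_to_list, each result lands on the row whose SQL_SCRIPT produced it, even after the STEP_TYPE filter has left gaps in the index.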