Find and count instances of a substring within multiple strings using Python
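I have a SQL query in Python that contains multiple subqueries, so the setup is several substrings inside one larger string. I want to count how many times a given string occurs within each substring. This is a bit more complex than the examples I have seen, so any help is appreciated.
It is set up like this: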
qry = '''
with
qry_1 as (
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
SELECT id.WEATHER AS WEATHER_MORN,
ROW_NUMBER() OVER(PARTITION BY id.SUN
ORDER BY id.TIME) AS SUN_TIME,
id.RAIN,
id.MIST
FROM (...other code..
)
'''
I want to count the instances of ID within qry_1, qry_2, and qry_3.
I think I could use re.findall and then do a substring search?
re.findall(r'as \( select (.+?) from \(',qry)
and then find and count the instances of ID within the matches? The expected output is 2, but I'm not sure how to do that...
You can split the query into its CTEs and then run re.findall on a truncated version of each subquery:
qry = '''
with
qry_1 as (
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
SELECT WEATHER
FROM (...other code..
)
'''
import re

def get_cols(s):
    # CTE name: the word right before " as", at the start of the chunk or right after "with "
    [cte_name] = re.findall(r'^\w+(?=\sas)|(?<=with\s)\w+(?=\sas)', s)
    # column names: the word that follows "as ", "SELECT ", or ", "
    cols = re.findall(r'(?<=as\s)[\w\.]+|(?<=SELECT\s)[\w\.]+|(?<=,\s)[\w\.]+', s)
    return [cte_name, cols]

# dictionary with the CTE name as the key and the columns as the values
v = dict(get_cols(re.sub(r'coalesce\s\(.+\)|[\s\n]+', ' ', i)) for i in re.split(r'(?<=\)),(?:\s+)*\n', qry))
# filter the dictionary above to only include the desired column names (needs Python 3.8+ for :=)
r = {a: k if (k := [i for i in b if i in {'NAME', 'ID'}]) else None for a, b in v.items()}
print(r)
Output:
{'qry_1': ['ID', 'NAME'], 'qry_2': ['ID', 'NAME'], 'qry_3': None}
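If what you are after is the count itself rather than the column list, the same split can feed a per-CTE counter. This is only a rough sketch run against the query from the question: `count_per_cte` is a hypothetical helper name, and it assumes that "instances of ID" means the standalone, case-sensitive token `ID`, so qualified references such as `table1.ID` and the lowercase alias `id.` are not counted. It also assumes each CTE ends with `),` at the end of a line, as in the sample.

```python
import re

qry = '''
with
qry_1 as (
SELECT ID,
NAME
FROM ( ... other code...
),
qry_2 as (
SELECT coalesce (table1.ID, table2.ID) as ID,
NAME
FROM (...other code...
),
qry_3 as (
SELECT id.WEATHER AS WEATHER_MORN,
ROW_NUMBER() OVER(PARTITION BY id.SUN
ORDER BY id.TIME) AS SUN_TIME,
id.RAIN,
id.MIST
FROM (...other code..
)
'''

def count_per_cte(sql, name="ID"):
    """Hypothetical helper: map each CTE name to its number of standalone `name` tokens."""
    # Whole-word match that is not preceded by a word character or a dot,
    # so table1.ID is skipped (an assumption about what should be counted).
    token = re.compile(r'(?<![\w.])' + re.escape(name) + r'\b')
    counts = {}
    # Same split idea as above: a comma that follows ")" and ends the line.
    for chunk in re.split(r'(?<=\)),\s*\n', sql):
        header = re.search(r'(\w+)\s+as\s*\(', chunk)
        if header is None:
            continue  # no "<name> as (" header in this chunk
        body = chunk[header.end():]
        counts[header.group(1)] = len(token.findall(body))
    return counts

counts = count_per_cte(qry)
print(counts)
print(sum(counts.values()))
```

On the sample query this gives one hit each for qry_1 and qry_2 and none for qry_3, so the total is 2, which seems to be the number the question expects. Drop the `(?<![\w.])` lookbehind if qualified names like `table1.ID` should count as well. Splitting SQL with regular expressions is fragile, so for real queries a proper SQL parser would be the safer route.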