Search for matching content with several AND and OR conditions
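I have a list of content strings (str). Each string must match at least one value from each of the following lists (case-insensitive). Do you have any suggestions on the best way to do this?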
react_terms = ["reactjs", "react.js", "react"] (OR condition)
AND
python_terms = ["python", "django"] (OR condition)
AND
cities_countries = ["london", "UK"] (OR condition)
What I'm trying (not working)
for content_str in content:
    if content_str in any(react_terms) and any(python_terms) and any(cities_countries):
        print(content_str, "match!")
Data example
content = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]
Result
content[0] matches.
content[1] and content[2] do not match, because:
- content[1] does not contain any of the cities_countries terms
- content[2] does not contain any of the react_terms
Initial response
If you want content_str to exactly match any item in the three lists, you can use:
if content_str.lower() in (react_terms + python_terms + cities_countries):
    # Do stuff
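The any function will not work the way you are using it. It returns a boolean: specifically, True if any item in its argument is truthy (for a str, that means non-empty). So the code you wrote is similar to: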
if (content_str in True) and True and True:  # TypeError: argument of type 'bool' is not iterable
    # ...
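For comparison, here is a minimal sketch of how any is usually combined with a generator expression, so that it receives one boolean per term instead of the list itself. The helper name matches_group is hypothetical and not part of the question; substring matching is assumed.

```python
from typing import Iterable

react_terms = ["reactjs", "react.js", "react"]


def matches_group(text: str, terms: Iterable[str]) -> bool:
    """Return True if at least one term occurs in text, ignoring case."""
    lowered = text.lower()
    # any() consumes one True/False per term and stops at the first True.
    return any(term.lower() in lowered for term in terms)


print(matches_group("Lorem Ipsum reactjs, python in London", react_terms))
print(matches_group("Lorem Ipsum Vue, python in London, UK", react_terms))
```

The first call should report a match and the second should not, since "Vue" is not one of the React terms.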
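One last comment: if you do not plan to change the items in the lists dynamically, it is more efficient to build the "all items" list only once: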
ITEMS_TO_MATCH = react_terms + python_terms + cities_countries
if content_str.lower() in ITEMS_TO_MATCH:
    # Do stuff
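Note: I ignored the and operator you were trying to use because, with the data you provided, no item appears in all three lists. If you actually plan to have items shared between the lists, and you want to do something only when content_str is in all of them, just recompute ITEMS_TO_MATCH like this: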
ITEMS_TO_MATCH = [item for item in react_terms if item in python_terms and item in cities_countries]
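As an alternative sketch of the same idea, the shared items can be computed with set intersection. This assumes the order of the items does not matter and that duplicates within a list can be dropped; the lists are copied from the question.

```python
react_terms = ["reactjs", "react.js", "react"]
python_terms = ["python", "django"]
cities_countries = ["london", "UK"]

# Keep only the terms that appear in all three lists.
ITEMS_TO_MATCH = set(react_terms) & set(python_terms) & set(cities_countries)

print(ITEMS_TO_MATCH)
```

With the lists from the question the result is an empty set, because no term is shared by all three lists.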
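Edit
Now that you have provided some example data, I have a clearer idea of what you are trying to do. Here is a script that meets your requirements: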
from typing import Iterable

CONTENT = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]
CITIES_COUNTRIES = ("london", "UK")
PYTHON_TERMS = ("python", "django")
REACT_TERMS = ("reactjs", "react.js", "react")
MATCHES = (CITIES_COUNTRIES, PYTHON_TERMS, REACT_TERMS)


def word_in_match(word: str, match: Iterable[str]) -> bool:
    """Return True if any term in match is a substring of word (case-insensitive)."""
    word = word.lower()
    for word_to_match in match:
        # Lowercase the term too; otherwise "UK" can never match a lowercased word.
        if word_to_match.lower() in word:
            return True
    return False


def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
    """Return True if str_to_match contains at least one term from every group."""
    results = [False for _ in matches]
    for word in str_to_match.split():
        for i, match in enumerate(matches):
            # Skip groups that an earlier word has already satisfied.
            if not results[i]:
                results[i] = word_in_match(word, match)
    return all(results)


for str_to_match in CONTENT:
    print(contains_items_from_all(str_to_match, MATCHES))
A more efficient approach
def contains_item(str_to_match: str, match: Iterable[str]) -> bool:
    """Return True if any term in match occurs in the (already lowercased) string."""
    for term in match:
        # Lowercase the term so that "UK" can match the lowercased string.
        if term.lower() in str_to_match:
            return True
    return False


def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
    """Return True if str_to_match contains at least one term from every group."""
    str_to_match = str_to_match.lower()  # lowercase once instead of once per word
    for match in matches:
        if not contains_item(str_to_match, match):
            return False  # one group has no match, so stop early
    return True
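One caveat: the functions above use substring matching, so a term such as react would also match inside "reaction". If whole-word matching is wanted (an assumption, the question does not say so), a rough sketch with the re module could look like this. The names compile_group, GROUPS and matches_all_groups are hypothetical.

```python
import re
from typing import Iterable


def compile_group(terms: Iterable[str]) -> "re.Pattern[str]":
    """Build one case-insensitive pattern matching any term as a whole word."""
    # re.escape keeps the dot in "react.js" literal.
    alternatives = "|".join(re.escape(term) for term in terms)
    # The lookarounds reject matches that sit inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


GROUPS = [
    compile_group(["reactjs", "react.js", "react"]),
    compile_group(["python", "django"]),
    compile_group(["london", "UK"]),
]


def matches_all_groups(text: str) -> bool:
    """True only if every group (AND) has at least one term (OR) in text."""
    return all(pattern.search(text) for pattern in GROUPS)


content = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]
for text in content:
    print(matches_all_groups(text))
```

Compiling each group once keeps the per-string work to three searches, and the patterns can be reused for every string in content.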