Search for matching content with several AND and OR conditions

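I have a list of strings, content = [...]. Each string must match at least one value from every one of the lists below (case-insensitive). Do you have any suggestions for the best way to do this?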

react_terms = ["reactjs", "react.js", "react"] (OR condition)

AND

python_terms = ["python", "django"] (OR condition)

AND

cities_countries = ["london", "UK"] (OR condition)

What I am trying (not working)

for content_str in content:
    if content_str in any(react_terms) and any(python_terms) and any(cities_countries):
        print(content_str, "match!")

Sample data

content = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]

Result

content[1] and content[2] do not match, because:

Initial answer

If you want content_str to exactly match any item from the three lists, you can use:

if content_str.lower() in (react_terms + python_terms + cities_countries):
  # Do stuff

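The any function does not work the way you are using it. It returns a boolean: True if any item in its argument is truthy (for a str, that means non-empty). Because in binds more tightly than and, the code you wrote is equivalent to the following, which raises a TypeError since a bool is not iterable:

if content_str in True and True and True:
  #...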
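
To make this concrete, here is a minimal sketch that reuses react_terms from the question. The content_str value is simply the first sample string, and any_result and error_message are names introduced only for this illustration.

```python
react_terms = ["reactjs", "react.js", "react"]
content_str = "Lorem Ipsum reactjs, python in London"

# any() collapses the whole list into a single bool: True here,
# because at least one string in the list is non-empty.
any_result = any(react_terms)
print(any_result)

# `content_str in any(react_terms)` therefore becomes `content_str in True`,
# which raises TypeError because a bool is not iterable.
try:
    content_str in any_result
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```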

One last remark: if you do not plan to change the items in the lists dynamically, it is more efficient to build the "all items" list only once:

ITEMS_TO_MATCH = react_terms + python_terms + cities_countries
if content_str.lower() in ITEMS_TO_MATCH:
  # Do stuff

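Note: I ignored the and operator you tried to use because, based on the data you provided, no item appears in all three lists. If you actually plan to have items shared across the lists, and you want to do something only when content_str is in all of them, just recompute ITEMS_TO_MATCH: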

ITEMS_TO_MATCH = [item for item in react_terms if item in python_terms and item in cities_countries]
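
The same intersection can also be written with sets. This is only a sketch of an equivalent formulation, assuming the order of the items does not matter; with the lists from the question the result is empty, which is why the and was ignored above.

```python
react_terms = ["reactjs", "react.js", "react"]
python_terms = ["python", "django"]
cities_countries = ["london", "UK"]

# Keep only the terms that appear in all three lists at once.
ITEMS_TO_MATCH = set(react_terms) & set(python_terms) & set(cities_countries)
print(ITEMS_TO_MATCH)  # empty for the lists in the question
```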

Edit

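Now that you have provided some sample data, I have a clearer idea of what you are trying to do. Here is a script that meets your requirements: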

from typing import Iterable

CONTENT = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]

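# Terms must be lowercase because the text is lowercased before comparison ("UK" would never match).
CITIES_COUNTRIES = ("london", "uk")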
PYTHON_TERMS = ("python", "django")
REACT_TERMS = ("reactjs", "react.js", "react")
MATCHES = (CITIES_COUNTRIES, PYTHON_TERMS, REACT_TERMS)


def word_in_match(word: str, match: Iterable[str]) -> bool:
    for word_to_match in match:
        if word_to_match in word.lower():
            return True
    return False


def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
    results = [False for _ in matches]
    for word in str_to_match.split():
        for i, match in enumerate(matches):
            if not results[i]:
                results[i] = word_in_match(word, match)
    return all(results)


for str_to_match in CONTENT:
    print(contains_items_from_all(str_to_match, MATCHES))

A more efficient approach

def contains_item(str_to_match: str, match: Iterable[str]) -> bool:
    for word_in_match in match:
        if word_in_match in str_to_match:
            return True
    return False


def contains_items_from_all(str_to_match: str, matches: Iterable[Iterable[str]]) -> bool:
    str_to_match = str_to_match.lower()
    results = [False for _ in matches]
    for i, match in enumerate(matches):
        if contains_item(str_to_match, match):
            results[i] = True
        else:
            return False
    return all(results)
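
The same "one term from every group" check can be written more compactly with all and any. The sketch below is one possible variant, not the only way to do it: GROUPS, matches_all_groups and matches_all_groups_whole_word are names made up for this example. The second function is an assumption about what you may want: it uses a regular expression so that a term only counts when it is not glued to other letters or digits, which avoids substring hits such as "react" inside "reaction".

```python
import re
from typing import Iterable

CONTENT = [
    "Lorem Ipsum reactjs, python in London",
    "Lorem Ipsum reactjs, python in United States",
    "Lorem Ipsum Vue, python in London, UK",
]

# Hypothetical grouping of the question's terms, written in lowercase.
GROUPS = (
    ("london", "uk"),
    ("python", "django"),
    ("reactjs", "react.js", "react"),
)


def matches_all_groups(text: str, groups: Iterable[Iterable[str]]) -> bool:
    """Substring version: every group needs at least one term inside text."""
    lowered = text.lower()
    return all(any(term.lower() in lowered for term in group) for group in groups)


def matches_all_groups_whole_word(text: str, groups: Iterable[Iterable[str]]) -> bool:
    """Whole-word version: a term must not touch other word characters."""
    for group in groups:
        alternatives = "|".join(re.escape(term) for term in group)
        pattern = rf"(?<!\w)(?:{alternatives})(?!\w)"
        if not re.search(pattern, text, flags=re.IGNORECASE):
            return False
    return True


substring_results = [matches_all_groups(text, GROUPS) for text in CONTENT]
whole_word_results = [matches_all_groups_whole_word(text, GROUPS) for text in CONTENT]
print(substring_results)
print(whole_word_results)
```

Both versions stop checking a group as soon as one of its terms is found, and stop checking further groups as soon as one group has no match. Whether substring or whole-word matching is the right choice depends on your data.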