How to iterate a Python list and compare items in a string or another list

Question

Following on from my earlier question, I tried to write code that returns a string if a search term from a list occurs in that string, as follows:

import re
from nltk import tokenize
from nltk.tokenize import sent_tokenize
def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "Risk factors for breast cancer have been well characterized. Breast cancer is 100 times more frequent in women than in men.\
    Factors associated with an increased exposure to estrogen have also been elucidated including early menarche, late menopause, later age\
    at first pregnancy, or nulliparity. The use of hormone replacement therapy has been confirmed as a risk factor, although mostly limited to \
    the combined use of estrogen and progesterone, as demonstrated in the WHI (2). Analysis showed that the risk of breast cancer among women using \
    estrogen and progesterone was increased by 24% compared to placebo. A separate arm of the WHI randomized women with a prior hysterectomy to \
    conjugated equine estrogen (CEE) versus placebo, and in that study, the use of CEE was not associated with an increased risk of breast cancer (3).\
    Unlike hormone replacement therapy, there is no evidence that oral contraceptive (OCP) use increases risk. A large population-based case-control study \
    examining the risk of breast cancer among women who previously used or were currently using OCPs included over 9,000 women aged 35 to 64 \
    (half of whom had breast cancer) (4). The reported relative risk was 1.0 (95% CI, 0.8 to 1.3) among women currently using OCPs and 0.9 \
    (95% CI, 0.8 to 1.0) among prior users. In addition, neither race nor family history was associated with a greater risk of breast cancer among OCP users."
    words = txt
    corpus = " ".join(words).lower()
    sentences1 = sent_tokenize(corpus)
    a = [" ".join([sentences1[i-1],j]) for i,j in enumerate(sentences1) if [item in List1] in word_tokenize(j)]   


    for i in a:
        print i,'\n','\n'

foo()

The problem is that Python IDLE does not print anything. What am I doing wrong? All it does is run the code, and I get this:

>>>

Answer 1

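Your question isn't very clear to me, so please correct me if I've misunderstood. Are you trying to match a list of keywords (in List1) against a text (in txt)? That is,

For each keyword in List1
match it against each sentence in txt.
If it matches, print the sentence?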
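
If that reading is right, the three steps can be sketched as a plain nested loop. This is only an illustration: the function name `sentences_with_keywords` and the sample text are made up here, the sentence split on '.' is deliberately naive, and the test is a case-insensitive substring check rather than a tokenized match.

```python
def sentences_with_keywords(text, keywords):
    """Return (keyword, sentence) pairs where the keyword occurs in the sentence."""
    # Naive split on '.'; decimals and abbreviations would be split as well.
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    hits = []
    for keyword in keywords:           # for each keyword in the list
        for sentence in sentences:     # match it against each sentence
            # Case-insensitive substring test, so 'risk' also finds 'Risk'
            if keyword.lower() in sentence.lower():
                hits.append((keyword, sentence))
    return hits

# Hypothetical sample text, not the text from the question
sample = "Risk factors are known. Nothing here. Hormone therapy is a risk factor."
for keyword, sentence in sentences_with_keywords(sample, ['risk', 'hormone']):
    print(keyword + ": " + sentence)
```

Because it is a substring test, 'ocp' would also match inside a longer word such as 'OCPs'; whether that is wanted depends on what you are after.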

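Rather than writing a complicated regular expression to solve your problem, I broke it down into two parts.

First, I split the whole text into a list of sentences. Then I wrote a simple regular expression to run against each sentence. The drawback of this approach is that it isn't very efficient, but hey, it solves your problem.

Hopefully this small piece of code helps you get to a real solution.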

import re

def foo():
    List1 = ['risk','cancer','ocp','hormone','OCP',]
    txt = "blah blah blah - truncated"

    matches = []
    # Split the text into sentences on literal periods
    sentences = re.split(r'\.', txt)
    # Only the first keyword is checked here
    keyword = List1[0]
    # Keep the compiled pattern so it is actually used for the search
    pattern = re.compile(keyword)

    for sentence in sentences:
        if pattern.search(sentence):
            matches.append(sentence)

    print("Sentence matching the word (" + keyword + "):")
    for match in matches:
        print(match)
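
The function above only checks the first keyword and is case-sensitive. One possible way to cover every keyword in a single pass is to join them into one alternation pattern. This is a sketch under assumptions, not a definitive fix: the name `find_matching_sentences` and the sample text are invented for illustration, and the sentence split (after '.', '!' or '?' followed by whitespace) is a simple stand-in for a real sentence tokenizer.

```python
import re

def find_matching_sentences(text, keywords):
    """Return the sentences that contain any keyword as a whole word."""
    # re.escape protects against keywords that contain regex metacharacters
    alternation = '|'.join(re.escape(k) for k in keywords)
    # \b restricts matches to whole words; IGNORECASE makes 'risk' match 'Risk'
    pattern = re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)
    # Split after sentence-ending punctuation that is followed by whitespace,
    # so decimals such as 1.0 stay inside their sentence
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample text, not the text from the question
sample = ("The relative risk was 1.0 (95% CI, 0.8 to 1.3). "
          "Men are rarely affected. OCP use was studied.")
for sentence in find_matching_sentences(sample, ['risk', 'ocp']):
    print(sentence)
```

With whole-word matching, 'ocp' finds 'OCP' but not 'OCPs'; drop the \b anchors if plurals should count as well.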

------------Generating a random number-----

from random import randint

List1 = ['risk','cancer','ocp','hormone','OCP',]
print(randint(0, len(List1) - 1)) # gives a random index; use it to access List1
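
To make the indexing step explicit, here is a small sketch that uses the random index to pick a keyword, next to `random.choice`, which picks an element directly. The variable names are only illustrative.

```python
import random

List1 = ['risk', 'cancer', 'ocp', 'hormone', 'OCP']

# randint includes both endpoints, hence len(List1) - 1 as the upper bound
index = random.randint(0, len(List1) - 1)
keyword_by_index = List1[index]

# random.choice returns a random element without any index arithmetic
keyword_direct = random.choice(List1)

print(keyword_by_index)
print(keyword_direct)
```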

nltk

python-2.7