How do I get the full strings with a RegEx search in Python that only captures part of the word?

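My assignment is to search a document and grab the words that contain ch, cH, Ch, CH, sh, sH, Sh, and SH. What is the most efficient way to grab the whole word? Right now, using re.findall(), I get the correct number of words and the correct locations, but I can only print ch or sh, not the whole word that contains those letters. Here is my code!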

import re

#f = open("dreamMLK.txt",'r')

with open("dreamMLK.txt",'r') as fp:
    line = fp.readline()
    count = 1
    while line:
        x = re.findall("ch|sh",line)
        if(len(x) > 0):
            print(x)
            print(str(count) +": "+line)
        line = fp.readline()
        count += 1

Here is the output:

['sh']
3: Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves [Audience:] (Yeah) who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity. (Hmm)

['ch', 'sh', 'sh']
5: But one hundred years later (All right), the Negro still is not free. (My Lord, Yeah) One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. (Hmm) One hundred years later (All right), the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later (My Lord) [applause], the Negro is still languished in the corners of American society and finds himself in exile in his own land. (Yes, yes) And so we’ve come here today to dramatize a shameful condition.

I would like line 3 to print the value Shadow rather than 'sh', and line 5 to print Chains, Languished, and Shameful. In case it helps, here is the assignment verbatim:

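Open the file and use a while loop to read each line, using a regular expression (re.search()) to find the lines that contain any lower/upper case version of the strings "ch" or "sh", i.e. {ch Ch cH CH sh sH Sh SH}. Note: do not enumerate all 8 possibilities in the regular expression; instead, your regular expression should be 7 characters long, including the [ ] characters. For each sentence that contains "ch" or "sh" (or Ch or CH or cH etc.), print out: a) the line number and the sentence; and b) the list of words in that sentence that contain some version of "sh" or "ch".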

Try using the following regex pattern in case-insensitive mode:

\b\S*[cs]h\S*\b

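This matches every whole word that contains ch or sh: \S* on either side of [cs]h extends the match over the rest of the word, and the \b anchors keep it on word boundaries. Here is a sample script: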

# -*- coding: utf-8 -*-
import re

inp = """3: Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves [Audience:] (Yeah) who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of their captivity.
5: But one hundred years later (All right), the Negro still is not free. (My Lord, Yeah) One hundred years later, the life of the Negro is still sadly crippled by the manacles of segregation and the chains of discrimination. (Hmm) One hundred years later (All right), the Negro lives on a lonely island of poverty in the midst of a vast ocean of material prosperity. One hundred years later (My Lord) [applause], the Negro is still languished in the corners of American society and finds himself in exile in his own land. (Yes, yes) And so we’ve come here today to dramatize a shameful condition."""

matches = re.findall(r'\b\S*[cs]h\S*\b', inp, flags=re.IGNORECASE)
print(matches)

This prints:

['shadow', 'chains', 'languished', 'shameful']
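
To tie this back to the loop in the question, the same idea can be applied one line at a time. What follows is only a minimal sketch under a few assumptions, not a definitive solution: the helper name words_with_ch_sh and the three sample lines are made up for illustration, and in the real script the lines would come from dreamMLK.txt. It also swaps \S* for \w*, which assumes a "word" means letters, digits, and underscores only, so hyphens and other punctuation never end up inside a match.

```python
import re

# Whole words containing "ch" or "sh" in any letter case.
# \w* is used here instead of \S* (an assumption about what a "word" is).
WORD_RE = re.compile(r'\b\w*[cs]h\w*\b', flags=re.IGNORECASE)


def words_with_ch_sh(line):
    """Return every whole word in `line` that contains ch or sh."""
    return WORD_RE.findall(line)


# Hypothetical sample lines; the real script would read them from the file.
sample_lines = [
    "In whose symbolic shadow we stand today.",
    "Nothing to find here.",
    "The CHAINS of discrimination and a Shameful condition.",
]

for count, line in enumerate(sample_lines, start=1):
    words = words_with_ch_sh(line)
    if words:  # skip lines without any matching word
        print(str(count) + ": " + line)
        print(words)
```

Because the pattern has no capture groups, re.findall() returns the full text of each match, which is why whole words come back instead of just 'ch' or 'sh'. The sketch uses enumerate() for brevity; the while loop with readline() from the question works the same way if the assignment requires it.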