Gathering words with a set length of letters

Is there a way in Python to group words by a given letter length?

I've started working on this function:

def length_words(a, b, text):
    returnlist = []

In the return list I want the words whose length satisfies:

a <= length <= b

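So I was thinking:

  1. Split the text into lines so that the function works on each line of the text separately
  2. Remove the punctuation from each line
  3. If a line contains words of a suitable length, the function must put them in the return list, separated by a space (e.g. 'cat dog'); otherwise the function puts ''

I know there is the splitlines() method, but I don't understand how to use it (even after reading about it).

Here is an example of how the function should work: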

function(6,7,'All in the golden afternoon\nFull leisurely we glide;\nFor  both our oars, with little skill,\nBy little arms are plied.')

This function should separate the lines:

All in the golden afternoon

Full leisurely we glide;

For both our oars, with little skill,

By little arms are plied.

--> remove the punctuation and return:

['golden','','little','little']

I know I have to append the words to the return list, but I don't know how to proceed.

You can write a list comprehension like this:

[token for token in s.split(" ") if a <= len(token) <= b]

It returns all words in the variable s (str) whose length in characters is between a (int) and b (int), inclusive. An example of how to use it:

s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor  both our oars, with little skill,\nBy little arms are plied.'
a = 6
b = 7
result = [token for token in s.split(" ") if a <= len(token) <= b]

The result is:

['golden', 'little', 'little', 'plied.']
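One thing to be aware of: `s.split(" ")` splits only on single space characters, so the newlines stay inside tokens (for example `'afternoon\nFull'`) and the double space after `For` produces an empty token. It works here only because none of those merged tokens happens to be 6 or 7 characters long. A minimal sketch of the difference, compared with `str.split()` called without an argument, which splits on any run of whitespace (the variable names `space_tokens` and `whitespace_tokens` are only for illustration):

```python
s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor  both our oars, with little skill,\nBy little arms are plied.'

# split(" ") keeps newlines inside tokens and produces an empty
# token for the double space after "For"
space_tokens = s.split(" ")

# split() with no argument splits on any run of whitespace,
# including newlines, and never yields empty tokens
whitespace_tokens = s.split()

print('afternoon\nFull' in space_tokens)   # True
print('' in space_tokens)                  # True
print('afternoon' in whitespace_tokens)    # True
print('' in whitespace_tokens)             # False
```

With `split()`, tokens such as `glide;` and `skill,` are also 6 characters long, so stripping punctuation before filtering matters even more in that case.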

To remove the punctuation, just add

import string
s = "".join([char for char in s if char not in string.punctuation])

above the last line. The result is:

['golden', 'little', 'little']
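As an alternative sketch (not part of the snippet above), `str.translate` with a table built by `str.maketrans` deletes the same characters in a single pass. The variable name `table` is just an illustration:

```python
import string

s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor  both our oars, with little skill,\nBy little arms are plied.'
a = 6
b = 7

# the third argument of maketrans lists characters to delete
table = str.maketrans('', '', string.punctuation)
s = s.translate(table)

result = [token for token in s.split(" ") if a <= len(token) <= b]
print(result)  # ['golden', 'little', 'little']
```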

Hope this helps!

Edit:

If you want to search each line separately, I would suggest a solution like this:

import string


def split_by_line_and_find_words_with_length(a, b, s):
    #store result
    result = []

    # separate string lines
    lines = s.splitlines()

    for line in lines:
        # remove punctuation
        l = "".join([char for char in line if char not in string.punctuation])

        # find words with length between a and b
        find = [token for token in l.split(" ") if a <= len(token) <= b]

        # add empty string to result if no match
        if not find:
            find.append("")

        # add any findings to result
        result += find

    return result

For your example string and preferred word lengths, this returns ['golden', '', 'little', 'little'].
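Since the question mentions trouble with `splitlines()`, here is a small sketch of what that call returns on its own for the example string. The loop variable names are only for illustration:

```python
s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor  both our oars, with little skill,\nBy little arms are plied.'

# splitlines() returns a list with one string per line,
# with the newline characters removed
lines = s.splitlines()

for number, line in enumerate(lines, start=1):
    print(number, repr(line))

print(len(lines))  # 4
```

Each element of `lines` is an ordinary string, so it can be cleaned and filtered on its own, which is what the loop in the function above does.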

You were on the right track when you thought about a range. Here is how I would write the function.

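  • Create a function with three parameters: start and stop for the range, and sentence for the target sentence.
  • Inside the function, create a list named word_list.
  • Iterate over each line in the sentence by splitting it with .splitlines().
  • Filter out all punctuation from each line you iterate over.
  • Then iterate over each word in the current line with a list comprehension, testing whether each word's length falls within the given range: tmp = [word for word in line.split() if start <= len(word) <= stop]. Assign the result of the list comprehension to a list named tmp.
  • If the length of tmp is greater than 1
    • join the words in tmp with a space, and add the joined string to word_list
  • Otherwise, if tmp is exactly one element long
    • simply add it to word_list
  • Otherwise, if it is empty
    • add an empty string to word_list
  • return word_list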

Using the steps above, here is how I would write your function:

# create a function with the parameters `start`, `stop` and `sentence`
# `start` and `stop` are for the range, and `sentence` is the
# target sentence to iterate over.
def group_words_by_length(start: int, stop: int, sentence: str) -> list:
    # import the string module so we can use its punctuation attribute.
    import string

    # create a list to hold words that
    # are in the given `start`-`stop` range
    word_list = []

    # iterate over each line in the sentence
    # using the string method `.splitlines()`
    # which splits the string at every new line
    for line in sentence.splitlines():

        # filter out punctuation from
        # every line.
        line = ''.join([char for char in line if char not in string.punctuation])

        # iterate over every word in each line
        # via list comprehension. Inside the list comprehension
        # we only add a word if it is in the given range.
        tmp = [word for word in line.split() if start <= len(word) <= stop]

        # if we found more than one valid word
        # in the current line...
        if len(tmp) > 1:

            # join each word in the
            # list by a space, and add
            # the joined string to the `word_list`.
            tmp = ' '.join(tmp)
            word_list.append(tmp)

        # if we found only
        # one valid word...
        elif len(tmp) == 1:

            # simply add the word
            # to the `word_list`.
            word_list.extend(tmp)

        # otherwise...
        else:
            # add an empty string to the
            # `word_list`.
            word_list.append("")

    # return the `word_list`
    return word_list

# testing of the function with
# your test string.
print(group_words_by_length(6, 7, 'All in the golden afternoon\nFull leisurely we glide;\nFor  both our oars, with little skill,\nBy little arms are plied.'))

Output:

['golden', '', 'little', 'little']
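Since `' '.join(tmp)` already returns `''` for an empty list and the word itself for a one-element list, the three branches can be collapsed into a single `append`. A more compact sketch of the same idea (the name `group_words_compact` is made up for this example):

```python
import string


def group_words_compact(start: int, stop: int, sentence: str) -> list:
    # hypothetical name; intended to behave like the function above
    word_list = []
    for line in sentence.splitlines():
        # strip punctuation from the line
        line = ''.join(char for char in line if char not in string.punctuation)
        # keep only words whose length is within [start, stop]
        tmp = [word for word in line.split() if start <= len(word) <= stop]
        # join handles zero, one, or many matches uniformly
        word_list.append(' '.join(tmp))
    return word_list


text = ('All in the golden afternoon\nFull leisurely we glide;\n'
        'For  both our oars, with little skill,\nBy little arms are plied.')
print(group_words_compact(6, 7, text))  # ['golden', '', 'little', 'little']
print(group_words_compact(3, 3, 'cat and dog\nno'))  # ['cat and dog', '']
```

The second call shows the case from the question where several matches on one line end up in a single space-separated string.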