Gathering words with a set length of letters

Question

Is there a way in Python to group words by a given letter length?

I started working on this function:

def length_words(a, b, text):
    returnlist = []

In the returned list I want the words whose length satisfies:

a <= length <= b

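So I was thinking of the following steps:

Split the text into lines, so that the function works on each line of the text separately
Remove the punctuation from each line
If a line contains words of a suitable length, the function must put them in the returned list, with a space between the words (for example 'cat dog'); otherwise the function puts ''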

I know there is a splitlines() method, but I don't understand how to use it (even after reading about it).
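For reference, here is a minimal sketch of what `str.splitlines()` does: it returns a list with one string per line, with the line breaks removed. The variable names `text` and `lines` are only illustrative.

```python
text = 'All in the golden afternoon\nFull leisurely we glide;'

# splitlines() cuts the string at each line break and drops the '\n'
lines = text.splitlines()

# each element is one line of the original text
for line in lines:
    print(line)
```

The resulting list can be looped over like any other list, so each line can be processed on its own.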

Here is an example of how the function should work:

function(6,7,'All in the golden afternoon\nFull leisurely we glide;\nFor  both our oars, with little skill,\nBy little arms are plied.')

The function should separate the lines:

All in the golden afternoon

Full leisurely we glide;

For both our oars, with little skill,

By little arms are plied.

--> remove the punctuation and return:

['golden','','little','little']

I know I have to append the words to the returned list, but I don't know how to proceed.

Answer 1

You can write a list comprehension like this:

[token for token in s.split(" ") if a <= len(token) <= b]

It returns all the words in the variable s (str) whose length in characters is between a (int) and b (int), inclusive. An example of how to use it:

s = 'All in the golden afternoon\nFull leisurely we glide;'
s += '\nFor  both our oars, with little skill,\nBy little arms are plied.'
a = 6
b = 7
result = [token for token in s.split(" ") if a <= len(token) <= b]

The result is:

['golden', 'little', 'little', 'plied.']

To remove the punctuation, just add

import string
s = "".join([char for char in s if char not in string.punctuation])

above the last line. The result is then:

['golden', 'little', 'little']
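One caveat worth checking: `s.split(" ")` splits only on the space character, so the words on either side of a line break stay glued together as one token. The sketch below (variable names are illustrative) compares it with `s.split()`, which splits on any run of whitespace, including `\n`.

```python
s = 'All in the golden afternoon\nFull leisurely we glide;'

# split(" ") cuts only at spaces: the '\n' stays inside a token
by_space = s.split(" ")

# split() with no argument cuts at any whitespace, newlines included
by_whitespace = s.split()

print(by_space)       # contains the single token 'afternoon\nFull'
print(by_whitespace)  # 'afternoon' and 'Full' are separate tokens
```

For the example text with the punctuation removed, both approaches happen to give the same filtered result. In general, though, a word next to a line break would be measured with its neighbour attached when splitting on " " only.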

Hope this works for you!

Edit:

If you want to search the individual lines separately, I would suggest a solution like this:

import string


def split_by_line_and_find_words_with_length(a, b, s):
    # store result
    result = []

    # separate string lines
    lines = s.splitlines()

    for line in lines:
        # remove punctuation
        cleaned = "".join([char for char in line if char not in string.punctuation])

        # find words with length between a and b (inclusive)
        found = [token for token in cleaned.split() if a <= len(token) <= b]

        # add empty string to result if no match
        if not found:
            found.append("")

        # add any findings to result
        result += found

    return result

For your example string and preferred word lengths, this returns ['golden', '', 'little', 'little'].
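Note that the function above adds each matching word as its own list element, whereas the question asks for the matches of one line to be joined with a space (for example 'cat dog'). A minimal sketch of that variant follows; the function name `words_by_line` and the sample text are made up for illustration.

```python
import string


def words_by_line(a, b, text):
    """Return one string per line: the words of length a..b, space-joined."""
    result = []
    for line in text.splitlines():
        # drop punctuation characters from the line
        cleaned = "".join(ch for ch in line if ch not in string.punctuation)
        # keep the words whose length is within [a, b]
        found = [word for word in cleaned.split() if a <= len(word) <= b]
        # " ".join gives '' for no match and the bare word for a single match
        result.append(" ".join(found))
    return result


print(words_by_line(3, 3, 'The cat, the dog.\nA very long sentence'))
# prints ['The cat the dog', '']
```

Because `" ".join` of an empty list is the empty string, no separate branch is needed for lines without a match.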

Answer 2

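You were on the right track when you thought about a range. Here is how I would write the function.

Create a function with three parameters: start and stop for the range, and sentence for the target sentence.
Inside the function, create a list named word_list.
Iterate over each line of the sentence via .splitlines().
Filter out all the punctuation from each line you iterate over.
Then iterate over every word in the current line with a list comprehension, keeping only the words whose length is within the given range: tmp = [word for word in line.split() if start <= len(word) <= stop]. The result of the list comprehension is assigned to a variable named tmp.
If the length of tmp is greater than 1
- join the words in tmp with a space, and add the joined string to word_list.
Otherwise, if tmp is exactly one element long
- simply add that element to word_list
Otherwise, if tmp is empty
- add an empty string to word_list
Return word_list

Using the steps above, here is how I would write your function: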

# create a function with the parameters `start`, `stop` and `sentence`
# `start` and `stop` are for the range, and `sentence` is the
# target sentence to iterate over.
def group_words_by_length(start: int, stop: int, sentence: str) -> list:
    # import the string module so we can use its punctuation attribute.
    import string

    # create a list to hold words that
    # are in the given `start`-`stop` range
    word_list = []

    # iterate over each line in the sentence
    # using the string method `.splitlines()`
    # which splits the string at every new line
    for line in sentence.splitlines():

        # filter out punctuation from
        # every line.
        line = ''.join([char for char in line if char not in string.punctuation])

        # iterate over every word in each line
        # via list comprehension. Inside the list comprehension
        # we only add a word if it is in the given range.
        tmp = [word for word in line.split() if start <= len(word) <= stop]

        # if we found more than one valid word
        # in the current line...
        if len(tmp) > 1:

            # join each word in the
            # list by a space, and add
            # the joined string to the `word_list`.
            tmp = ' '.join(tmp)
            word_list.append(tmp)

        # if we found only
        # one valid word...
        elif len(tmp) == 1:

            # simply add the word
            # to the `word_list`.
            word_list.extend(tmp)

        # otherwise...
        else:
            # add an empty string to the
            # `word_list`.
            word_list.append("")

    # return the `word_list`
    return word_list

# testing of the function with
# your test string.
print(group_words_by_length(6, 7, 'All in the golden afternoon\nFull leisurely we glide;\nFor  both our oars, with little skill,\nBy little arms are plied.'))

Output:

['golden', '', 'little', 'little']

python

python-3.5