Finding the longest word in a .txt file without punctuation marks

Question

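I am working on Python file I/O exercises. Although I have made good progress on an exercise that finds the longest word in each line of a .txt file, I cannot get rid of the punctuation marks.

This is my code: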

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
for line in lines:
    if not line == "\n":
        print(max(line.split(), key=len))

This is the output I get

This is the original-3.txt file I am reading from:

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.

"Beware the Jabberwock, my son!
The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
The frumious Bandersnatch!"

He took his vorpal sword in hand:
Long time the manxome foe he sought,
So rested he by the Tumtum tree,
And stood a while in thought.

And, as in uffish thought he stood,
The Jabberwock, with eyes of flame,
Came whiffling through the tulgey wood,
And burbled as it came!

One two! One two! And through and through
The vorpal blade went snicker-snack!
He left it dead, and with its head
He went galumphing back.

"And hast thou slain the Jabberwock?
Come to my arms, my beamish boy!"
"Oh frabjous day! Callooh! Callay!"
He chortled in his joy.

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.

As you can see, I am getting punctuation marks such as [",", ";", "?", "!"] attached to the words.

How can I get only the words themselves?

Thanks

Answer 1

You have to strip those characters from the words:

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
for line in lines:
    if not line == "\n":
        # strip the listed characters from both ends of each word, then compare lengths
        print(max((word.strip(",?;!\"") for word in line.split()), key=len))
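The character list passed to strip covers only the marks named in the question, while the sample text also contains '.', ':' and an apostrophe. A minimal sketch that uses the standard library's string.punctuation instead of a hand-written list might look like the following. The helper name longest_word and the in-memory sample lines are illustrative assumptions, not part of the original answer.

```python
import string

def longest_word(line):
    """Return the longest whitespace-separated word in `line` with
    leading/trailing punctuation removed, or None for a blank line."""
    # str.strip only removes characters from the two ends of each word
    words = [w.strip(string.punctuation) for w in line.split()]
    words = [w for w in words if w]  # drop tokens that were only punctuation
    return max(words, key=len) if words else None

# Hypothetical sample standing in for the lines read from the file
sample = [
    "'Twas brillig, and the slithy toves\n",
    "\n",
    "The vorpal blade went snicker-snack!\n",
]
for line in sample:
    word = longest_word(line)
    if word is not None:
        print(word)
```

Because strip works only on the ends of a word, the inner hyphen of snicker-snack is kept.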

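Or you can use a regular expression to extract everything that looks like a word (i.e. consists of letters; strictly speaking, \w also matches digits and underscores):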

import re

# `lines` is the list read from the file above
for line in lines:
    words = re.findall(r"\w+", line)
    if words:  # blank lines yield no matches
        print(max(words, key=len))
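One thing to be aware of: \w+ stops at hyphens and apostrophes, so snicker-snack is reported as two separate words. If hyphenated words should count as one, a possible variation is sketched below. The pattern WORD_RE and the helper longest_word_re are assumptions made for illustration, and the pattern only covers ASCII letters.

```python
import re

# Runs of ASCII letters, optionally joined by single hyphens or apostrophes
WORD_RE = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*")

def longest_word_re(line):
    """Return the longest match of WORD_RE in `line`, or None if there is none."""
    words = WORD_RE.findall(line)  # non-capturing group, so full matches are returned
    return max(words, key=len) if words else None

line = "The vorpal blade went snicker-snack!"
print(max(re.findall(r"\w+", line), key=len))  # hyphenated word is split
print(longest_word_re(line))                   # hyphenated word is kept whole
```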

Answer 2

With a regular expression it is easy to get the length of the longest word:

import re

for line in lines:
    found_strings = re.findall(r'\w+', line)
    if found_strings:  # skip blank lines, where max() would raise ValueError
        print(max(len(txt) for txt in found_strings))

Answer 3

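This solution does not use regular expressions. It splits each line into words and then cleans every word so that it contains only alphabetic characters.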

with open("original-3.txt", 'r') as file1:
    lines = file1.readlines()
    for line in lines:
        if not line == "\n":
            words = line.split()
            for i, word in enumerate(words):
                words[i] = "".join([letter for letter in word if letter.isalpha()])
            print(max(words, key=len))
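The same idea can be wrapped in small functions so it can be tried without the file. This is only a sketch: the names clean and longest_clean_word are hypothetical. Note that filtering with isalpha also removes characters inside a word, so a hyphenated word is joined into one longer string.

```python
def clean(word):
    """Keep only the alphabetic characters of `word`."""
    return "".join(ch for ch in word if ch.isalpha())

def longest_clean_word(line):
    """Return the longest cleaned word in `line`, or None if nothing is left."""
    words = [clean(w) for w in line.split()]
    words = [w for w in words if w]  # discard tokens with no letters at all
    return max(words, key=len) if words else None

print(longest_clean_word('"Beware the Jabberwock, my son!'))       # Jabberwock
print(longest_clean_word("The vorpal blade went snicker-snack!"))  # snickersnack
```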
