Python remove word containing "l"

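I'm currently working on a small program.

The program is meant to take input from a file, edit it to remove any word containing the letter "l", and write the result to an output file.

My current code runs, but it doesn't remove the words that contain the letter "l"; it only removes the letter itself.

Here is my code: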

def my_main(ifile_name, ofile_name):
    ifile_name = open(ifile_name, 'r')
    ofile_name = open(ofile_name, "w+")
    delete_list = ['l']
    for line in ifile_name:
        for word in delete_list:
            line = line.replace(word, "")
        ofile_name.write(line)
    ifile_name.close()
    ofile_name.close()

Thanks

Update

This is what the input file looks like:

The first line never changes. 
The second line was a bit much longer. 
The third line was short. 
The fourth line was nearly the longer line. 
The fifth was tiny. 
The sixth line is just one line more.
The seventh line was the last line of the original file.

If the code were correct, the output file should look like this:

The first never changes. 
The second was a bit much. 
The third was short. 
The fourth was the. 
The fifth was tiny. 
The sixth is just one more.
The seventh was the of the.

Think it through: what are you looping over?

for line in ifile_name:  # line is each line of the file in turn
    for word in delete_list:  # word is each entry in delete_list (here just the letter 'l')
        line = line.replace(word, "")  # replaces every 'l' with an empty string, so only the letter goes
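
To see the effect concretely, here is a small demonstration of that loop on the first line of the sample input. It is only an illustration; the variable names mirror the question's code.

```python
line = "The first line never changes."
delete_list = ['l']

for word in delete_list:
    # str.replace removes every occurrence of the substring,
    # regardless of which word it sits in
    line = line.replace(word, "")

print(line)  # The first ine never changes.
```

The word "line" loses its "l" but stays in the text, which is exactly the behaviour described in the question.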

You probably want something more like this:

for line in ifile_name:
    for word in line.split():  # iterate through the words in the line, not delete_list
        if any(x in word for x in delete_list):  # does the word contain any letter from delete_list?
            line = line.replace(word, '')  # replace the whole word with an empty string

Note that this code leaves extra spaces behind:

this_line_is -> this__is
    ^    ^          ^^

So you could call line = line.replace(word + ' ', '') instead, but that can cause problems in cases such as 'wordwithl.', where the token is not followed by a space.

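Without seeing what your file looks like it's hard to say exactly what to use, so it would be great if you could update the question.

At the moment you are iterating over letters rather than words. Use split() to break each line into a list of words, change that list, and then join the words back together to get a string without the words containing your letter: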

words = ''
with open(ifile_name, "r") as file:
    for line in file:
        # split() with no argument also drops the trailing newline,
        # so the '\n' added below is the only one on each line
        list_of_words = line.split()
        for key, word in enumerate(list_of_words):
            if 'l' in word:
                list_of_words[key] = ''

        words += ' '.join(w for w in list_of_words if w != '')
        words += '\n'

with open(ofile_name, "w+") as file:
    file.write(words)

The advantage of this is that you have no problems with whitespace: you get a regular string with single spaces.

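Edit: as pointed out in the comments, a better approach (one that doesn't hold the whole file in memory) is to write each line out as you go: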

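with open(ifile_name, "r") as in_file, open(ofile_name, "w+") as out_file:
    for line in in_file:  # iterate over the input handle, not an undefined name
        list_of_words = line.split()  # also strips the trailing newline
        for key, word in enumerate(list_of_words):
            if 'l' in word:
                list_of_words[key] = ''

        # re-add the newline that split() removed
        out_file.write(' '.join(w for w in list_of_words if w != '') + '\n')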

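If you just want a completely new file and don't need to keep track of the removed words, here is a very simple solution that doesn't require you to store all the data in memory: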

def remove_words(in_file, to_remove, out_file):
    with open(in_file) as f, open(out_file, "w") as f2:
        f2.writelines(" ".join([word for word in line.split()
                         if not to_remove.issubset(word)]) + "\n"
                             for line in f)


remove_words("test.txt", {"l"}, "removed.txt")
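
A note on to_remove.issubset(word): the string is treated as an iterable of characters, so the test is true only when every letter in to_remove occurs in the word. With the single-letter set {"l"} that is exactly what is wanted. If the set held several letters and the goal were to drop words containing any one of them, a variant along these lines might fit better. The name remove_words_any is hypothetical and not part of the answer above.

```python
def remove_words_any(in_file, to_remove, out_file):
    """Drop every word that shares at least one letter with to_remove (a set)."""
    with open(in_file) as f, open(out_file, "w") as f2:
        f2.writelines(
            # isdisjoint is True only when the word has none of the letters
            " ".join(word for word in line.split()
                     if to_remove.isdisjoint(word)) + "\n"
            for line in f
        )
```

For example, remove_words_any("test.txt", {"l", "q"}, "removed.txt") would drop both "line" and "quick".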

Running it on the lines from your update:

In [23]: cat test.txt
The first line never changes.
The second line was a bit much longer.
The third line was short.
The fourth line was nearly the longer line.
The fifth was tiny.
The sixth line is just one line more.
The seventh line was the last line of the original file.

In [24]: remove_words("test.txt",{"l"},"removed.txt")

In [25]: cat removed.txt
The first never changes.
The second was a bit much
The third was short.
The fourth was the
The fifth was tiny.
The sixth is just one more.
The seventh was the of the
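
The run above drops the final period whenever the last word of a line contains "l" (for example "longer."), while the expected output in the question keeps it. One possible way to keep the punctuation is to remove only the run of word characters and then tidy the spacing. This is a sketch, not a definitive implementation; the helper name strip_l_words and the set of punctuation marks are assumptions.

```python
import re

# A run of word characters that contains a lowercase "l"
WORD_WITH_L = re.compile(r"\w*l\w*")

def strip_l_words(line):
    """Remove words containing "l" but keep surrounding punctuation."""
    without = WORD_WITH_L.sub("", line)
    # Collapse the runs of spaces left where words were removed
    without = re.sub(r"[ \t]+", " ", without)
    # Drop a space that now sits directly before punctuation
    without = re.sub(r" ([.,;:!?])", r"\1", without)
    return without.strip()

print(strip_l_words("The fourth line was nearly the longer line."))
# The fourth was the.
```

Because \w does not match apostrophes, a contraction such as "I'll" would be only partly removed, so this suits plain text like the sample file better than arbitrary prose.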

Another idea might be to use a regular expression, re.sub(r'\S*l\S*', r'', text); the complete program then reads:

import re

def my_main(ifile_name, ofile_name):
    # read the whole input file into one string
    with open(ifile_name, "r") as ifile:
        text = ifile.read()
    # delete every whitespace-delimited token that contains an "l"
    text2 = re.sub(r'\S*l\S*', r'', text)
    with open(ofile_name, "w+") as ofile:
        ofile.write(text2)

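One problem is that only the word itself is removed, not the spaces around it. A potential solution is to capture the whitespace after (or before) the word as well: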

re.sub(r'\S*l\S*\s*',r'',text)

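Note that \s also matches line breaks, so a removed word at the end of a line takes the newline with it. The program then reads: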

import re

def my_main(ifile_name, ofile_name):
    # read the whole input file into one string
    with open(ifile_name, "r") as ifile:
        text = ifile.read()
    # delete each token containing "l" together with the whitespace after it
    text2 = re.sub(r'\S*l\S*\s*', r'', text)
    with open(ofile_name, "w+") as ofile:
        ofile.write(text2)

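A potential drawback of this approach is that the whole file has to fit into (virtual) memory: for large files (1 GiB+), the process may slow down, or even be killed by the operating system for using too many resources.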
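
If memory is a concern, the same substitution can be applied one line at a time. The following is a minimal sketch under a few assumptions: the function name remove_l_words_streaming is hypothetical, and the pattern uses [ \t]* instead of \s* so that a match never consumes the newline and line boundaries are preserved.

```python
import re

# A whitespace-delimited token containing "l", plus any spaces or tabs
# that follow it (but not the newline).
L_TOKEN = re.compile(r"\S*l\S*[ \t]*")

def remove_l_words_streaming(in_path, out_path):
    """Copy in_path to out_path, dropping tokens that contain "l"."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Only one line is held in memory at any time
            dst.write(L_TOKEN.sub("", line))
```

When the removed word is the last one on a line, the space in front of it is left behind, so the output may carry trailing spaces.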