String deleting

Question

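I hope someone can help me. I am new to Python and still learning. I would like to know how to remove unwanted characters from a string.

For example,

I have some strings in a text file, such as 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

I don't need anything from op onwards. That is, all I want to get is 'dogs, cats, pig'

I noticed that 'op' is a pattern common to all of these strings, so I tried the following code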

import re
f = open('animalsop.txt','r')
s = f.read()
p = re.compile('op')
match = p.search(s)
print (s[:match.start()])

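The output I get is just 'dogs'

Why don't I get cats and pig as well, since they also contain 'op'?

Any help would be appreciated, as I need the code to analyse a large amount of similar data.

The code above is adapted from "String splitting in Python using regex"

Credit to Varuna and kragniz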

Answer 1

Based on the example you provided, I would suggest using the simple .split() string method and selecting the first part, i.e. the part before " op".

partOfYourInterest = "dogs op care 6A domain".split(" op")[0]

You can process several strings the same way, for example with a loop

text = ["dogs op care 6A domain","cats op pv=2 domain 3", "pig op care2 domain 3"]

for part in text:
    animal = part.split(" op")[0]
    print(animal)

For your txt file, it could look like this

with open('animalsop.txt', 'r') as f:
    for line in f:
        # keep only the text before " op" on each line
        animal = line.split(" op")[0]
        print(animal)
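
One thing to watch with split(" op")[0]: a line that contains no " op" comes back whole, trailing newline included. Below is a minimal sketch, assuming you would rather skip such lines. The helper name text_before_op and the sample lines are made up for illustration, and str.partition is swapped in for split so the "marker not found" case is easy to detect.

```python
def text_before_op(line, marker=" op"):
    """Return the text before the first marker, or None if the marker is absent."""
    head, sep, _tail = line.partition(marker)
    # partition returns an empty separator when the marker was not found
    return head.strip() if sep else None


# Hypothetical sample lines standing in for the contents of the file
sample_lines = ["dogs op care 6A domain\n", "no marker here\n", "pig op care2 domain 3\n"]
animals = [a for a in map(text_before_op, sample_lines) if a is not None]
print(animals)  # ['dogs', 'pig']
```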

Answer 2

It is probably simplest to solve this without regular expressions.

Suppose a file named animalsop.txt looks like this:

dogs op care 6A domain
cats op pv=2 domain 3
pig op care2 domain 3

A Pythonic solution to your problem would be something like:

with open('animalsop.txt', 'r') as f:
    for line in f:
        before_op = line.split(' op ')[0]
        print(before_op)

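The nice thing about the with construct for opening files in Python is that it ensures the file is closed when you are done with it.

If instead your animalsop.txt file is just one long line containing the various clauses separated by commas, such as: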

dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3

Then you could do this:

with open('animalsop.txt', 'r') as f:
    for line in f:
        for clause in line.split(','):
            before_op = clause.strip().split(' op')[0]
            print(before_op)

(clause.strip() removes the whitespace after the comma.)
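
If you want the single string 'dogs, cats, pig' from the question rather than one printed line per animal, the pieces can be collected and joined. This is only a sketch: io.StringIO stands in for the real animalsop.txt so the example runs on its own, and the names fake_file, animals and result are assumptions for illustration.

```python
import io

# Stand-in for open('animalsop.txt', 'r'); the contents are the example from the question
fake_file = io.StringIO(
    "dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3\n"
)

animals = []
for line in fake_file:
    for clause in line.split(','):
        # strip() drops the space after each comma and the trailing newline
        animals.append(clause.strip().split(' op')[0])

result = ', '.join(animals)
print(result)  # dogs, cats, pig
```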

Answer 3

If you want to use a regular expression, you can use:

re.findall(r'\w+?(?= op)', s)

['dogs', 'cats', 'pig']
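
This also explains the behaviour in the question: search() stops at the first match, so slicing up to match.start() can only ever give the text before the first 'op'. A short sketch comparing the two calls on the string from the question (the variable names are chosen for illustration):

```python
import re

s = 'dogs op care 6A domain, cats op pv=2 domain 3, pig op care2 domain 3'

# A word followed by " op"; the lookahead keeps " op" out of the match itself
pattern = re.compile(r'\w+(?= op)')

first = pattern.search(s)       # a single match object for the first hit only
print(first.group())            # dogs

all_words = pattern.findall(s)  # every non-overlapping match, as a list of strings
print(all_words)                # ['dogs', 'cats', 'pig']
```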

Answer 4

If you only want the first word of each clause, you can use the following, where string is your string:

string='dog fgfdggf fgs, cat afgfg, pig fggag'
strings=string.split(', ')
newstring=strings[0].split(' ', 1)[0]
for stri in strings[1:]:
    newstring=newstring+', '+stri.split(' ', 1)[0]
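
The same idea can be written more compactly with a list comprehension and str.join. This is an equivalent sketch and not part of the original answer; first_words is a name introduced here for illustration.

```python
string = 'dog fgfdggf fgs, cat afgfg, pig fggag'

# Take the first space-separated word of each comma-separated clause
first_words = [clause.split(' ', 1)[0] for clause in string.split(', ')]

newstring = ', '.join(first_words)
print(newstring)  # dog, cat, pig
```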
