How can I delete the content inside quotation marks in my string?

Question

I have this:

<a href="http://helloword.com"><img src="hola.png" alt="hola"></a>

I need:

 <a href=""><img src="" alt= ""></a>

Answer 1

I tried the program below, and it works for your input. Take a look.

 import re
 s = '<a href="http://helloword.com"><img src="hola.png" alt="hola"></a>'
 # Non-greedy match: replace each double-quoted run with an empty pair of quotes.
 r = re.sub(r'".*?"', '""', s)
 print(r)

It prints:

<a href=""><img src="" alt=""></a>
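Note that the pattern `".*?"` empties every double-quoted run, including quoted text that is not an attribute value. The following is a minimal sketch of a narrower variant, assuming only the `href`, `src` and `alt` values should be emptied and that values may be wrapped in either quote style. The names `ATTR_RE` and `blank_attributes` are illustrative and not part of the original answer.

```python
import re

# Match href/src/alt followed by a quoted value, in single or double quotes.
# The lookbehind keeps names such as data-href from matching.
ATTR_RE = re.compile(
    r'''(?<![\w-])((?:href|src|alt)\s*=\s*)(["']).*?\2''',
    re.IGNORECASE | re.DOTALL,
)


def blank_attributes(markup):
    """Return markup with the values of href, src and alt emptied."""
    # \1 is the attribute name and '=', \2 is the quote character used.
    return ATTR_RE.sub(r'\1\2\2', markup)


s = '<a href="http://helloword.com"><img src="hola.png" alt="hola"></a>'
print(blank_attributes(s))
```

A regular expression still does not understand HTML structure, so this is only suitable for simple, well-formed input like the example.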

Answer 2

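I tried a regular expression and did not get the expected result. What finally solved my problem was the code below. It is more flexible and more dynamic, and it also lets you save the result to a new HTML file.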

import glob
import os

from lxml import html
from lxml.html import tostring

for itemfile in glob.glob("*.html"):
    if os.path.isfile(itemfile):
        # Read and parse the HTML file.
        with open(itemfile, 'rb') as f:
            data = f.read()
        dochtml = html.fromstring(data)
        # iterlinks() yields one tuple per link-bearing attribute (href, src, ...).
        # list() takes a snapshot so the tree can be modified safely in the loop.
        for element, attribute, link, pos in list(dochtml.iterlinks()):
            if element.tag in ("img", "a"):
                if attribute == "src":
                    element.set('src', "")
                    element.set('alt', "")
                if attribute == "href":
                    element.set('href', "")
        # Serialize the modified tree and save it next to the original file.
        parser = tostring(dochtml, method='html')
        with open(os.path.splitext(itemfile)[0] + "_parser.html", 'wb') as f:
            f.write(parser)
    else:
        print('not file.')

Answer 3

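This is very simple with BeautifulSoup... I don't know why you are using so much code. This replaces the contents of the href, alt and src attributes with empty strings.

I would use this instead of lxml ...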

from bs4 import BeautifulSoup

soup = BeautifulSoup('<a href="http://helloword.com"><img src="hola.png" alt="hola"></a>', 'html.parser')
# Read the current attribute values.
href = soup.find('a').attrs.get('href')
alt = soup.find('img').attrs.get('alt')
src = soup.find('img').attrs.get('src')

# Note: str.replace removes every occurrence of these strings, not only the attribute values.
text = str(soup).replace(href, '').replace(src, '').replace(alt, '')
print(text)
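Both parser-based answers depend on third-party packages. If that is not an option, the same idea can be sketched with the standard library's `html.parser`, under some assumptions: the class `AttributeBlanker` and the function `blank_with_parser` are hypothetical names introduced here, comments and doctype declarations are dropped, and attribute values are always re-emitted in double quotes.

```python
from html import escape
from html.parser import HTMLParser


class AttributeBlanker(HTMLParser):
    """Rebuild the markup, emptying the chosen attributes along the way."""

    def __init__(self, names=("href", "src", "alt")):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.names = set(names)
        self.parts = []

    def _render(self, tag, attrs, closing):
        rendered = []
        for name, value in attrs:
            if name in self.names:
                value = ""
            if value is None:  # bare attribute such as "disabled"
                rendered.append(" " + name)
            else:
                # The parser unescapes attribute values, so escape them again.
                rendered.append(' %s="%s"' % (name, escape(value)))
        return "<%s%s%s" % (tag, "".join(rendered), closing)

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def blank_with_parser(markup):
    """Return markup with href, src and alt emptied, using html.parser."""
    parser = AttributeBlanker()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(blank_with_parser('<a href="http://helloword.com"><img src="hola.png" alt="hola"></a>'))
```

Because only attribute values are touched, text that happens to equal an attribute value (for example a link whose text is also "hola") is left alone.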

python

html-parsing