How to find & copy everything between two tags (including tags)

Question

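I'm looking for a faster way to find and copy everything between two tags (including the tags themselves) across the many HTML files I'm working on. Right now I'm copying by hand in Sublime, one file at a time. The HTML tags are constant (<center> </center>). I tried to do this with a regular expression, but without success: "<center>(.*)</center>"... What would I type into Sublime to accomplish this? Or, if there's a better approach that a beginner can pick up easily, I'm open to suggestions!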

</head>

<body style="background-color:#9b9b9b;">
**<center>
<table width="580" border="0" cellspacing="0" cellpadding="0" align="center"  class ="responsive-table" style="background-color:#3e5b3e;border:solid thin #3e5b3e;" >
  <tbody>
    <tr>
      <td background="http://app.randomsite.com/js/ckfinder/userfiles//images/banner.jpg" style="padding-top:20px;padding-right:20px;padding-left:20px;" class="hideForMobile"><h1 style="font-family:Arial, Helvetica, sans-serif;font-size:20px;font-weight:bold;text-align:right;color:#eee;vertical-align:bottom;text-decoration:none;margin-top:0;margin-bottom:0;margin-right:0;margin-left:0;" >some message</h1></td>
    </tr>
    <tr>
</center>**
    <!---Start of Banner Image--->
      <td><a href="{{Custom1}}" style="color:inherit;text-decoration:none;" ><img src="http://app.clientcommand.com/js/ckfinder/userfiles//images/top-dollar-ford-banner.jpg" alt="" class="table.responsiveImage" style="display:block;width:100%;border-style:none;" /></a></td>
    <!---End of Banner Image--->
    </tr>
    <tr>

Please be gentle, I'm new to coding.

Answer 1

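I think your regex is missing something. With .* you match every character except newlines (line feeds), so the match cannot span multiple lines. Try something like this: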

<center>(.|\n)*<\/center>

Breakdown of the changed part:
. = any character except a newline
| = or
\n = line feed (newline)
(.|\n)* = zero or more repetitions of the group above (greedy, so it matches as many times as possible)
See demo.

If there is more than one such section in a file, you can use <center>(.|\n)*?<\/center>

Breakdown of the changed part:
The ? makes the quantifier non-greedy, so the match stops at the first occurrence of </center>.
See demo.
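
For anyone who would rather script this than type the pattern into an editor, here is a minimal Python sketch of the greedy vs. non-greedy difference described above. It is an illustration, not part of the original answer: the `sample` string and the name `find_center_blocks` are made up for the example, and it uses `re.DOTALL` (which lets `.` match newlines) in place of the `(.|\n)` group.

```python
import re

# re.DOTALL makes "." match newlines as well, so the (.|\n) group is not needed.
# The "?" after "*" makes the match non-greedy (stop at the first </center>).
CENTER_BLOCK = re.compile(r"<center>.*?</center>", re.DOTALL)


def find_center_blocks(html_text):
    """Return every <center>...</center> block, tags included (hypothetical helper)."""
    return CENTER_BLOCK.findall(html_text)


# Made-up sample with two separate <center> sections.
sample = (
    "<body>\n"
    "<center>\n<p>first</p>\n</center>\n"
    "<p>skip</p>\n"
    "<center>second</center>\n"
    "</body>"
)

# Greedy version of the pattern from the answer: one match that runs from the
# first <center> to the last </center>. (?:...) is a non-capturing group so
# that findall returns the whole match instead of the group's last character.
greedy = re.findall(r"<center>(?:.|\n)*</center>", sample)

# Non-greedy version: one match per section.
lazy = find_center_blocks(sample)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

The same non-greedy pattern should carry over to an editor's regex search, provided the editor has a way to let `.` match newlines or you keep the `(.|\n)` form.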

Answer 2

Avoid using regular expressions to parse markup files.
Consider using BeautifulSoup to parse the HTML files and extract the contents of the tags.

In your case it would look something like this:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_doc, 'html.parser')
    for centered_content in soup.find_all('center'):
        ...  # do what you want
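
Since the question is about many files, the idea above could be extended roughly as follows. This is only a sketch under assumptions: the helper names `extract_center_blocks` and `collect_from_folder`, the `*.html` pattern, and the UTF-8 encoding are all hypothetical choices, and it needs the third-party `beautifulsoup4` package installed.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_center_blocks(html_doc):
    """Return each <center> element as a string, tags included."""
    soup = BeautifulSoup(html_doc, "html.parser")
    # str(tag) serializes the element together with its opening and closing tags.
    return [str(tag) for tag in soup.find_all("center")]


def collect_from_folder(folder, pattern="*.html"):
    """Map each matching file name in `folder` to its list of <center> blocks."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: files are UTF-8; undecodable bytes are replaced, not fatal.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = extract_center_blocks(text)
    return results
```

One caveat: a parser repairs markup as it reads it, so when tags are opened inside `<center>` and closed outside it (as in the snippet in the question), the extracted text may not be character-for-character identical to the source. If an exact copy matters, a plain text search may be the better fit.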


html

regex

replace

batch-processing