Regex to match until specific string or end of string met

Question

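I am trying to create the right regex to use in Python for a multiline match in the scenarios below. After matching the string Description\s, I need to skip one line and then get all the text up to the first occurrence of \s\.\n, the string Homepage:, or the end of the string.

I am trying the following regex, but something is missing and it does not cover all the scenarios:

Description\s*:\s*.*\n(?P<description>[\w\s$\&\+\,\:\;\=\?\@\#\|\'\<\>\.\^\*\%\!\-]*\n\s*)\s\.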

Scenario 1: Expected result: "libX11-xcb provides functions needed by clients which take advantage of Xlib/XCB to mix calls to both Xlib and XCB over the same X connection."

Pre-Depends: multiarch-support
Description: Xlib/XCB interface library
 libX11-xcb provides functions needed by clients which take advantage of
 Xlib/XCB to mix calls to both Xlib and XCB over the same X connection.
 .
 More information about X.Org can be found at:
 <URL:http://www.X.org>
 .
 More information about XCB can be found at:
 <URL:http://xcb.freedesktop.org>
 .
 This module can be found at
 git://anongit.freedesktop.org/git/xorg/lib/libX11

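Scenario 2: Expected result: "This package contains a number of important utilities, most of which are oriented towards maintenance of your system. Some of the more important utilities included in this package allow you to partition your hard disk, view kernel messages, and create new filesystems."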

Essential: yes
Installed-Size: 2999
Replaces: bash-completion (<< 1:2.1-4.1~), initscripts (<< 2.88dsf-59.2~), mount (= 2.26.2-3), mount (= 2.26.2-3ubuntu1), sysvinit-utils (<< 2.88dsf-59.1~)
Pre-Depends: libblkid1 (>= 2.25), libc6 (>= 2.15), libfdisk1 (>= 2.29~rc2), libmount1 (>= 2.25), libncursesw5 (>= 6), libpam0g (>= 0.99.7.1), libselinux1 (>= 2.6-3~), libsmartcols1 (>= 2.28~rc1), libsystemd0, libtinfo5 (>= 6), libudev1 (>= 183), libuuid1 (>= 2.16), zlib1g (>= 1:1.1.4)
Conffiles:
 /etc/default/hwclock 3916544450533eca69131f894db0ca12
Description: miscellaneous system utilities
 This package contains a number of important utilities, most of which
 are oriented towards maintenance of your system. Some of the more
 important utilities included in this package allow you to partition
 your hard disk, view kernel messages, and create new filesystems.

Scenario 3: Expected result: "libcurl is an easy-to-use client-side URL transfer library, supporting DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP."

Architecture: blob
Multi-Arch: same
Recommends: ca-certificates
Description: easy-to-use client-side URL transfer library (OpenSSL flavour)
 libcurl is an easy-to-use client-side URL transfer library, supporting DICT,
 FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S,
 RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP.
 .
 libcurl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP
 form based upload, proxies, cookies, user+password authentication (Basic,
 Digest, NTLM, Negotiate, Kerberos), file transfer resume, http proxy tunneling
 and more!
 .
 libcurl is free, thread-safe, IPv6 compatible, feature rich, well supported,
 fast, thoroughly documented and is already used by many known, big and
 successful companies and numerous applications.
 .
 SSL support is provided by OpenSSL.
Homepage: http://curl.haxx.se

Any help in providing the right expression would be highly appreciated.

Answer 1

This should work.

import re
# str1 holds the package text; the dot is escaped so it matches a literal "."
match = re.search(r'Description:.*?\n(.*?)(\s\.\n|$)', str1, re.DOTALL)
print(match.group(1))
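
For illustration, here is a minimal sketch of the lazy-match approach above. Two things in it are my assumptions rather than part of the answer as posted: the dot is escaped so it only matches a literal ".", and Homepage: is added as a further stop marker next to the end of the string. The helper name first_paragraph and the shortened sample strings are hypothetical.

```python
import re

# Lazy match in DOTALL mode. Assumptions added for this sketch: the dot is
# escaped, and "Homepage:" is an extra stop marker next to end of string (\Z).
PATTERN = re.compile(
    r"Description:.*?\n(.*?)(?:\s\.\n|Homepage:|\Z)",
    re.DOTALL,
)


def first_paragraph(text):
    """Hypothetical helper: first description paragraph on one line, or None."""
    match = PATTERN.search(text)
    if match is None:
        return None
    # The captured text still has leading spaces and newlines; collapse them.
    return " ".join(match.group(1).split())


stops_at_dot = (
    "Description: easy-to-use client-side URL transfer library (OpenSSL flavour)\n"
    " libcurl is an easy-to-use client-side URL transfer library, supporting DICT,\n"
    " FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S,\n"
    " RTMP, RTSP, SCP, SFTP, SMTP, SMTPS, TELNET and TFTP.\n"
    " .\n"
    " SSL support is provided by OpenSSL.\n"
    "Homepage: http://curl.haxx.se\n"
)
stops_at_homepage = (
    "Description: short summary\n"
    " Only one paragraph here.\n"
    "Homepage: http://curl.haxx.se\n"
)
stops_at_end = (
    "Description: miscellaneous system utilities\n"
    " your hard disk, view kernel messages, and create new filesystems.\n"
)

for sample in (stops_at_dot, stops_at_homepage, stops_at_end):
    print(first_paragraph(sample))
```

Collapsing whitespace with split and join is a design choice of this sketch; the raw group keeps the leading space of each continuation line and the line breaks.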

Answer 2

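As an alternative, you can get the match without re.DOTALL by matching every line that does not start with a space and a dot, does not start with Homepage, and is not the end of the string (with no flags set, $ only matches there). A negative lookahead checks this at the start of each line, which avoids the unnecessary backtracking that .*? causes.

Note that the dot is escaped as \. so that it is matched literally.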

\bDescription:.*\r?\n(?P<description>(?:(?! \.|$|Homepage).*(?:\r?\n)?)*)

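Parts:

\bDescription:.*\r?\n Match Description:, the rest of the line, and a newline
(?P<description> Named group description
- (?: Non-capturing group
  - (?! \.|$|Homepage) Negative lookahead: assert that what is directly to the right is not one of the alternatives
  - .*(?:\r?\n)? Match any character except a newline 0+ times, then an optional newline
- )* Close the non-capturing group and repeat it 0+ times
) Close the named group description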
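
The breakdown above can be tried out with a short script. This is only a sketch: the pattern is the one from this answer, while the helper name extract_description, the whitespace normalization, and the scenario_homepage sample (a shortened variant in which Homepage: directly follows the text) are assumptions made for illustration.

```python
import re

# The pattern from the answer, unchanged; no flags are needed.
DESCRIPTION_RE = re.compile(
    r"\bDescription:.*\r?\n"
    r"(?P<description>(?:(?! \.|$|Homepage).*(?:\r?\n)?)*)"
)


def extract_description(control_text):
    """Hypothetical helper: first description paragraph on one line, or None."""
    match = DESCRIPTION_RE.search(control_text)
    if match is None:
        return None
    # Collapse the leading spaces and line breaks of the continuation lines.
    return " ".join(match.group("description").split())


# Scenario 1 from the question (truncated after the first separator line).
scenario_1 = (
    "Pre-Depends: multiarch-support\n"
    "Description: Xlib/XCB interface library\n"
    " libX11-xcb provides functions needed by clients which take advantage of\n"
    " Xlib/XCB to mix calls to both Xlib and XCB over the same X connection.\n"
    " .\n"
    " More information about X.Org can be found at:\n"
    " <URL:http://www.X.org>\n"
)

# Scenario 2 from the question: the description runs to the end of the string.
scenario_2 = (
    "Conffiles:\n"
    " /etc/default/hwclock 3916544450533eca69131f894db0ca12\n"
    "Description: miscellaneous system utilities\n"
    " This package contains a number of important utilities, most of which\n"
    " are oriented towards maintenance of your system. Some of the more\n"
    " important utilities included in this package allow you to partition\n"
    " your hard disk, view kernel messages, and create new filesystems.\n"
)

# Assumed variant: no " ." separator, so Homepage ends the match.
scenario_homepage = (
    "Description: easy-to-use client-side URL transfer library (OpenSSL flavour)\n"
    " SSL support is provided by OpenSSL.\n"
    "Homepage: http://curl.haxx.se\n"
)

for sample in (scenario_1, scenario_2, scenario_homepage):
    print(extract_description(sample))
```

The raw named group keeps the leading space of every line and the trailing newline, so some cleanup step like the join above is likely needed to get the expected results exactly as quoted in the question.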

python

regex

regex-lookarounds