Regex capture all words in a line before the last 2

Question

I have a bunch of lines of data that I need to capture from, like this:

Level production data TD Index
Total Agriculture\Production data TS Index

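I need to capture everything before the last two words. In this case, for example, my regex output should be Level production data for the first match. How can I do this, given that the number of words before TD Index varies? Thanks!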

Answer 1

You can try this:

import re
# Raw strings keep the backslashes in the data and in the pattern literal.
s = ["Level production data TD Index", r"Total Agriculture\Production data TS Index"]
new_s = [re.findall(r'[\w\s\W]{1,}(?=\s\w+\s\w+$)', i)[0] for i in s]

Output:

['Level production data', 'Total Agriculture\\Production data']
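
One thing to watch with this pattern: the lookahead uses \w+, so it only matches when each of the last two words consists purely of word characters, and indexing findall(...)[0] raises IndexError when nothing matches. A minimal sketch of a more defensive variant follows. The helper name before_last_two and the second input line are made up for illustration.

```python
import re

# The pattern from this answer, compiled once as a raw string.
pattern = re.compile(r'[\w\s\W]{1,}(?=\s\w+\s\w+$)')

def before_last_two(line):
    """Return the text before the last two words, or None if the pattern does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None

print(before_last_two("Level production data TD Index"))
# Hypothetical input: "T-D" contains a hyphen, which \w+ cannot match.
print(before_last_two("Level data T-D Index"))
```

If the trailing words can contain punctuation, swapping \w+ for \S+ in the lookahead is one way to loosen the pattern.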

Answer 2

Try this regex:

^.*(?=(?:\s+\S+){2}$)

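Explanation:

^ - asserts the start of the string
.* - matches 0 or more occurrences of any character except a newline
(?=(?:\s+\S+){2}$) - positive lookahead that verifies the current position is followed by 2 words (1+ whitespace characters followed by 1+ non-whitespace characters, repeated twice) right before the end of the string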
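
For reference, here is a small Python sketch applying this regex to the sample lines from the question. The helper name strip_last_two_words is only illustrative, and returning None for lines with fewer than three words is a choice made for this example.

```python
import re

# Pattern from this answer: greedy prefix, then a lookahead for two trailing words.
PATTERN = re.compile(r"^.*(?=(?:\s+\S+){2}$)")

def strip_last_two_words(line):
    """Return everything before the last two words, or None if there is no match."""
    m = PATTERN.match(line)
    return m.group(0) if m else None

lines = [
    "Level production data TD Index",
    r"Total Agriculture\Production data TS Index",  # raw string keeps the backslash
]
results = [strip_last_two_words(s) for s in lines]
print(results)
```

Because .* is greedy, the engine backtracks from the end of the line and stops at the first position where the lookahead succeeds, which is just before the last two words.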

Answer 3

Code

.*(?= \S+ \S+)

Alternatively: .*(?= [\w\/]+ [\w\/]+), replacing \S with whatever character set you define as valid.

If more than one space may occur between words, you can also add + after each space: .*(?= +\S+ +\S+)
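
A caveat with the multi-space variant: because .* is greedy, it also swallows all but one of the spaces in front of the second-to-last word, so the match can end in trailing spaces. The sketch below shows this on a made-up input line and trims the result with rstrip, which is an addition for illustration and not part of the regex itself.

```python
import re

pattern = re.compile(r".*(?= +\S+ +\S+)")

# Made-up input with repeated spaces before the last two words.
line = "Level production data  TD   Index"

m = pattern.match(line)
raw = m.group(0)           # greedy .* keeps one of the two spaces after "data"
cleaned = raw.rstrip(" ")  # drop the leftover trailing space(s)

print(repr(raw))
print(repr(cleaned))
```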

Usage

import re

r = r".*(?= \S+ \S+)"

l = [
    "Level production data TD Index",
    r"Total Agriculture\Production data TS Index"  # raw string: keep the backslash literal
]

for s in l:
    m = re.match(r, s)
    if m:
        print(m.group(0))

Explanation

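.* matches any character any number of times
(?= \S+ \S+) positive lookahead ensuring that what follows matches:
- a literal space
- \S+ matches any non-whitespace character one or more times
- a literal space
- \S+ matches any non-whitespace character one or more times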

python

regex

regex-negation

python-2.7