Regex: how to capture all content before next matching pattern (or end of file)

I'm trying to process the icons.yml (from the FontAwesome project) with regular expressions. (The language is "Dyalog APL", which uses the PCRE library. I'm setting the flags for "case insensitive" and "dot matches line breaks".) So, with the following input:

  - name:       Glass
    id:         glass
    unicode:    f000
    created:    1.0
    categories:
      - Web Application Icons
      - Test1
      - Test2

  - name:       Music
    id:         music
    unicode:    f001
    created:    1.0
    categories:
      - Web Application Icons

  - name:       Search
    id:         search
    unicode:    f002
    created:    1.0
    categories:
      - Web Application Icons

I'm looking for a regex that gives me the contents of "name", "id", "unicode", "created" and finally "categories" (for which I need everything up to the start of the next "- name" or EOF).

The combined pattern returns the first four groups correctly, but adding "categories" fails. Somehow this "EOF or not '- name'" makes my brain overflow ;-)

.*-\sname:\s*([a-z\-]*)\s*id:\s*([a-z\-]*)\s*unicode:\s*([0-9a-f]{4})\s*created:\s*([0-9\.]*)\s*categories:\s*((?!-\sname:))
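To see what this pattern actually captures, here is a minimal sketch that runs it in Perl, assuming /i and /s behave like the "case insensitive" and "dot matches line breaks" flags (the sample is trimmed to two records). Two things go wrong: the greedy leading ".*" backtracks only as far as the last "- name:", so the match lands on the last record; and "((?!-\sname:))" wraps a zero-width negative lookahead, which consumes no characters, so that group can only ever capture an empty string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two records shaped like the input above (indentation trimmed).
my $str = <<'YAML';
- name: Glass
id: glass
unicode: f000
created: 1.0
categories:
  - Web Application Icons

- name: Music
id: music
unicode: f001
created: 1.0
categories:
  - Web Application Icons
YAML

# The pattern from the question; /i and /s stand in for the
# "case insensitive" and "dot matches line breaks" flags.
my $attempt = qr/.*-\sname:\s*([a-z\-]*)\s*id:\s*([a-z\-]*)\s*unicode:\s*([0-9a-f]{4})\s*created:\s*([0-9\.]*)\s*categories:\s*((?!-\sname:))/is;

# In list context a successful match returns the five captured groups.
my @groups = $str =~ $attempt;

# The greedy leading .* backtracks only as far as the LAST "- name:",
# so group 1 is "Music", not "Glass".
print "name:       $groups[0]\n";
# (?!-\sname:) consumes no characters, so group 5 is always empty.
print "categories: '$groups[4]'\n";
```

This prints "Music" for the name and '' for the categories.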

You can try this:

name:(.*?)id:(.*?)unicode:(.*?)created:(.*?)categories:(.*?)(?=- name|$)

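Explanation: each "(.*?)" is lazy, so it captures as little as possible and stops at the first following label ("id:", "unicode:" and so on). With "dot matches line breaks" switched on, the last group can run across the category lines until the lookahead "(?=- name|$)" succeeds, either just before the next "- name" or at the end of the input ("$" also matches just before a final newline). A lookahead consumes nothing, so the next match can start at that "- name".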

Perl example:

#!/usr/bin/perl
use strict;
use warnings;

my $str = '- name:      Glass
id:         glass
unicode:    f000
created:    1.0
categories:
  - Web Application Icons
  - Test1
  - Test2

- name:       Music
id:         music
unicode:    f001
created:    1.0
categories:
  - Web Application Icons

- name:       Search
id:         search
unicode:    f002
created:    1.0
categories:
  - Web Application Icons1
';
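# /s lets "." match newlines; /p enables ${^MATCH} (a no-op since Perl 5.20)
my $regex = qr/name:(.*?)id:(.*?)unicode:(.*?)created:(.*?)categories:(.*?)(?=- name|$)/sp;

# /g yields one match per record
while ( $str =~ /$regex/g ) {
  print "Whole match is ${^MATCH}\n";
}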

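The loop above prints each whole match, but the question is after the individual fields. A minimal sketch of reading them, assuming the indented layout from the question: inside the loop the five groups are available as $1 to $5. The whitespace trimming and the pattern that splits the category bullets into a list are illustrative additions, not part of the regex above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The first two records from the question, with their original indentation.
my $str = <<'YAML';
  - name:       Glass
    id:         glass
    unicode:    f000
    created:    1.0
    categories:
      - Web Application Icons
      - Test1
      - Test2

  - name:       Music
    id:         music
    unicode:    f001
    created:    1.0
    categories:
      - Web Application Icons
YAML

# The pattern from the answer; /s lets . run across line breaks.
my $regex = qr/name:(.*?)id:(.*?)unicode:(.*?)created:(.*?)categories:(.*?)(?=- name|$)/s;

my @icons;
while ($str =~ /$regex/g) {
    # Copy the captures first: any later successful match overwrites $1..$5.
    my ($name, $id, $unicode, $created, $cats) = ($1, $2, $3, $4, $5);

    # Strip the leading/trailing whitespace (including newlines) each lazy group picked up.
    s/^\s+|\s+$//g for $name, $id, $unicode, $created;

    # Illustrative addition: take the text after each "- " bullet, one per line (/m).
    my @categories = $cats =~ /^\s*-\s*(.+?)\s*$/mg;

    push @icons, {
        name => $name, id => $id, unicode => $unicode,
        created => $created, categories => \@categories,
    };
}

for my $icon (@icons) {
    print "$icon->{name} ($icon->{id}, $icon->{unicode}, $icon->{created}): ",
          join(', ', @{ $icon->{categories} }), "\n";
}
```

For this sample it prints one line per icon, for example "Glass (glass, f000, 1.0): Web Application Icons, Test1, Test2".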