Get every item from markdown list under headings

Question

Here is an example markdown file:

# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar

So far I have this code, followed by its output:

const markdown = await fs.promises.readFile(path.join(__dirname, 'test.md'), 'utf8');
const regexp = /^#{1,6} (.*)[.\n]*[*\-+] (.*)/gm;
const result = markdown.matchAll(regexp);
console.log([...result].map(m => m.slice(1)));

[
  [ 'First List', 'Hello World' ],
  [ 'Second List', 'item' ],
  [ 'Third List', 'item 1' ],
  [ 'Inside Nested', 'foo' ]
]

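The first problem is that it only grabs the first item of each list. The second is that if an item spans multiple lines, it only grabs the first line. Finally, it does not include Another List, because there is text between the heading and the list.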
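One likely cause of the third problem is the `[.\n]*` part of the pattern: inside a character class, `.` is a literal dot, so this part can only skip dots and newlines, not arbitrary text. The sketch below compares the original pattern with a variant that uses `[\s\S]*?` instead. The `sample` string and the `lazy` variant are illustrative assumptions, and the variant still captures only the first item and can skip across other headings, so it is a demonstration rather than a fix.

```javascript
// A heading separated from its list by a paragraph of text.
const sample = '## Another List\n\nbla bla bla\n\n* ITEM\n';

// Original pattern: [.\n]* only skips literal dots and newlines,
// so the "bla bla bla" line blocks the match.
const original = /^#{1,6} (.*)[.\n]*[*\-+] (.*)/gm;
const originalMatches = [...sample.matchAll(original)].map(m => m.slice(1));

// Illustrative variant: [\s\S]*? lazily skips any characters,
// and ^ anchors the list marker to the start of a line.
const lazy = /^#{1,6} (.*)[\s\S]*?^[*\-+] (.*)/gm;
const lazyMatches = [...sample.matchAll(lazy)].map(m => m.slice(1));

console.log(originalMatches); // []
console.log(lazyMatches);     // [ [ 'Another List', 'ITEM' ] ]
```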

I'm new to regular expressions and not sure whether my current regex is safe to use.

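So basically I want to find every list in the markdown file, put its items into an array, then check whether there is a heading above it (and not another list of some kind), and put that heading at the start of the array. That said, it doesn't have to be in that format. It could also be an object; I just thought an array would be simpler.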

Desired result:

[
  ['First List', 'Hello World', 'Lorem', 'foo'],
  ['Second List', 'item'],
  ['Third List', 'item 1\npart of item 1', 'item 2'],
  ['Another List', 'ITEM'],
  ['Inside Nested', 'foo', 'bar']
]

Answer 1

You can try this regex:

/(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g

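Basically, it matches every zero-width position and tests whether that position is followed by a list item (e.g. * item) and preceded by a heading (e.g. # Title), capturing the heading and the item in separate groups. Anything between the heading and the item doesn't matter, unless it is another heading.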
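To make the zero-width behaviour concrete, here is a small sketch that runs the same regex on a short input and prints where each match sits. The `text` string and the `found` variable are made up for illustration. Note that lookbehind assertions were added to JavaScript in ES2018, so this assumes a reasonably recent runtime.

```javascript
const regexp = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;

// One heading, a paragraph, then a two-item list.
const text = '## Fruits\n\nsome text\n\n* apple\n* pear\n';

const found = [...text.matchAll(regexp)].map(m => ({
    index: m.index,   // position just before the list marker
    matched: m[0],    // always the empty string: nothing is consumed
    heading: m[1],    // captured inside the lookbehind
    item: m[2],       // captured inside the lookahead
}));

console.log(found);
// [
//   { index: 22, matched: '', heading: 'Fruits', item: 'apple' },
//   { index: 30, matched: '', heading: 'Fruits', item: 'pear' }
// ]
```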

You can see the test cases here.

The matchAll result will be:

[
    ["", "First List", "Hello World"],
    ["", "First List", "Lorem"],
    ["", "First List", "foo"],
    ["", "Second List", "item"],
    ["", "Third List", "item 1\npart of item 1"],
    ["", "Third List", "item 2"],
    ["", "Another List", "ITEM"],
    ["", "Inside Nested", "foo"],
    ["", "Inside Nested", "bar"]
]

Since you can't write a regex with a dynamic number of capture groups, you need to group the matches together manually.
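As a rough sketch of that grouping step, the snippet below collects rows shaped like the matchAll result above into a `Map` keyed by heading, then converts it to either the array format or an object, since the question allows both. The `rows` sample and all variable names are illustrative. Like any grouping by title, it merges lists whose headings have identical text.

```javascript
// Rows shaped like the matchAll result: [fullMatch, heading, item].
const rows = [
    ['', 'First List', 'Hello World'],
    ['', 'First List', 'Lorem'],
    ['', 'Second List', 'item'],
];

// A Map keeps headings in first-seen order.
const grouped = new Map();
for (const [, heading, item] of rows) {
    if (!grouped.has(heading)) grouped.set(heading, []);
    grouped.get(heading).push(item);
}

// Array format from the question: heading first, then its items.
const asArrays = [...grouped].map(([heading, items]) => [heading, ...items]);
// Object format: heading -> array of items.
const asObject = Object.fromEntries(grouped);

console.log(asArrays); // [ [ 'First List', 'Hello World', 'Lorem' ], [ 'Second List', 'item' ] ]
console.log(asObject); // { 'First List': [ 'Hello World', 'Lorem' ], 'Second List': [ 'item' ] }
```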

Here is the full example:

const markdown = `
# Test

## First List

* Hello World
* Lorem
* foo

## Second List

- item

## Third List

+ item 1
part of item 1
+ item 2

## Not a List

bla bla bla

## Empty

## Another List

bla bla bla



bla

* ITEM

## Nested List

### Inside Nested

* foo
* bar
`;

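// Lookbehind: the nearest heading above, with no other heading in between (group 1).
// Lookahead: a list item plus at most one continuation line (group 2).
const regexp = /(?<=#{1,6} (.*)\n(?:(?!#).*\n)*)(?=[+*-] (.*(?:\n(?![#+*-]).+)?))/g;
const matches = [...markdown.matchAll(regexp)];
// Group the [title, item] pairs into one array per heading.
const result = matches.reduce((acc, cur) => {
    const [title, item] = cur.slice(1);
    // Reuse the array that already starts with this title, if any.
    const target = acc.find(e => e[0] === title);
    if(target) {
        target.push(item);
    } else {
        acc.push([title, item]);
    }
    return acc;
}, []);
console.log(result);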
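If lookbehind support is a concern, or items may span more than one continuation line, a plain line-by-line scan is another option. The sketch below is not the regex approach from this answer but a swapped-in alternative under simplifying assumptions: headings are ATX style (`#` to `######`), list markers sit at the start of a line, and a continuation is any non-blank line directly after an item. The function name `listsByHeading` is hypothetical, and nested lists, ordered lists, and code fences are not handled.

```javascript
function listsByHeading(markdown) {
    const result = [];
    let heading = null;  // text of the most recent heading
    let current = null;  // array collecting the list under that heading
    let inItem = false;  // true while the previous line belonged to an item

    for (const line of markdown.split(/\r?\n/)) {
        const h = /^#{1,6} (.*)$/.exec(line);
        const li = /^[*+-] (.*)$/.exec(line);
        if (h) {
            // A new heading closes whatever list was open.
            heading = h[1];
            current = null;
            inItem = false;
        } else if (li && heading !== null) {
            // First item under this heading starts a new array.
            if (!current) {
                current = [heading];
                result.push(current);
            }
            current.push(li[1]);
            inItem = true;
        } else if (inItem && line.trim() !== '') {
            // Non-blank line right after an item: treat it as a continuation.
            current[current.length - 1] += '\n' + line;
        } else {
            // Blank line or ordinary paragraph text.
            inItem = false;
        }
    }
    return result;
}

const sample = '## Third List\n\n+ item 1\npart of item 1\n+ item 2\n\n## Empty\n';
console.log(listsByHeading(sample));
// [ [ 'Third List', 'item 1\npart of item 1', 'item 2' ] ]
```

Unlike the grouping by title above, this keeps two lists separate when their headings happen to share the same text, because each heading line starts a fresh array.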


javascript

regex

markdown

node.js

typescript