pyparsing recursive grammar: space-separated list inside a comma-separated list

Question

I have the following string that I want to parse:

((K00134,K00150) K00927,K11389) (K00234,K00235)

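Each step is separated by a space, and alternatives are indicated by commas. I am stuck on the first part of the string, where there is a space inside the parentheses. The desired output I am looking for is: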

[[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']

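What I have so far is a basic setup for recursive parsing, but I am confused about how to encode the space-separated list inside the parenthesized expression: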

from pyparsing import Word, Literal, Combine, nums, \
    Suppress, delimitedList, Group, Forward, ZeroOrMore

ortholog = Combine(Literal('K') + Word(nums, exact=5))
exp = Forward()
ortholog_group = Suppress('(') + Group(delimitedList(ortholog)) + Suppress(')')
atom = ortholog | ortholog_group | Group(Suppress('(') + exp + Suppress(')'))
exp <<= atom + ZeroOrMore(exp)

Answer 1

You are on the right track, but I think you only need one place in the grammar that handles grouping within ()'s, not two.

import pyparsing as pp 

LPAR,RPAR = map(pp.Suppress, "()")
ortholog = pp.Combine('K' + pp.Word(pp.nums, exact=5))

ortholog_group = pp.Forward()
ortholog_group <<= pp.Group(LPAR + pp.OneOrMore(ortholog_group | pp.delimitedList(ortholog)) + RPAR)
expr = pp.OneOrMore(ortholog_group)

tests = """\
((K00134,K00150) K00927,K11389) (K00234,K00235)
"""
expr.runTests(tests)

Gives:

((K00134,K00150) K00927,K11389) (K00234,K00235)
[[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
[0]:
  [['K00134', 'K00150'], 'K00927', 'K11389']
  [0]:
    ['K00134', 'K00150']
  [1]:
    K00927
  [2]:
    K11389
[1]:
  ['K00234', 'K00235']

This is not exactly what you said you wanted:

you wanted: [[['K00134', 'K00150'], 'K00927'], 'K11389'], ['K00234', 'K00235']
I output  : [[['K00134', 'K00150'], 'K00927', 'K11389'], ['K00234', 'K00235']]
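runTests only prints its results. If you want the parsed steps as a plain nested list that you can compare or process further, something like the following sketch should work. The variable names text and steps are mine, and I use the camelCase method names (parseString, asList) to match the rest of this answer; pyparsing 3 also offers snake_case equivalents.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
ortholog = pp.Combine("K" + pp.Word(pp.nums, exact=5))

# Same grammar as above: the only grouping happens at parentheses.
ortholog_group = pp.Forward()
ortholog_group <<= pp.Group(
    LPAR + pp.OneOrMore(ortholog_group | pp.delimitedList(ortholog)) + RPAR
)
expr = pp.OneOrMore(ortholog_group)

text = "((K00134,K00150) K00927,K11389) (K00234,K00235)"

# parseAll=True makes the parse fail if any trailing input is left unparsed.
steps = expr.parseString(text, parseAll=True).asList()
print(steps)
```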

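I'm not sure why your desired output puts a group around the space-separated part (K00134,K00150) K00927. Is this your intention or a typo? If it is intentional, you will need to rework the definition of ortholog_group so that, in addition to grouping at parentheses, it parses a delimited list of space-separated groups. The closest I could get was this: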

[[[[['K00134', 'K00150']], 'K00927'], ['K11389']], [['K00234', 'K00235']]]

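This required some shenanigans to group on spaces while not grouping bare orthologs when they appear alongside other groups. Here is what it looked like: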

ortholog_group <<= pp.Group(LPAR + pp.delimitedList(pp.Group(ortholog_group*(1,) & ortholog*(0,))) + RPAR) | pp.delimitedList(ortholog)

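The & operator, in combination with the repetition operators, gives the space-separated grouping (*(1,) is equivalent to OneOrMore and *(0,) to ZeroOrMore, but the notation also supports *(10,) for "10 or more", or *(3,5) for "at least 3 and no more than 5"). This is still not exactly what you asked for, but it may get you closer if you really do need to group the space-separated bits.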
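In case the repetition notation is unfamiliar, here is a small standalone sketch of it. The word expression and the sample inputs are made up for illustration and have nothing to do with the ortholog grammar.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

one_or_more = word * (1,)      # equivalent to pp.OneOrMore(word)
zero_or_more = word * (0,)     # equivalent to pp.ZeroOrMore(word)
three_to_five = word * (3, 5)  # at least 3 and no more than 5

some = one_or_more.parseString("a b c").asList()
none = zero_or_more.parseString("123").asList()           # matching zero words is fine
capped = three_to_five.parseString("a b c d e f").asList()  # stops after the fifth word

try:
    three_to_five.parseString("a b")  # only two words, fewer than the minimum
    too_few_rejected = False
except pp.ParseException:
    too_few_rejected = True

print(some, none, capped, too_few_rejected)
```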

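But I must say that grouping on spaces is ambiguous - or at least confusing. Should "(A,B) C D" be [[A,B],C,D], or [[A,B],C],[D], or [[A,B],[C,D]]? I think that, if possible, you should permit comma-separated lists, and perhaps space-separated ones too, but require ()'s wherever items should be grouped.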
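A minimal sketch of that suggestion, assuming it is acceptable for commas and spaces to act purely as separators so that only parentheses create a nested list. The names COMMA, item, items and group are mine, and unlike the grammar above this one also accepts bare orthologs at the top level. Note that it no longer records whether two neighbours were separated by a comma or a space.

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
COMMA = pp.Suppress(",")
ortholog = pp.Combine("K" + pp.Word(pp.nums, exact=5))

group = pp.Forward()
item = ortholog | group

# Neighbouring items may be separated by a comma or by whitespace alone;
# neither separator introduces a new level of nesting.
items = item + pp.ZeroOrMore(pp.Optional(COMMA) + item)

# Parentheses are the only construct that produces a nested list.
group <<= pp.Group(LPAR + items + RPAR)
expr = items

flat = expr.parseString("(K00134,K00150) K00927 K11389", parseAll=True).asList()
nested = expr.parseString(
    "((K00134,K00150) K00927,K11389) (K00234,K00235)", parseAll=True
).asList()
print(flat)
print(nested)
```

With this approach, the answer to the "(A,B) C D" question is always the first reading: anything not wrapped in ()'s stays at the current level.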
