Counting levels of embedding in a string with Python

Question

I have strings like the one below, and I want to calculate the maximum depth that the string reaches.

//node[@cat="ssub"]//node[@cat and node[@rel="cmp" and @root="te" and @pos="comp"] and node[@rel="body" and @cat="inf" and node[@rel="hd" and @wvorm="inf" and @pos="verb"] and node[@rel="vc" and @cat="ppart" and node[@rel="hd" and @pos="verb"]]]]

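If you paste the expression above into this beautifier (disclaimer: my own tool), you will see what I mean by levels: the indentation levels. The string above should therefore return 3. I got that with this Python script: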

def count_depth(xpath):
    depth = 0
    max_depth = 0
    for c in xpath:
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1

        if depth > max_depth:
            max_depth = depth

    return max_depth

Simple enough. However, some strings also contain [ and ] that I do not want to count, for example:

//node[@cat="smain" and node[@rel="mod" and @cat="mwu" and node[@rel="mwp" and @pt="lid" and number(@begin) < ../number(../node[@rel="predc" and @pt="adj"]/@begin)]] and node[@rel="predc" and @pt="adj"]]

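As you can see, there is a number() argument that causes extra indentation in the beautifier, because it contains a node[ string. My function above therefore counts that [ as well, which I do not want. For the example I just posted, I want the output to be 2, even though I currently get 3.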

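I am guessing I need to write a completely different function that uses a regular expression with lookbehind and/or lookahead, rather than iterating over every character?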
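To make the problem concrete, here is a minimal sketch that reports where a plain bracket counter first reaches its maximum on the second expression. The helper name `deepest_bracket` is hypothetical and not part of the original script. Note that it reports raw bracket nesting, which appears to come out one higher than the beautifier's indentation level (the innermost predicate has nothing nested in it to indent); that reading of the off-by-one is an assumption.

```python
def deepest_bracket(xpath):
    """Return (max_depth, index of the first '[' that reaches max_depth)."""
    depth = max_depth = 0
    index = -1  # stays -1 if the string contains no '['
    for i, c in enumerate(xpath):
        if c == "[":
            depth += 1
            if depth > max_depth:
                max_depth, index = depth, i
        elif c == "]":
            depth -= 1
    return max_depth, index


xpath = (
    '//node[@cat="smain" and node[@rel="mod" and @cat="mwu" and '
    'node[@rel="mwp" and @pt="lid" and number(@begin) < '
    '../number(../node[@rel="predc" and @pt="adj"]/@begin)]] '
    'and node[@rel="predc" and @pt="adj"]]'
)

max_depth, i = deepest_bracket(xpath)
print(max_depth)             # 4 (raw bracket nesting)
print(xpath[:i + 1][-15:])   # number(../node[
```

The deepest bracket is the one inside the number(...) call, which is exactly the one that should not be counted.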

Answer 1

I solved it by first removing the culprit (i.e. number(...)) from the string and only then counting. lv_xpath is the original XPath string.

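import re

def count_depth(xpath):
    # Track the current bracket nesting and the deepest level seen so far.
    depth = 0
    max_depth = 0
    for c in xpath:
        if c == "[":
            depth += 1
        elif c == "]":
            depth -= 1

        if depth > max_depth:
            max_depth = depth

    return max_depth

# Strip the number(@begin) ... /@begin) comparison (and a preceding "and ") before counting.
r_numberless = re.compile(r"(?:and )?number\(@begin\).*?\/@begin\)")
numberless_xpath = r_numberless.sub("", lv_xpath)
lv_x_levels = count_depth(numberless_xpath)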
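The regex above is tied to the exact number(@begin) ... /@begin) shape. As an alternative, here is a minimal sketch that keeps the character loop but ignores any bracket that sits inside parentheses or inside a quoted string. The function name `count_depth_outside_calls` is hypothetical, and the sketch assumes that parentheses in these expressions only ever belong to function calls such as number(...).

```python
def count_depth_outside_calls(xpath):
    """Max '[' nesting, skipping brackets inside (...) and inside quoted strings."""
    depth = max_depth = paren_depth = 0
    quote = None  # the quote character we are currently inside, if any
    for c in xpath:
        if quote:
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        elif c == "(":
            paren_depth += 1
        elif c == ")":
            paren_depth -= 1
        elif paren_depth == 0:
            # Only brackets outside every function call are counted.
            if c == "[":
                depth += 1
                max_depth = max(max_depth, depth)
            elif c == "]":
                depth -= 1
    return max_depth


xpath = (
    '//node[@cat="smain" and node[@rel="mod" and @cat="mwu" and '
    'node[@rel="mwp" and @pt="lid" and number(@begin) < '
    '../number(../node[@rel="predc" and @pt="adj"]/@begin)]] '
    'and node[@rel="predc" and @pt="adj"]]'
)
print(count_depth_outside_calls(xpath))  # 3
```

This gives the same raw count for this string as stripping the number(...) part with the regex and then calling count_depth. One limitation: an expression that uses grouping parentheses, such as (//node[...])[1], would have the brackets inside the group ignored as well, so the sketch only fits inputs where that does not occur.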
