How to get an expression between balanced parentheses
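Suppose I am given a string of the following kind:
"(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
I want to extract the substrings contained in the topmost-level parentheses, i.e. I want to obtain the strings "this is (haha) a string(()and it's sneaky)" and "lorem".
Is there a nice pythonic way to do this? Regular expressions are obviously not up to the task, but maybe there is a way to get an xml parser to do the job? For my application I can assume the parentheses are well formed, i.e. not something like (()(().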
Are you sure a regular expression isn't good enough?
>>> import re
>>> x=re.compile(r'\((?:(?:\(.*?\))|(?:[^\(\)]*?))\)')
>>> x.findall("(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla")
["(this is (haha) a string(()and it's sneaky)", '(lorem)']
>>> x.findall("((((this is (haha) a string((a(s)d)and ((it's sneaky))))))) ipsom (lorem) bla")
["((((this is (haha) a string((a(s)d)and ((it's sneaky))", '(lorem)']
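A quick way to sanity-check a candidate match is to count nesting depth and see whether it returns to zero. The sketch below applies that check to the two matches quoted in the first `findall` result above; the helper name `is_balanced` is hypothetical and not part of the answer.

```python
def is_balanced(text, open_ch="(", close_ch=")"):
    """Return True if every open_ch in text has a matching close_ch."""
    depth = 0
    for ch in text:
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:  # a closer appeared before its opener
                return False
    return depth == 0


# The two matches quoted in the first findall result above:
print(is_balanced("(lorem)"))                                      # True
print(is_balanced("(this is (haha) a string(()and it's sneaky)"))  # False
```

The second quoted match is one closing parenthesis short, and both keep their outer parentheses, so the pattern only approximates what the question asks for.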
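This isn't very "pythonic"... but:
def find_strings_inside(what_open, what_close, s):
    stack = []  # one entry per currently open delimiter
    msg = []    # characters of the current top-level group
    for c in s:
        if c == what_open:
            stack.append(c)
            if len(stack) == 1:
                continue  # skip the outermost opener itself
        elif c == what_close and stack:
            stack.pop()
            if not stack:
                # outermost group just closed: emit it and reset
                yield "".join(msg)
                msg[:] = []
        if stack:
            msg.append(c)
x = list(find_strings_inside("(", ")", "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"))
print(x)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']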
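This is a standard use case for a stack: you read the string character by character, and whenever you encounter an opening parenthesis you push the symbol onto the stack; when you encounter a closing parenthesis, you pop the symbol from the stack.
Since you only have a single type of parenthesis, you don't actually need a stack; it is enough to remember how many parentheses are currently open.
In addition, in order to extract the text, we also remember where a part starts when a first-level parenthesis opens, and collect the resulting string when we encounter the matching closing parenthesis.
This could look like this: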
string = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
stack = 0
startIndex = None
results = []
for i, c in enumerate(string):
    if c == '(':
        if stack == 0:
            startIndex = i + 1 # string to extract starts one index later
        # push to stack
        stack += 1
    elif c == ')':
        # pop stack
        stack -= 1
        if stack == 0:
            results.append(string[startIndex:i])
print(results)
# ["this is (haha) a string(()and it's sneaky)", 'lorem']
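The counter works because there is only one kind of parenthesis. As a rough sketch of when a real stack becomes necessary, the variant below handles several bracket kinds and checks that each closer matches the most recent opener. The function name `top_level_groups`, the `PAIRS` table, and the decision to raise `ValueError` on malformed input are all assumptions made for illustration; the question itself only needs `(` and `)` and assumes well-formed input.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}  # assumed set of bracket kinds


def top_level_groups(text, pairs=PAIRS):
    """Return the contents of each outermost bracket group in text."""
    closers = set(pairs.values())
    stack = []      # expected closing characters, innermost last
    start = None    # index just past the outermost opener
    results = []
    for i, ch in enumerate(text):
        if ch in pairs:
            if not stack:
                start = i + 1
            stack.append(pairs[ch])
        elif ch in closers:
            # the closer must match the most recently opened bracket
            if not stack or stack.pop() != ch:
                raise ValueError(f"unexpected {ch!r} at index {i}")
            if not stack:
                results.append(text[start:i])
    if stack:
        raise ValueError("unclosed bracket")
    return results


print(top_level_groups("(a [b] c) d {e (f)}"))
# ['a [b] c', 'e (f)']
```

Storing the expected closer on the stack, rather than the opener, keeps the mismatch check to a single comparison.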
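This more or less repeats what has already been said, but might be a bit easier to read:
def extract(string):
    flag = 0  # current nesting depth
    result, accum = [], []
    for c in string:
        if c == ')':
            flag -= 1
        if flag:
            accum.append(c)
        if c == '(':
            flag += 1
        if not flag and accum:
            result.append(''.join(accum))
            accum = []
    return result
>>> test = "(this is (haha) a string(()and it's sneaky)) ipsom (lorem) bla"
>>> print(extract(test))
["this is (haha) a string(()and it's sneaky)", 'lorem']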
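One edge case worth knowing: because `extract` only appends when `accum` is non-empty, an empty top-level group such as `()` is silently skipped. If empty groups should be reported too, a variant can emit a result on the depth transition instead. The sketch below is one possible way to do that; the name `extract_with_empty` is hypothetical, and like the original it assumes well-formed parentheses.

```python
def extract_with_empty(string):
    """Depth-counting extract that also reports empty top-level groups."""
    depth = 0
    result, accum = [], []
    for c in string:
        if c == ')':
            depth -= 1
            if depth == 0:          # just closed a top-level group
                result.append(''.join(accum))
                accum = []
                continue
        if depth:                   # inside a group: keep the character
            accum.append(c)
        if c == '(':
            depth += 1
    return result


print(extract_with_empty("() (a) (b(c))"))
# ['', 'a', 'b(c)']
```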