How can we parse the text using Python regex?
I have the following text, and I want the output as a list of dictionaries.
text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''
I tried the following, but I only get 2 dictionaries, whereas I expect 4 to be returned.
import re

names = []
# Raw string so the regex backslashes are not treated as string escapes.
for item in re.finditer(r'(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s(?P<user_name>[a-zA-Z0-9]+)\s\[(?P<time>\d{2}\/[a-zA-Z]+\/[0-9]+\:[0-9]+\:[0-9]+\:[0-9]+\s-\d{4})\]\s\"(?P<request>[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1})\"', text):
    # One dictionary per matched log line, keyed by group name.
    names.append(item.groupdict())
print(names)
Can anyone help me solve this?
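This part of the string you are trying to match:
"DELETE /virtual/solutions/target/web+services HTTP/2.0"
does not match your regex, because the regex expects everything after DELETE / (up to the next space) to be alphabetic. The requests that do match are: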
POST /incentivize HTTP/1.1
PATCH /architectures HTTP/1.0
The ones that do not match are:
DELETE /virtual/solutions/target/web+services HTTP/2.0
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
Change the request part of the regex so that it accepts / and + in addition to alphabetic characters:
"[a-zA-Z]+\s\/[a-zA-Z/+]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
                     ↑↑
instead of
"[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
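Putting that change into the full pattern, a minimal sketch could look like the following. It assumes the log format shown in the question; the name `pattern`, the use of `re.compile`, and the list comprehension are my own choices rather than part of the original code, and I have dropped a few escapes (`\/`, `\:`, `\"`) that the regex does not need.

```python
import re

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701
100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

# Same groups as in the question; the only functional change is the path
# character class [a-zA-Z/+], which now also accepts "/" and "+".
pattern = re.compile(
    r'(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s'
    r'(?P<user_name>[a-zA-Z0-9]+)\s'
    r'\[(?P<time>\d{2}/[a-zA-Z]+/[0-9]+:[0-9]+:[0-9]+:[0-9]+\s-\d{4})\]\s'
    r'"(?P<request>[a-zA-Z]+\s/[a-zA-Z/+]+\s[a-zA-Z]+/\d\.\d)"'
)

# One dictionary per matched log line, keyed by group name.
names = [m.groupdict() for m in pattern.finditer(text)]
print(len(names))  # 4
```

Note that this character class still rejects paths containing digits, hyphens, or other characters. If your real logs contain such paths, a broader class such as `\S+` for the path may be a better fit, but that goes beyond the sample shown here.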