How can we parse the text using a Python regex?

I have the following text and I want the output as a list of dictionaries, one per log line.

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

I tried the following, but I only get 2 dictionaries, whereas I expect 4.

import re

names = []

# Raw string so the regex backslashes reach the re module unchanged.
for item in re.finditer(r"(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s(?P<user_name>[a-zA-Z0-9]+)\s\[(?P<time>\d{2}\/[a-zA-Z]+\/[0-9]+\:[0-9]+\:[0-9]+\:[0-9]+\s-\d{4})\]\s\"(?P<request>[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1})\"", text):
    # One dict per match, keyed by the named groups.
    names.append(item.groupdict())

print(names)

Can anyone help me solve this?

This part of the string you are trying to match:

"DELETE /virtual/solutions/target/web+services HTTP/2.0"

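does not match your regex, because the request pattern allows only letters between the / that follows DELETE and the next space, while this path also contains / and +. The requests that match are: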

POST /incentivize HTTP/1.1
PATCH /architectures HTTP/1.0

The ones that do not match are:

DELETE /virtual/solutions/target/web+services HTTP/2.0
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
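
This can be checked in isolation. The sketch below applies only the request part of the original pattern to the four request strings; the names old_request, requests and results are made up for this illustration.

```python
import re

# Request part of the original pattern: only letters are allowed in the path.
old_request = re.compile(r'[a-zA-Z]+\s/[a-zA-Z]+\s[a-zA-Z]+/\d\.\d')

requests = [
    "POST /incentivize HTTP/1.1",
    "DELETE /virtual/solutions/target/web+services HTTP/2.0",
    "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1",
    "PATCH /architectures HTTP/1.0",
]

# True where the whole request string matches, False otherwise.
results = {req: bool(old_request.fullmatch(req)) for req in requests}
for req, ok in results.items():
    print(ok, req)
```

Only the POST and PATCH requests come back True, which accounts for the 2 dictionaries instead of 4.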

Change the request part of the regex so that the path accepts / and + in addition to letters:

"[a-zA-Z]+\s\/[a-zA-Z/+]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
                     ↑↑

instead of

"[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
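
Putting it together, here is a minimal sketch of the full script with that one change applied. It assumes the same sample text as in the question; splitting the pattern across several raw strings and the variable name pattern are only choices made here for readability, and the unnecessary escapes before / and : are dropped.

```python
import re

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

# Same named groups as in the question; only the path character class changed.
pattern = re.compile(
    r'(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s'
    r'(?P<user_name>[a-zA-Z0-9]+)\s'
    r'\[(?P<time>\d{2}/[a-zA-Z]+/[0-9]+:[0-9]+:[0-9]+:[0-9]+\s-\d{4})\]\s'
    r'"(?P<request>[a-zA-Z]+\s/[a-zA-Z/+]+\s[a-zA-Z]+/\d\.\d)"'
)

# One dictionary per log line, keyed by group name.
names = [m.groupdict() for m in pattern.finditer(text)]

print(len(names))
for entry in names:
    print(entry)
```

With the sample text this yields 4 dictionaries. Paths containing other characters (digits, -, _, ., query strings) would still be skipped; a looser class such as \S+ for the path may be a better fit for real logs.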