How do I search for an href (URL) in a text file using a Python regex?
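Running a CLI utility gives me a bunch of informational output messages, and there is a web URL at the end of the file. I need to use a Python regular expression to find the link and print it. Below is the code I wrote for this: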
file = str('/root/PycharmProjects/rest_project/sponge_link')
with open(file, 'r') as fo:
    fo.read().__str__()
    urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
    print(urls)
The file contents are as follows:
INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d
INFO: Analyzed target <path/path/path> (73 packages loaded, 10521 targets configured).
INFO: Found 1 target...
Target <path>/dence up-to-date:
utility-<path>/dence_0.0-5_amd64.deb
utility-<path>/dence_0.4-5_amd64.changes
INFO: Elapsed time: 23.669s, Critical Path: 0.47s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: Build Event Protocol files produced successfully.
INFO: Build completed successfully, 1 total action
INFO: Still uploading to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d
However, when I run the program, I get the following error:
Traceback (most recent call last):
File "/root/PycharmProjects/rest_project/sel.py", line 24, in <module>
urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
File "/usr/lib/python3.6/re.py", line 222, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
It complains that the data type should be a string. So I used str() on the file path, but even that does not work.
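You are passing the file object to re.findall, not a string. Calling fo.read() returns the file contents as a string, but your code discards that return value, so fo is still the file object. You need to assign the result of reading the file to a variable and pass that variable to re.findall.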
The line
fo.read().__str__()
should be something like
lines = fo.read()
and the line
urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
should be
urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', lines)
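Putting the two changes together, a minimal sketch might look like the following. The helper names `find_urls` and `find_urls_in_file` and the second pattern `https?://\S+` are assumptions added for illustration and are not part of the original post. The second pattern is included because the pattern from the question has no `/` in its character class, so it stops after the host and returns `http://abc` rather than the full link.

```python
import re

# Pattern copied from the question. It matches only the scheme and host,
# because "/" is not in the character class.
HOST_PATTERN = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

# Assumed alternative (not from the original answer): take everything up to
# the next whitespace character, so the path after the host is kept.
FULL_URL_PATTERN = r'https?://\S+'


def find_urls(text, pattern=FULL_URL_PATTERN):
    """Return every match of `pattern` in `text` (a str, not a file object)."""
    return re.findall(pattern, text)


def find_urls_in_file(path, pattern=FULL_URL_PATTERN):
    """Read the whole file into a string, then search that string."""
    with open(path, 'r') as fo:
        lines = fo.read()  # keep the string that read() returns
    return find_urls(lines, pattern)


# Two lines taken from the sample file in the question.
sample = (
    "INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d\n"
    "INFO: Found 1 target...\n"
    "INFO: Still uploading to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d\n"
)
print(find_urls(sample, HOST_PATTERN))  # host only
print(find_urls(sample))                # full URL including the path
```

Since the URL appears twice in the file, both calls return two matches. If only the last link is needed, taking the final element of the returned list (after checking it is not empty) is one option.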