How to search for an href in a text file with a Python regex?

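Running some CLI utilities gives me a bunch of informational output messages, and there is a web URL at the end of the file. I need to use a Python regular expression to find the link and print it. Below are the 3 lines of code I wrote for this: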

file = str('/root/PycharmProjects/rest_project/sponge_link')

with open(file, 'r') as fo:
    fo.read().__str__()
    urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
    print(urls)

The file contents are as follows:

INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d
INFO: Analyzed target <path/path/path> (73 packages loaded, 10521 targets configured).
INFO: Found 1 target...
Target <path>/dence up-to-date:
 utility-<path>/dence_0.0-5_amd64.deb
 utility-<path>/dence_0.4-5_amd64.changes
INFO: Elapsed time: 23.669s, Critical Path: 0.47s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: Build Event Protocol files produced successfully.
INFO: Build completed successfully, 1 total action
INFO: Still uploading to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d

However, when I run the program, I get the following error:

Traceback (most recent call last):
  File "/root/PycharmProjects/rest_project/sel.py", line 24, in <module>
    urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo)
  File "/usr/lib/python3.6/re.py", line 222, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object

It complains that the data type should be a string. So I wrapped the file path in str(), but even that didn't work.
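The TypeError can be reproduced without the file on disk. The sketch below uses io.StringIO as a stand-in for the object returned by open() (an assumption for illustration only). re.findall rejects the file-like object itself but accepts the string returned by its .read() method, which shows that the file path was never the problem:

```python
import io
import re

PATTERN = r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+'

# io.StringIO stands in for the file object returned by open(...)
fo = io.StringIO("INFO: Streaming results to http://abc/56659bf3-a66d-482b-80e8-6484cafc650d\n")

error_message = None
try:
    re.findall(PATTERN, fo)  # passing the file object itself raises TypeError
except TypeError as exc:
    error_message = str(exc)

# Passing the text read from the file works.
# Note: this particular pattern stops at the first '/', so only the host part matches.
urls = re.findall(PATTERN, fo.read())

print(error_message)
print(urls)
```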

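You are passing the file object to re.findall, not a string. Calling str() on the path doesn't help because the path is already a string; the problem is the second argument to re.findall. Also, fo.read().__str__() reads the file but throws the result away. You need to assign the result of reading the file to a variable and pass that variable to re.findall: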

  1. fo.read().__str__() should be something like lines = fo.read()
  2. urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', fo) should be urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', lines)
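Putting both changes together, a corrected version might look like the sketch below. Two details go beyond the fix described above and are assumptions: the original pattern https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+ does not allow '/', so on this file it only captures http://abc; the sketch uses https?://\S+ instead, assuming a URL ends at the first whitespace. It also drops duplicates, because the same URL appears on both the first and the last INFO line. The helper name find_urls is hypothetical.

```python
import re

# Assumption: a URL runs from http:// or https:// up to the next whitespace character.
URL_PATTERN = re.compile(r'https?://\S+')


def find_urls(path):
    """Return the unique URLs in the file at `path`, in order of first appearance."""
    with open(path, 'r') as fo:
        text = fo.read()  # re.findall needs this string, not the file object
    # dict.fromkeys removes duplicates while preserving the original order
    return list(dict.fromkeys(URL_PATTERN.findall(text)))
```

Usage with the path from the question: print(find_urls('/root/PycharmProjects/rest_project/sponge_link')). If you really only want the host part, keep the original pattern; it now works once it receives the string instead of the file object.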