Remove Part of String Before the Last Forward Slash

Question

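The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last segment of each URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to do this? I have looked at other Stack Overflow posts, but the solutions don't seem to work.

These are two of my several attempts: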

for link in link_list:
   file_names.append(link.replace('/[^/]*$',''))
print(file_names)

&

for link in link_list:
   file_names.append(link.rpartition('//')[-1])
print(file_names)

Answer 1

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'

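And using a regular expression:

>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'

This returns the second group, which is the text between the last / and the end of the string. It works because the first .* is greedy: it consumes as much as it can, so the / in the pattern ends up matching the last slash.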
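To make the greedy behavior concrete, here is a minimal sketch that runs the same pattern next to a non-greedy variant. The non-greedy pattern is not part of the answer; it is included only for contrast.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy: the first (.*) consumes as much as possible, so the "/" in the
# pattern is forced onto the LAST slash in the string.
greedy = re.search(r'(.*)/(.*)', s)
print(greedy.group(1))  # https://docs.python.org/3.4/tutorial
print(greedy.group(2))  # interpreter.html

# Non-greedy (for contrast only): (.*?) consumes as little as possible, so
# the "/" matches the FIRST slash and group 2 holds everything after it.
lazy = re.search(r'(.*?)/(.*)', s)
print(lazy.group(1))  # https:
print(lazy.group(2))  # /docs.python.org/3.4/tutorial/interpreter.html
```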

A small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // rather than /. Remove the extra /, as in link.rpartition('/')[-1].
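The difference can be checked directly. This sketch uses the URL from the question and also shows why the question's first attempt has no effect: str.replace treats its first argument as literal text, not as a regex.

```python
link = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# '//' occurs only once, right after the scheme, so partitioning on it
# keeps almost the whole URL.
after_double_slash = link.rpartition('//')[-1]
print(after_double_slash)  # docs.python.org/3.4/tutorial/interpreter.html

# Partitioning on a single '/' splits at the last slash.
after_last_slash = link.rpartition('/')[-1]
print(after_last_slash)  # interpreter.html

# str.replace looks for the literal text '/[^/]*$', which is not in the
# link, so the string comes back unchanged.
unchanged = link.replace('/[^/]*$', '')
print(unchanged == link)  # True
```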

Answer 2

No need for a regex.

import os

for link in link_list:
    file_names.append(os.path.basename(link))

Answer 3

Just use str.split:

url = "/some/url/with/a/file.html"

print(url.split("/")[-1])

# Result should be "file.html"

split gives you a list of the strings that were separated by "/". The [-1] gives you the last element of that list, which is what you want.
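Applied to a whole list, the same idea might look like the sketch below. The contents of link_list are made-up sample data, and the rstrip step is an optional extra that is not part of the answer; it only matters if some URLs end in a slash.

```python
# Hypothetical sample data standing in for the question's link_list.
link_list = [
    'https://docs.python.org/3.4/tutorial/interpreter.html',
    'https://docs.python.org/3.4/tutorial/',
]

# Last element after splitting on "/". A URL that ends in a slash yields
# an empty string, because nothing follows the final separator.
file_names = [link.split('/')[-1] for link in link_list]
print(file_names)  # ['interpreter.html', '']

# Optional: strip trailing slashes first to get the last non-empty segment.
last_segments = [link.rstrip('/').split('/')[-1] for link in link_list]
print(last_segments)  # ['interpreter.html', 'tutorial']
```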

Answer 4

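If you want to use a regex, re.sub should work (str.replace only matches literal text):

import re

for link in link_list:
    # Greedy .*/ matches everything up to and including the last slash
    file_names.append(re.sub(r'.*/', '', link))
print(file_names)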

Answer 5

You can use rpartition():

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rpartition('/')
('https://docs.python.org/3.4/tutorial', '/', 'interpreter.html')

And take the last part of the returned 3-element tuple:

>>> s.rpartition('/')[2]
'interpreter.html'
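One detail worth knowing when choosing between rpartition and rsplit is what happens when the string contains no slash at all. A short sketch, using a bare file name as an assumed input:

```python
name = 'interpreter.html'  # assumed input with no "/" in it

# rpartition always returns a 3-tuple; if the separator is missing, the
# whole string ends up in the last slot, so [2] is still safe.
parts = name.rpartition('/')
print(parts)     # ('', '', 'interpreter.html')
print(parts[2])  # interpreter.html

# rsplit returns a one-element list in that case, so [1] would raise
# IndexError, while [-1] still works.
pieces = name.rsplit('/', 1)
print(pieces)      # ['interpreter.html']
print(pieces[-1])  # interpreter.html
```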

Answer 6

Here is a more general regex approach:

    >>> import re
    >>> re.sub(r'^.+/([^/]+)$', r'\1', "http://test.org/3/files/interpreter.html")
    'interpreter.html'

python

regex

string

replace