How to get only the URL from a string
https://www.some.com/7e3a729f86efd33fe9c727b02cdcc44692bf8520?redirect=http%3A%2F%2Fwww.danforthmainstreetclinic.ca%2F
How can I get only danforthmainstreetclinic.ca?
The link always varies like this: https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540
I only need facebook.com/782596948538540
How do I convert %2F to /, %3A to :, and any other escapes?
You can use unquote from urllib.parse, which replaces %xx escapes with their single-character equivalents.
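from urllib.parse import unquote
url = "https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540"
# Decode the %xx escapes, then keep everything after 'redirect='
res = unquote(url).split('redirect=')[-1]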
Result:
'http://facebook.com/782596948538540'
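Splitting on 'redirect=' works for the links shown, but it would also pick up any other query parameter that followed. A minimal sketch that parses the query string instead, assuming the target always sits in a parameter named redirect (the helper name get_redirect_target is made up for illustration):

```python
from urllib.parse import urlsplit, parse_qs

def get_redirect_target(url):
    """Return the decoded value of the `redirect` query parameter, or None."""
    query = urlsplit(url).query
    # parse_qs percent-decodes the values and maps each key to a list
    values = parse_qs(query).get("redirect")
    return values[0] if values else None

url = "https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540"
print(get_redirect_target(url))  # http://facebook.com/782596948538540
```

Because parse_qs already decodes the value, no separate unquote call is needed here.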
Try this:
import urllib.parse
url = "https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540".split('redirect=')[-1]
print(urllib.parse.unquote(url))
Output:
http://facebook.com/782596948538540
You can use this straightforward approach:
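url = 'https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540'
# Split on '%'; the last two pieces are '2Ffacebook.com' and '2F782596948538540'
url_split = url.split('%')
# Drop the leading '2F' by slicing; strip('2F') would also eat any '2' or 'F' at either end
new_url = url_split[-2][2:] + '/' + url_split[-1][2:]
print(new_url)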
$ facebook.com/782596948538540
Something like this?
import re
urls = [
"https://www.some.com/7e3a729f86efd33fe9c727b02cdcc44692bf8520?redirect=http%3A%2F%2Fwww.danforthmainstreetclinic.ca%2F",
"https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540"
]
pattern = r'.+http%3A%2F%2F(.+)'
regex = re.compile(pattern)
for url in urls:
    matched = regex.match(url)
    if matched:
        # Group 1 is everything after the encoded 'http://'
        found_url = matched.group(1)
        print(re.sub(r'%2F', '/', found_url))
www.danforthmainstreetclinic.ca/
facebook.com/782596948538540
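The first line of that output still carries the leading www. and a trailing slash, while the question asks for the bare danforthmainstreetclinic.ca. A rough sketch of one way to trim both, assuming the target is always in a redirect parameter and that any query string on the target itself can be dropped (bare_target is a hypothetical helper name):

```python
from urllib.parse import urlsplit, parse_qs

def bare_target(url):
    """Return host + path of the redirect target, without scheme, 'www.' or trailing '/'."""
    values = parse_qs(urlsplit(url).query).get("redirect")
    if not values:
        return None
    target = urlsplit(values[0])  # values[0] is already percent-decoded
    host = target.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return (host + target.path).rstrip("/")

urls = [
    "https://www.some.com/7e3a729f86efd33fe9c727b02cdcc44692bf8520?redirect=http%3A%2F%2Fwww.danforthmainstreetclinic.ca%2F",
    "https://www.some.com/8343b54b1a55dbf1a003af0d0c7e9ba4ea762245?redirect=http%3A%2F%2Ffacebook.com%2F782596948538540",
]
for u in urls:
    print(bare_target(u))
```

This should print danforthmainstreetclinic.ca and facebook.com/782596948538540 for the two sample links.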