Extract these URLs onto separate lines with the help of regex
I also want to remove the "", the commas, and the [].
"image":["https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg","https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg"]
You can try this regex:
import re
s = """
"image":["https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg","https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg"]
"""
for url in re.findall("(https?://[^\"]+)", s):
    print(url)
Output:
https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg
A more practical approach may be to use json.loads():
import json
s = """
"image":["https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg","https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg"]
"""
for url in json.loads(f"{{{s}}}")["image"]:
    print(url)
This produces the same output.
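The f"{{{s}}}" expression above can be hard to read at first: the doubled braces are literal { and } characters, and the inner {s} interpolates the string, so the fragment becomes a complete JSON object. Here is a minimal sketch of the same idea with explicit concatenation. The fragment is shortened and the example.com URLs are placeholders, not data from the question.

```python
import json

# Hypothetical, shortened fragment in the same shape as the question's data.
fragment = '"image":["https://example.com/a.jpg","https://example.com/b.jpg"]'

# Wrapping the fragment in braces turns it into a valid JSON object.
wrapped = "{" + fragment + "}"
assert wrapped == f"{{{fragment}}}"  # identical to the f-string form

urls = json.loads(wrapped)["image"]
for url in urls:
    print(url)  # each URL on its own line, without quotes, commas, or brackets
```

This only works when the fragment is valid JSON once wrapped; for arbitrary page source the regex approach is the more forgiving one.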
With web scraping:
import requests
import re
endpoint = "https://www.adidas.com.au/continental-80-shoes/G27707.html"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
}
response = requests.get(endpoint, headers = headers)
data = response.text
for url in re.findall("(https?://[^\"]+)", data):
    if url.endswith(".jpg"):
        print(url)
Output:
https://assets.adidas.com/images/h_840,f_auto,q_auto:sensitive,fl_lossy/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
https://assets.adidas.com/images/h_840,f_auto,q_auto:sensitive,fl_lossy/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw3836ffca/UB21.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw3836ffca/UB21.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw2168293e/001Predator.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dwba05b161/UB19_FLY_NAV.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dwe71a6606/basketball-nav-image-harden-vol-4.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dwd01baad1/header-redesign/Header_images_training.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw65d0286b/originals-header-nav-09072018.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw8f5aa9d6/adidas-logo-menu-2.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dwb2983276/fos_ath_172x80%20%281%29.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw0157489a/172x80_ASMC_FW20.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw28fcced9/skateheaderline.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw921d9920/skateboarding.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dw81dcc2a2/Adidas_2020_Sustainability_nav_headline.jpg
https://www.adidas.com.au/on/demandware.static/-/Sites-adidas-AU-Library/default/dwded2c44e/Adidas_2020_Sustainability_nav_image.jpg
https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
https://assets.adidas.com/images/w_320,f_auto,q_auto:sensitive,fl_lossy/8df5ab4346d7475ebb08a91500a047d3_9366/Continental_80_Shoes_White_G27706_01_standard.jpg
https://assets.adidas.com/images/w_320,f_auto,q_auto:sensitive,fl_lossy/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg
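The scraped output above contains repeated URLs and navigation images from www.adidas.com.au alongside the product images. If only the product images are wanted, one possible refinement is to filter by host and drop repeats while preserving order. This is a sketch, not part of the original answer: the helper name unique_product_images and the sample list are made up for illustration, and the host assets.adidas.com is taken from the output above.

```python
from urllib.parse import urlparse


def unique_product_images(urls, host="assets.adidas.com"):
    """Keep .jpg URLs served from `host`, dropping repeats but keeping order."""
    seen = set()
    result = []
    for url in urls:
        parsed = urlparse(url)
        # Skip URLs from other hosts and anything that is not a .jpg path.
        if parsed.netloc != host or not parsed.path.endswith(".jpg"):
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up sample standing in for the list returned by re.findall.
sample = [
    "https://assets.adidas.com/images/w_600/x/one.jpg",
    "https://assets.adidas.com/images/w_600/x/one.jpg",
    "https://www.adidas.com.au/on/demandware.static/nav.jpg",
    "https://assets.adidas.com/images/w_600/x/two.jpg",
]
for url in unique_product_images(sample):
    print(url)
```

Note that the same image served at different sizes (for example w_600 and w_320) has different URLs, so this sketch treats those as distinct entries.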
Try the code below:
import re
s = '''your string'''
URL_RE = re.compile('https://[^"]+')
urls = URL_RE.findall(s)
print(urls)
The regex https://[^"]+ matches https:// followed by one or more characters that are not a double quote.
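To make the behavior of the pattern concrete, here is a small sketch on made-up text (the example.com URLs are placeholders). It also shows how this pattern differs from the https?:// variant used in the other answer, which accepts plain http as well.

```python
import re

# Made-up text: one https URL and one http URL, each inside double quotes.
text = '["https://example.com/a.jpg","http://example.com/b.jpg"]'

# Each match stops at the closing double quote, because [^"]+ cannot cross it.
https_only = re.findall(r'https://[^"]+', text)
either_scheme = re.findall(r'https?://[^"]+', text)

print(https_only)     # only the https URL
print(either_scheme)  # both URLs
```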
I'm not sure how you are getting the data, but something like this:
data = {"image":["https://assets.adidas.com/images/w_600,f_auto,q_auto/c6f0aede76f849a18a27a91500a0c8c9_9366/Continental_80_Shoes_Black_G27707_01_standard.jpg","https://assets.adidas.com/videos/w_600,f_auto,q_auto/dd37d9bb5cd54406b36faa8d00fb8c22_d98c/Continental_80_Shoes_Black_G27707_video.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/690c7ca0531a450187cda91500a0dffa_9366/Continental_80_Shoes_Black_G27707_02_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fde9d7c8cde6427aae8ca91500a0ec61_9366/Continental_80_Shoes_Black_G27707_03_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/01d85160ccd442a59954a91500a120cf_9366/Continental_80_Shoes_Black_G27707_04_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/73c3160607ab42a3816ca91500a12de3_9366/Continental_80_Shoes_Black_G27707_05_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/46bcee5e26084cffb1aba91500a0d487_9366/Continental_80_Shoes_Black_G27707_06_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/5ecde2b36fb4425ca67aa97b012ee1e4_9366/Continental_80_Shoes_Black_G27707_07_standard.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/50f65fa2a60946a8990ba91500a13a53_9366/Continental_80_Shoes_Black_G27707_41_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/3c7047a003154900905fa91500a1449f_9366/Continental_80_Shoes_Black_G27707_42_detail.jpg","https://assets.adidas.com/images/w_600,f_auto,q_auto/fec75a9048404de0b32ca91500a14f19_9366/Continental_80_Shoes_Black_G27707_43_detail.jpg"]}
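data is now a dictionary with a single key, "image", which in turn holds a list of URLs.
Access an item like this: data["image"][0] for the first item.
Or you can loop over them: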
for image in data["image"]:
    # do stuff with image
    print(image)
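Since the question asks for the URLs on separate lines, the list can also be joined into a single newline-separated string. A minimal sketch, using a shortened stand-in for the dictionary above (the example.com URLs are placeholders):

```python
# Shortened, hypothetical stand-in for the dictionary above.
data = {"image": ["https://example.com/a.jpg", "https://example.com/b.jpg"]}

first = data["image"][0]          # index access to the first URL
lines = "\n".join(data["image"])  # all URLs, one per line

print(first)
print(lines)
```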