如何删除 link 的开头并在 python 中添加正斜杠
how to remove the start of a link and add a forward slash in python
我有一些网站链接是我从网站上抓取的,问题是这些链接并不完全正确,因为它们不会自动下载数据,除非我 two 变化:
1) 我去掉了开头的VM300:1
2) 我在 .au
后面放了一个 /
有没有办法自动执行此操作?我有大约一千个链接,因此最好不要手动执行此操作。
下面是我的 url 的
的例子
urls = [
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx"
]
编辑1
from pathlib import Path
import requests
urls = [
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx"
]
urls = [x.replace('VM300:1 ','').replace('.au__', '.au/__') for x in urls]
for url in urls:
r = requests.get(urls)
with open(Path(urls).name, 'wb') as f:
f.write(r.content)
错误:
Traceback (most recent call last):
File "C:/Users/george/Desktop/NT/stack NT.py", line 19, in <module>
r = requests.get(urls)
File "C:\Python27\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 640, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 731, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found for '['https://www.powerwater.com.au/__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx']'
谢谢
将列表理解与 split
和 replace
结合使用:
urls = [x.split()[1].replace('.au__', '.au/__') for x in urls]
双replace
的另一个想法:
urls = [x.replace('VM300:1 ','').replace('.au__', '.au/__') for x in urls]
我有一些网站链接是我从网站上抓取的,问题是这些链接并不完全正确,因为它们不会自动下载数据,除非我 two 变化:
1) 我去掉了开头的VM300:1
2) 我在 .au
/
有没有办法自动执行此操作?我有大约一千个链接,因此最好不要手动执行此操作。
下面是我的 url 的
的例子urls = [
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx"
]
编辑1
from pathlib import Path
import requests
urls = [
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx",
"VM300:1 https://www.powerwater.com.au__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx"
]
urls = [x.replace('VM300:1 ','').replace('.au__', '.au/__') for x in urls]
for url in urls:
r = requests.get(urls)
with open(Path(urls).name, 'wb') as f:
f.write(r.content)
错误:
Traceback (most recent call last):
File "C:/Users/george/Desktop/NT/stack NT.py", line 19, in <module>
r = requests.get(urls)
File "C:\Python27\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 640, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 731, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
InvalidSchema: No connection adapters were found for '['https://www.powerwater.com.au/__data/assets/excel_doc/0011/172775/Market_Information_System_Control_daily_trading_day_190130.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0004/172732/Market_Information_System_Control_daily_trading_day_190129.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0010/172675/Market_Information_System_Control_daily_trading_day_190128.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0009/172674/Market_Information_System_Control_daily_trading_day_190127.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0008/172673/Market_Information_System_Control_daily_trading_day_190126.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0007/172672/Market_Information_System_Control_daily_trading_day_190125.xlsx', 'https://www.powerwater.com.au/__data/assets/excel_doc/0011/172595/Market_Information_System_Control_daily_trading_day_190124.xlsx']'
谢谢
将列表理解与 split
和 replace
结合使用:
urls = [x.split()[1].replace('.au__', '.au/__') for x in urls]
双replace
的另一个想法:
urls = [x.replace('VM300:1 ','').replace('.au__', '.au/__') for x in urls]