How to download a list of CSV files using Python?
I'm running a Python program to download a selected list of CSV files from canada.ca. I have all the URLs I need, but I don't know how to download them to my local directory. I believe I have to use requests and write the files in a loop, but I'm a bit lost on how to go about it. Thanks in advance.
en_urls = []
for link in soup.find_all('a'):
    if 'EN.csv' in link.get('href', ''):
        en_urls.append(link.get('href'))
Output
['http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q3_Positive_Employer_Stream_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q1_Positive_Employer_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q2_Positive_Employer_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q4_Positive_Employer_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q3_Positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2018Q4_Positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q1_employer_positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q2_employer_positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q3_Positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2019Q4_Positive_EN.csv',
'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/imt-lmi/TFWP_2020Q1_Positive_EN.csv']
You can use urllib.request.urlretrieve() in a loop.

For example:
import urllib.request

lst = ['http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv',
       'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv',
       'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv',
       'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv']

for i in lst:
    print('Downloading {}..'.format(i))
    local_filename, _ = urllib.request.urlretrieve(i, filename=i.split('/')[-1])
    print('File saved as {}'.format(local_filename))
Prints:
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv..
File saved as Positive_Employers_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2015_Positive_Employers_EN.csv..
File saved as 2015_Positive_Employers_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2016_Positive_Employer_EN.csv..
File saved as 2016_Positive_Employer_EN.csv
Downloading http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/2017Q1Q2_Positive_EN.csv..
File saved as 2017Q1Q2_Positive_EN.csv
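A side note on the `i.split('/')[-1]` trick above: it works for these URLs, but a slightly more robust way to derive a local filename is to take the last segment of the parsed URL path, which also ignores any query string. A minimal sketch (the helper name `filename_from_url` is just illustrative):

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return PurePosixPath(urlparse(url).path).name

url = 'http://www.edsc-esdc.gc.ca/ouvert-open/bca-seb/ae-ei/Positive_Employers_EN.csv'
print(filename_from_url(url))  # Positive_Employers_EN.csv
```

Either approach produces the same name for the URLs in the question; the `pathlib` version just fails less surprisingly on URLs like `...EN.csv?download=1`.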
Try this:
import requests

en_urls = []
for link in soup.find_all('a'):
    if 'EN.csv' in link.get('href', ''):
        en_urls.append(link.get('href'))

for link in en_urls:
    r = requests.get(link, stream=True)
    if r.ok:
        # Only create the file once we know the request succeeded,
        # so a failed download doesn't leave an empty file behind.
        with open(link.split('/')[-1], 'wb') as file:
            for block in r.iter_content(2 * 1024 ** 2):
                file.write(block)
    else:
        print(f'Download failed on {link} with {r}')