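I want to make each url in a list a string
I have 35000 urls and I can't add the quotes around each url and the comma at the end of each url one by one. Is there any shortcut that would let me select all the urls in the code block inside the list and turn them into strings (or back into plain text), the same way we can comment and uncomment lines?
I want the 35000 urls to look like this: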
start_urls=[
'https://dawaai.pk/all-medicines/a',
'https://dawaai.pk/all-medicines/b',
'https://dawaai.pk/all-medicines/c',
'https://dawaai.pk/all-medicines/d',
'https://dawaai.pk/all-medicines/e',
'https://dawaai.pk/all-medicines/f',
'https://dawaai.pk/all-medicines/g',
'https://dawaai.pk/all-medicines/h',
'https://dawaai.pk/all-medicines/i',
'https://dawaai.pk/all-medicines/j',
'https://dawaai.pk/all-medicines/k',
'https://dawaai.pk/all-medicines/l',
'https://dawaai.pk/all-medicines/m',
'https://dawaai.pk/all-medicines/n',
'https://dawaai.pk/all-medicines/o',
'https://dawaai.pk/all-medicines/p',
'https://dawaai.pk/all-medicines/q',
'https://dawaai.pk/all-medicines/r',
'https://dawaai.pk/all-medicines/s',
'https://dawaai.pk/all-medicines/t',
'https://dawaai.pk/all-medicines/u',
'https://dawaai.pk/all-medicines/v',
'https://dawaai.pk/all-medicines/w',
'https://dawaai.pk/all-medicines/x',
'https://dawaai.pk/all-medicines/y',
'https://dawaai.pk/all-medicines/z',
# 'https://dawaai.pk/all-medicines/',
]
This is what the scraper's code currently looks like:
import scrapy


class DawaaiSpider(scrapy.Spider):
    name = 'dawaai'
    start_urls = [
        https://dawaai.pk/medicine/vitrum-1-38514.html
        https://dawaai.pk/medicine/ventek-38552.html
        https://dawaai.pk/medicine/valid-1-41158.html
        https://dawaai.pk/medicine/verger-2-38699.html
        https://dawaai.pk/medicine/valvin-1-38910.html
        https://dawaai.pk/medicine/verger-5-38953.html
        https://dawaai.pk/medicine/vexnil-8-39028.html
        https://dawaai.pk/medicine/virocil-41083.html
        https://dawaai.pk/medicine/voltral-emulgel-2-39942.html
        https://dawaai.pk/medicine/vasocord-40099.html
        https://dawaai.pk/medicine/vasocord-1-40100.html
        https://dawaai.pk/medicine/Zestril-Tablet10-55.html
        https://dawaai.pk/medicine/Zestril-Tablet20-56.html
        https://dawaai.pk/medicine/zultra-1-12104.html
        https://dawaai.pk/medicine/Zofrantab-Tablet8-128.html
        https://dawaai.pk/medicine/Zeegapcap-Capsule50-176.html
        https://dawaai.pk/medicine/Zeegapcap-Capsule75-177.html
        https://dawaai.pk/medicine/Zeegapcap-Capsule150-178.html
        https://dawaai.pk/medicine/zopent-40mg-590.html
        https://dawaai.pk/medicine/zopent-40mg-591.html
        https://dawaai.pk/medicine/zoloft-50mg-592.html
        https://dawaai.pk/medicine/zocor-10mg-593.html
        https://dawaai.pk/medicine/zocor-10mg-594.html
    ]

    def parse(self, response):
        for medicine in response.css('div.card-body'):
            yield {
                'name': medicine.css('a::text').get(),
                'price_now': medicine.css('h4::text').get().replace('Rs ', ''),
            }
The problem is that start_urls will only be scraped once every url is a string and the urls are separated by commas.
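There are 3 ways to do this:
- Use a regular expression in VSCode (I am not a regex pro).
- Put the urls in a text file and have a Python script iterate over it, adding " to the start and end of each line and , to the end of each line.
- Use multi-select:
Go to the top-left corner of the urls, hold Alt+Shift and drag to the bottom-right corner of the urls. This gives you multiple cursors, so you can edit all lines at once.
Then press the left arrow key to move all cursors to the start of their lines, and type '
Now select all the urls again, this time press the right arrow key to move all cursors to the end of each line, and type ',
Done!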
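For the first two options, here is a minimal sketch of what the script could look like. It assumes the raw urls sit one per line with no spaces inside them. The function names and the file names `urls.txt` and `quoted_urls.txt` are made up for illustration and are not from the question. `quote_urls` is the line-by-line version, and `quote_urls_regex` does the same thing as a single regex substitution.

```python
import re
from pathlib import Path


def quote_urls(lines):
    """Wrap each non-empty line in single quotes and append a comma."""
    quoted = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        quoted.append(f"'{url}',")
    return quoted


def quote_urls_regex(text):
    """Apply the same transformation with one regex substitution."""
    # With re.MULTILINE, ^ and $ match at the start and end of every line.
    # Surrounding spaces/tabs are dropped; blank lines are left untouched.
    return re.sub(r"^[ \t]*(\S+)[ \t]*$", r"'\1',", text, flags=re.MULTILINE)


def convert_file(src, dst):
    """Read raw urls from src and write the quoted lines to dst.

    Both paths are hypothetical; pass whatever files you actually use.
    """
    lines = Path(src).read_text(encoding="utf-8").splitlines()
    Path(dst).write_text("\n".join(quote_urls(lines)) + "\n", encoding="utf-8")
```

Calling `convert_file("urls.txt", "quoted_urls.txt")` should produce a file whose contents can be pasted between the brackets of start_urls. If you would rather stay inside VSCode, the same idea should work in Find and Replace with regex mode turned on: search for `^(.+)$` and replace with `'$1',` after selecting only the url lines.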