Using mechanize with regular expression to select specific text in links to follow
I have a page whose links look like this:
#<Mechanize::Page::Link
"TCO11_IIIE"
"/me/secure/ViewSample.do?id=211112">
#<Mechanize::Page::Link
"TCO15_IIIE"
"/me/secure/do?id=211113">
#<Mechanize::Page::Link
"TCO16_IIC"
"/me/secure/ViewSample.do?id=211114">
#<Mechanize::Page::Link
"TCO17_IIC"
"/me/secure/ViewSample.do?id=211116">
#<Mechanize::Page::Link
"TCO17_IIIE"
"/me/secure/ViewSample.do?id=211115">
#<Mechanize::Page::Link
"TCO19_IID"
"/me/secure/ViewSample.do?id=211117">
#<Mechanize::Page::Link
"TCO21_IIC"
"/me/secure/ViewSample.do?id=211118">
#<Mechanize::Page::Link
"TCO21_IIIE"
"/me/secure/do?id=211119">
#<Mechanize::Page::Link
"TCO23_IIC"
"/me/secure/do?id=211120">
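I'm writing a script that tries to follow the links containing 'ViewSample' (and then download specific links ending in fq, but that part isn't relevant to this question).
I'm a bit confused about how to do this, because I thought the methods .search and .links_with needed an exact string for the whole link text (? or is it the href???). So I think I need to use a regular expression in the first line of the code below: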
master_page.search("ViewSample").each do |download_list_link|  # this is the line I think needs a regex
  download_list_page = agent.get(download_list_link[:href])
  download_list_page.search("td > a").each do |link|
    if link.content.include?("fq.gz")
      # write in binary mode; puts would append a newline to the gzip data
      File.open("downloaded_file", "wb") do |out_file|
        out_file.write(agent.get_file(link[:href]))
      end
    end
  end
end
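Whether an exact string or a pattern is needed comes down to how each criterion is compared. Below is a minimal plain-Ruby sketch, assuming the comparison uses Ruby's case-equality operator === (Mechanize's links_with appears to work this way, but treat that as an assumption): a String criterion only matches an identical value, while a Regexp matches any value that contains the pattern. The two hrefs are copied from the listing above. Note that search is different again: it treats its argument as a CSS or XPath selector, so search("ViewSample") looks for elements named ViewSample rather than hrefs containing that text.

```ruby
# Two hrefs copied from the link listing above.
hrefs = [
  "/me/secure/ViewSample.do?id=211112",
  "/me/secure/do?id=211113"
]

# A String criterion only matches an identical string under ===.
string_hits = hrefs.select { |href| "ViewSample" === href }

# A Regexp criterion matches any string that contains the pattern.
regexp_hits = hrefs.select { |href| /ViewSample/ === href }

puts string_hits.inspect  # []
puts regexp_hits.inspect  # ["/me/secure/ViewSample.do?id=211112"]
```

So passing a Regexp where a criterion is accepted (for example links_with(href: /ViewSample/)) gives substring-style matching without having to know the full href.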
That is what select is for:
page.links.select{|link| link.href.to_s[/ViewSample/]}
or
page.search('a').select{|a| a[:href].to_s[/ViewSample/]}
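Applied to the links listed in the question, the select approach keeps only the ViewSample links. The sketch below is plain Ruby so it runs without the mechanize gem or a network connection: the Link Struct is a hypothetical stand-in for Mechanize::Page::Link that carries only text and href, and the extra anchor with a nil href shows why .to_s is used (calling [] on nil would raise NoMethodError).

```ruby
# Hypothetical stand-in for Mechanize::Page::Link: only text and href.
Link = Struct.new(:text, :href)

# The links from the question, plus one anchor without an href attribute.
links = [
  Link.new("TCO11_IIIE", "/me/secure/ViewSample.do?id=211112"),
  Link.new("TCO15_IIIE", "/me/secure/do?id=211113"),
  Link.new("TCO16_IIC",  "/me/secure/ViewSample.do?id=211114"),
  Link.new("TCO17_IIC",  "/me/secure/ViewSample.do?id=211116"),
  Link.new("TCO17_IIIE", "/me/secure/ViewSample.do?id=211115"),
  Link.new("TCO19_IID",  "/me/secure/ViewSample.do?id=211117"),
  Link.new("TCO21_IIC",  "/me/secure/ViewSample.do?id=211118"),
  Link.new("TCO21_IIIE", "/me/secure/do?id=211119"),
  Link.new("TCO23_IIC",  "/me/secure/do?id=211120"),
  Link.new("top",        nil) # e.g. <a name="top">, which has no href
]

# Keep links whose href contains "ViewSample"; to_s turns a nil href into "".
view_sample_links = links.select { |link| link.href.to_s[/ViewSample/] }

view_sample_links.each { |link| puts "#{link.text} -> #{link.href}" }
```

With the real gem, page.links_with(href: /ViewSample/) or a CSS attribute selector such as page.search('a[href*="ViewSample"]') should produce the same selection, assuming the page's anchors are as listed.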