Ruby Mechanize form input field text
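SOLVED - The line "abc = list.scan(/\[([^\)]+)\]/).last.first" was correct, but the captured value also included quotation marks, which the website's search form does not accept. Corrected to abc = list.scan(/\"([^)]+)\"/).join.
Thanks everyone for the help.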
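The quote problem can be reproduced without Mechanize or a network connection. The following is a minimal sketch, assuming a CSV row arrives as the string ["ruby mechanize"]; that row is hypothetical, since the actual file contents are not shown in the question.

```ruby
# Hypothetical row; the actual CSV contents are not shown in the question.
list = '["ruby mechanize"]'

# Original pattern: the capture keeps the surrounding double quotes.
with_quotes = list.scan(/\[([^\)]+)\]/).last.first

# Corrected pattern: the capture is only the text between the quotes.
without_quotes = list.scan(/\"([^)]+)\"/).join

p with_quotes
p without_quotes
```

Printing with p instead of puts shows the embedded quotes as escaped characters, which makes this kind of mismatch easier to spot.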
I have to automate searches using a list of 100 keywords from a csv file.
With Mechanize, I can submit a search using this example (http://mechanize.rubyforge.org/GUIDE_rdoc.html):
agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)
pp page
However, when I have it loop through the csv file, it returns an error (in this example, the first csv entry would be 'ruby mechanize'):
# I have already imported the csv list; now it is looping through the array "raw_list"
raw_list.each do |list|
  abc = list.scan(/\[([^\)]+)\]/).last.first
  # I tested a "puts abc" which returned "ruby mechanize", so I don't understand why the rest of this doesn't work
  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = abc
  # even though abc = "ruby mechanize", an error occurs.
  page = agent.submit(google_form)
  pp page
end
It does not seem to work with the variable "abc", but if you type 'ruby mechanize' in by hand it works, even though the two are identical.
The error that occurs is:
C:filename: in `block (2 levels) in <top (required)>': undefined method `text' for nil:NilClass (NoMethodError)
from C:/RailsInstaller/Ruby2.0.0/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:442:in `get'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:23:in `block in <top (required)>'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `each'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'
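Any help would be appreciated.
Your error is telling you that something on line 19 of your code causes a problem on line 442 of mechanize.
I tried your sample in IRB and it seems to work fine: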
2.2.2 :001 > require 'mechanize'
=> true
2.2.2 :002 > agent = Mechanize.new
=> #<Mechanize:...
2.2.2 :003 > page = agent.get('http://google.com/')
=> #<Mechanize::Page
...
2.2.2 :004 > google_form = page.form('f')
=> #<Mechanize::Form
...
2.2.2 :005 > google_form.q
=> ""
2.2.2 :006 > abc = "ruby mechanize"
=> "ruby mechanize"
2.2.2 :007 > google_form.q = abc
=> "ruby mechanize"
2.2.2 :008 > page = agent.submit(google_form)
=> #<Mechanize::Page
...
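If nothing is found, scan returns an empty array, so .last returns nil and the call to .first on it fails; your error is happening here: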
abc = list.scan(/\[([^\)]+)\]/).last.first
http://ruby-doc.org/stdlib-2.2.0/libdoc/strscan/rdoc/StringScanner.html
You can replace it with:
abc = list.scan(/\[([^\)]+)\]/).join
You will always get a string, although it may just be "".
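The difference between the two chains on a row that does not match can be checked in isolation. This is a minimal sketch using only String#scan and Array#join; both sample rows are hypothetical and are not taken from the asker's file.

```ruby
pattern = /\[([^\)]+)\]/

# Hypothetical rows: one that matches the pattern and one that does not.
matched   = '[ruby mechanize]'.scan(pattern)  # one sub-array of captures per match
unmatched = 'no brackets here'.scan(pattern)  # empty array when nothing matches

safe_hit  = matched.join    # nested captures flattened into one string
safe_miss = unmatched.join  # empty string instead of an exception

# The .last.first chain breaks on a miss because [].last is nil.
chain_error =
  begin
    unmatched.last.first
    nil
  rescue NoMethodError => e
    e.class
  end

p safe_hit
p safe_miss
p chain_error
```

With join, an empty string still needs to be handled before submitting the form, for example by skipping that row.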