Ruby Mechanize form input field text

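SOLVED - The line "abc = list.scan(/\[([^\)]+)\]/).last.first" was correct, but the result also included quotation marks, which the website's search form does not accept. Corrected to abc = list.scan(/\"([^)]+)\"/).join.

Thanks everyone for the help.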
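
For anyone hitting the same thing, here is a minimal sketch of the difference between the two patterns. It assumes a raw entry looks like ["ruby mechanize"]; the real CSV contents are not shown in this post, so SAMPLE_ROW and both helper names are hypothetical.

```ruby
# Hypothetical raw entry; the real CSV contents are not shown in this post.
SAMPLE_ROW = '["ruby mechanize"]'

# Original extraction: captures everything between the brackets,
# which includes the surrounding double quotes.
def keyword_with_quotes(row)
  row.scan(/\[([^\)]+)\]/).last.first
end

# Corrected extraction: captures only the text between the double quotes.
def keyword_without_quotes(row)
  row.scan(/\"([^)]+)\"/).join
end

p keyword_with_quotes(SAMPLE_ROW)     # => "\"ruby mechanize\""
p keyword_without_quotes(SAMPLE_ROW)  # => "ruby mechanize"
```

Using p (or abc.inspect) instead of puts makes the stray quote characters easier to spot, because puts prints them as if they were ordinary string delimiters.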


I have to automate searches using a list of 100 keywords from a csv file.

With Mechanize, I can submit a search using this example (http://mechanize.rubyforge.org/GUIDE_rdoc.html):

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)
pp page

However, when I have it loop through the csv file, it returns an error (in this example, the first csv entry would be 'ruby mechanize'):

# I have already imported the csv list; now it is looping through the array "raw_list"

raw_list.each do |list|
  abc = list.scan(/\[([^\)]+)\]/).last.first

  # I tested a "puts abc", which returned "ruby mechanize", so I don't understand why the rest of this doesn't work

  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = abc

  # Even though abc = "ruby mechanize", an error occurs.

  page = agent.submit(google_form)
  pp page
end

It does not seem to work with the variable "abc", but if you type 'ruby mechanize' in manually it works, even though the two are the same.

The error that occurs is:

C:filename: in `block (2 levels) in <top (required)>': undefined method `text' for nil:NilClass (NoMethodError)
from C:/RailsInstaller/Ruby2.0.0/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:442:in `get'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:23:in `block in <top (required)>'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `each'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'

Any help would be appreciated.

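Your error is telling you that something in the block that starts on line 19 of your code (the call on line 23) is causing a problem at line 442 of mechanize.rb.

I tried your sample in IRB and it seems to work fine: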

2.2.2 :001 > require 'mechanize'
 => true 
2.2.2 :002 > agent = Mechanize.new
 => #<Mechanize:...
2.2.2 :003 > page = agent.get('http://google.com/')
 => #<Mechanize::Page
  ...
2.2.2 :004 > google_form = page.form('f')
 => #<Mechanize::Form
 ...
2.2.2 :005 > google_form.q
 => "" 
2.2.2 :006 > abc = "ruby mechanize"
 => "ruby mechanize" 
2.2.2 :007 > google_form.q = abc
 => "ruby mechanize" 
2.2.2 :008 > page = agent.submit(google_form)
 => #<Mechanize::Page
 ...

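If nothing matches, scan returns an empty array, so .last is nil and calling .first on it fails. Your error is therefore likely occurring here: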

abc = list.scan(/\[([^\)]+)\]/).last.first

http://ruby-doc.org/stdlib-2.2.0/libdoc/strscan/rdoc/StringScanner.html

You could replace it with:

abc = list.scan(/\[([^\)]+)\]/).join

You will always get a string, although it may just be "".
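
To make that concrete, here is a small sketch of how String#scan and Array#join behave when the pattern does not match. The sample entries and the extract_keyword helper are hypothetical, made up for illustration; only the regex comes from your code.

```ruby
PATTERN = /\[([^\)]+)\]/

# Hypothetical helper: always returns a String, possibly empty.
def extract_keyword(row)
  row.scan(PATTERN).join
end

no_match = 'ruby mechanize' # hypothetical entry without brackets

p no_match.scan(PATTERN)      # => []
p no_match.scan(PATTERN).last # => nil

begin
  no_match.scan(PATTERN).last.first
rescue NoMethodError => e
  puts "last.first failed: #{e.class}"
end

p extract_keyword(no_match)   # => ""

# Skip entries that yield no keyword before submitting any form.
rows = ['[ruby mechanize]', 'no brackets here']
keywords = rows.map { |row| extract_keyword(row) }.reject(&:empty?)
p keywords                    # => ["ruby mechanize"]
```

Since an empty search term is probably not worth submitting, filtering with reject(&:empty?) (or a next if abc.empty? inside the loop) keeps blank entries away from the form.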

http://ruby-doc.org/core-2.2.0/Array.html#method-i-join