How to change multiple options on a webpage using mechanize and bs4

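I need to scrape all the options available here. Using mechanize, I have selected the first two controls (report type and language). After that there are three drop-down lists: the second depends on the first, and the third depends on the second. How can I handle this? My starting code for the first two fields is as follows: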

import mechanize
from bs4 import BeautifulSoup   
br = mechanize.Browser()

url="http://ceojk.nic.in/ElectionPDF/Main.aspx"
response = br.open(url)
br.select_form(name="Form1")
control_1 = br.form.find_control("RadioButtonList1")
control_2 = br.form.find_control("RadioButtonList2")
submit = br.form.find_control("Button1")

br[control_1.name]=["PS Wise Report"]
br[control_2.name]=["English"]
response = br.submit()
soup = BeautifulSoup(response, 'lxml')
for item in soup.find_all('option'):
    print(item['value'])

OK, this was really exciting to debug (you can't imagine how many things I tried and learned while solving it).

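Here is working code that simulates the browser's behavior, selecting the first district, AC, and PS step by step. It only passes the value ["1"] each time; you may want to improve it, for example by reading the options and building an option name -> value mapping: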

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()

url = "http://ceojk.nic.in/ElectionPDF/Main.aspx"
response = br.open(url)

br.select_form(name="Form1")
br["RadioButtonList1"] = ["PS Wise Report"]
br["RadioButtonList2"] = ["English"]
br.submit()

# getting ACs
br.select_form(name="Form1")
br["DistlistP"] = ["1"]
br.submit(name="BtnPs")

# getting PSes
br.select_form(name="Form1")
br["AclistP"] = ["1"]
br.submit(name="BtnPs")

# getting report
br.select_form(name="Form1")
br["PslistP"] = ["1"]
response = br.submit(name="BtnPs")

soup = BeautifulSoup(response, "lxml")  # name the parser explicitly, as in the question
print(soup.find(id="Pnlfile"))

Finally, it prints the HTML of the "File" block that appears on the right side of the page in the browser.
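
The option name -> value mapping suggested above could be built along these lines. This is only a minimal sketch: the helper name `option_map` and the sample markup are made up for illustration, and the real page may structure its `<select>` elements differently.

```python
from bs4 import BeautifulSoup


def option_map(html, select_name):
    """Return {visible label: value} for the <select> with the given name.

    Hypothetical helper, not part of mechanize or bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "name" is also the tag-name argument of find(), so pass it via attrs
    select = soup.find("select", attrs={"name": select_name})
    if select is None:
        return {}
    mapping = {}
    for option in select.find_all("option"):
        value = option.get("value")
        if value is None:
            continue  # skip options that carry no value attribute
        mapping[option.get_text(strip=True)] = value
    return mapping


# Invented markup, only to show the shape of the result
SAMPLE = """
<form name="Form1">
  <select name="DistlistP">
    <option value="0">--Select--</option>
    <option value="1">District A</option>
    <option value="2">District B</option>
  </select>
</form>
"""

districts = option_map(SAMPLE, "DistlistP")
print(districts)
```

In the flow above, the same helper could presumably be fed the current page, e.g. `option_map(br.response().read(), "DistlistP")` after the first submit, and the resulting values looped over instead of hard-coding ["1"]. Since each list depends on the previous selection, the mapping for "AclistP" and "PslistP" would have to be rebuilt after every submit.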