Crawling a web page after executing a JavaScript "click" event using Selenium and BeautifulSoup
I want to scrape a web page after its JavaScript "click" event has run.
The page source looks like this:
function initPage() {
    initCorpInfo();
    var Tree = Ext.tree;
    var treeRoot = new Tree.TreeNode({
        text: "total",
        id: "root",
        href: "javascript: viewDoc('20150515001896', '4671059', null, null, null, 'dart3.xsd')"
    });
    treeNode2 = new Tree.TreeNode({
        text: "4. financial statement",
        id: "17",
        cls: "text",
        listeners: {
            click: function() { viewDoc('20150515001896', '4671059', '17', '1015699', '132786', 'dart3.xsd'); }
        }
    });
}
function viewDoc(rcpNo, dcmNo, eleId, offset, length, dtd) {
    currentDocValues.rcpNo = rcpNo;
    currentDocValues.dcmNo = dcmNo;
    currentDocValues.eleId = eleId;
    currentDocValues.offset = offset;
    currentDocValues.length = length;
    currentDocValues.dtd = dtd;
    var params = "";
    params += "?rcpNo=" + rcpNo;
    params += "&dcmNo=" + dcmNo;
    if (eleId != null)
        params += "&eleId=" + eleId;
    if (offset != null)
        params += "&offset=" + offset;
    if (length != null)
        params += "&length=" + length;
    params += "&dtd=" + dtd;
    document.getElementById("ifrm").src = "/report/viewer.do" + params;
}
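Note that viewDoc() does nothing beyond assembling a query string and pointing the iframe at /report/viewer.do, so the same URL can in principle be rebuilt outside the browser. A sketch in Python mirroring the parameter logic above (whether the endpoint serves the document to a plain HTTP client without a browser session is an assumption, not verified):

```python
from urllib.parse import urlencode

def view_doc_url(rcp_no, dcm_no, ele_id=None, offset=None, length=None, dtd=None):
    """Rebuild the /report/viewer.do URL the same way viewDoc() does."""
    params = {"rcpNo": rcp_no, "dcmNo": dcm_no}
    # viewDoc() appends eleId/offset/length only when they are non-null.
    if ele_id is not None:
        params["eleId"] = ele_id
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    params["dtd"] = dtd  # dtd is always appended in the JS
    return "http://dart.fss.or.kr/report/viewer.do?" + urlencode(params)
```

With the values from the click handler above, this yields the same URL the iframe would be pointed at.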
You can view the source at http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20150515001896
(click "4. 재무제표" in the left sidebar).
Can I execute "click: function() {viewDoc('20150515001896', '4671059', '17', '1015699', '132786'," using Selenium and BeautifulSoup?
Or should I use Scrapy instead of BeautifulSoup to handle the JavaScript?
It turned out to be this simple:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Firefox()
browser.get("http://dart.fss.or.kr/dsaf001/main.do?rcpNo=20150515001896")
# Run the same call the tree node's click handler would make.
browser.execute_script("viewDoc('20150515001896', '4671059', '17', '1015699', '132786', 'dart3.xsd');")
time.sleep(2)  # give the document time to load
# The report is loaded into the <iframe id="ifrm">, so switch into it before parsing.
browser.switch_to.frame("ifrm")
soup = BeautifulSoup(browser.page_source, "html.parser")
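Since viewDoc() only swaps the src of the <iframe id="ifrm">, the outer page_source never contains the report body; an alternative to switching frames is to read the iframe's src from the outer page and fetch that URL yourself. A minimal standard-library sketch of pulling the attribute (the sample markup fed in is a hypothetical stand-in for the real page):

```python
from html.parser import HTMLParser

class IframeSrcFinder(HTMLParser):
    """Collect the src attribute of the <iframe id="ifrm"> element."""
    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "iframe" and a.get("id") == "ifrm":
            self.src = a.get("src")

finder = IframeSrcFinder()
# Hypothetical fragment; on the real page you would feed browser.page_source.
finder.feed('<iframe id="ifrm" src="/report/viewer.do?rcpNo=20150515001896&amp;dcmNo=4671059"></iframe>')
```

HTMLParser replaces entity references in attribute values, so finder.src comes back as a plain relative URL ready to be joined with the site root.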