How to wait for page load to complete?
I want to get the available boot sizes (the elements under $('option.addedOption')).
I tried the code below, but it always returns before the sizes have been loaded.
# config.url = 'http://www.neimanmarcus.com/Stuart-Weitzman-Reserve-Suede-Over-the-Knee-Boot-Black/prod179890262/p.prod'
import urllib2
import requests
import config
import time
from lxml.cssselect import CSSSelector
from lxml.html import fromstring
print config.url
headers = {
    "Host": "www.neimanmarcus.com",
    "Connection": "keep-alive",
    "Content-Length": 106,
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "Accept": "*/*",
    "Origin": "http://www.neimanmarcus.com",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Referer": "http://www.neimanmarcus.com/Stuart-Weitzman-Reserve-Suede-Over-the-Knee-Boot-Black/prod179890262/p.prod",
    "Accept-Language": "en-US,en;q=0.8,zh-CN;q=0.6,zh;q=0.4,fr;q=0.2,cs;q=0.2,zh-TW;q=0.2"
}
request = urllib2.Request(config.url, headers=headers)
html = urllib2.urlopen(request)
# wait, hoping the sizes will have loaded by then
time.sleep(10)
html = html.read()
print html
# parse the response and look for the size options
html = fromstring(html)
sel = CSSSelector('option.addedOption')
try:
    options = sel(html)
    print options
except Exception as e:
    print e
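I found that the sizes are fetched by a request to 'http://www.neimanmarcus.com/product.service' (in fact, the headers above were copied from that request's headers).
How can I get the full page information (especially the boot sizes)?
I also tried requesting http://www.neimanmarcus.com/product.service directly, but that failed too.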
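Use it like this (in Python 2 the object returned by urllib2.urlopen is not a context manager, so it is wrapped in contextlib.closing):
from contextlib import closing

with closing(urllib2.urlopen(request)) as response:
    html = response.read()
print html
html = fromstring(html)
sel = CSSSelector('option.addedOption')
try:
    options = sel(html)
    print options
except Exception as e:
    print e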
instead of:
html = urllib2.urlopen(request)
time.sleep(10)
html = html.read()
...
Am I understanding this correctly: no matter how long the code sleeps, it still does not load the shoe sizes?
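A small sketch of why the sleep cannot help, written for Python 3 with only the standard library. `STATIC_HTML` is a made-up stand-in for the server response (not the real page), and `AddedOptionCounter` and `count_added_options` are hypothetical helpers. The `option.addedOption` elements exist only inside a script that a plain HTTP client never runs, so the parser finds none, however long you wait before parsing.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw response body: the <select> is empty and
# the options would only be added by JavaScript running in a browser.
STATIC_HTML = """
<html><body>
<select id="size"></select>
<script>
  // a browser would run this; urllib2/requests never do
  var o = document.createElement('option');
  o.className = 'addedOption';
  document.getElementById('size').appendChild(o);
</script>
</body></html>
"""


class AddedOptionCounter(HTMLParser):
    """Counts <option> tags whose class list contains 'addedOption'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "option" and "addedOption" in classes:
            self.count += 1


def count_added_options(html_text):
    """Return how many option.addedOption elements the markup contains."""
    parser = AddedOptionCounter()
    parser.feed(html_text)
    return parser.count


print(count_added_options(STATIC_HTML))
```

The count only becomes non-zero once the markup itself contains the options, which is what a browser produces after running the script.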
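Since you are not using a headless browser, the JavaScript on the requested page is never executed. Try using a headless browser such as PhantomJS. Here is a list of more headless browsers.
Here is one way to use PhantomJS in Python.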
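What a headless browser adds is the ability to run the page's scripts and then wait until the element actually appears. The sketch below shows only that waiting logic in plain Python 3. `wait_until`, `find_options` and `FakePage` are hypothetical names, the sizes it returns are invented, and `find_options` stands in for whatever lookup your browser driver provides for `option.addedOption`.

```python
import time


def wait_until(find_options, timeout=10.0, poll=0.5):
    """Call find_options() repeatedly until it returns a non-empty result.

    Raises RuntimeError if nothing shows up within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = find_options()
        if found:
            return found
        if time.monotonic() >= deadline:
            raise RuntimeError("options did not appear within %.1fs" % timeout)
        time.sleep(poll)


class FakePage(object):
    """Stand-in for a rendered page: the sizes 'load' on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_options(self):
        self.lookups += 1
        # Invented sizes, returned only from the third lookup onwards.
        return ["6", "7", "8"] if self.lookups >= 3 else []


page = FakePage()
sizes = wait_until(page.find_options, timeout=2.0, poll=0.01)
print(sizes)
```

With a real driver you would pass a function that queries the live DOM instead of `FakePage.find_options`. Unlike a fixed `time.sleep(10)`, polling returns as soon as the options exist and fails loudly if they never do.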