Python Craigslist scraping shows empty list
Hello, I am using the following code to scrape Craigslist.
import pandas as pd
import requests
from bs4 import BeautifulSoup as bs4
# IPython/Jupyter magic; drop this line when running as a plain script
%pylab inline
url_base = 'http://houston.craigslist.org/search/apa'
params = dict(bedrooms=2)
rsp = requests.get(url_base, params=params)
print(rsp.text[:500])
# Parse the response and preview the start of the document
html = bs4(rsp.text, 'html.parser')
print(html.prettify()[:1000])
Everything above works, and the output is:
<!DOCTYPE html>
<html class="no-js">
<head>
<title>
houston apartments / housing rentals - craigslist
</title>
<meta content="houston apartments / housing rentals - craigslist"
name="description">
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<link href="https://houston.craigslist.org/search/apa" rel="canonical">
<link href="https://houston.craigslist.org/search/apa?
format=rss&min_bedrooms=2" rel="alternate" title="RSS feed for
craigslist | houston apartments / housing rentals - craigslist "
type="application/rss+xml">
<link href="https://houston.craigslist.org/search/apa?
s=120&min_bedrooms=2" rel="next">
<meta content="width=device-width,initial-scale=1" name="viewport">
<link href="//www.craigslist.org/styles/cl.css?
v=a14d0c65f7978c2bbc0d780a3ea7b7be" media="all" rel="stylesheet"
type="text/css">
<link href="//www.craigslist.org/styles/search.css?v=27e1d4246df60da5ffd1146d59a8107e" media="all" rel="stylesheet" type="
It clearly shows that the list is not empty and that there are items I can use. I then run the following code:
apts = html.find_all('p', attrs={'class': 'row'})
print(len(apts))
The output of print(len(apts)) above is 0.
Can anyone help me correct this code? I believe the Craigslist HTML markup has changed, but I don't know how to handle that here.
Thanks.
There is no <p> tag with the class 'row'; the <p> tags have the class 'result-info' instead.
import requests
from bs4 import BeautifulSoup as bs4
url_base = 'http://houston.craigslist.org/search/apa'
params = dict(bedrooms=2)
rsp = requests.get(url_base, params=params)
print(rsp.text[:500])
html = bs4(rsp.text, 'html.parser')
print(html.prettify()[:1000])
# Listings are <p class="result-info"> elements, not <p class="row">
apts = html.find_all('p', attrs={'class': 'result-info'})
print(len(apts))
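When a class-based lookup returns an empty list, it can help to tally which class names actually appear on each tag before guessing a new selector. The following is only a minimal sketch of that idea: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs, and the names ClassCounter and sample_html, as well as the sample markup itself, are hypothetical and not taken from the real Craigslist page.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count (tag, class name) pairs seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a class attribute
        # may hold several space-separated class names.
        for name, value in attrs:
            if name == "class" and value:
                for cls in value.split():
                    self.counts[(tag, cls)] += 1


# Hypothetical sample markup standing in for the downloaded page.
sample_html = """
<ul class="rows">
  <li class="result-row"><p class="result-info"><a class="result-title" href="#">2br apartment</a></p></li>
  <li class="result-row"><p class="result-info"><a class="result-title" href="#">2br townhouse</a></p></li>
</ul>
"""

counter = ClassCounter()
counter.feed(sample_html)

# Keep only the classes found on <p> tags.
p_classes = {cls: n for (tag, cls), n in counter.counts.items() if tag == "p"}
print(p_classes)
```

To apply this to the real page, pass rsp.text to counter.feed() instead of sample_html. A class that is missing from the tally, such as 'row' in this sample, explains why find_all returns an empty list for it.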