Unable to scrape different owner names from different box-like containers out of a map
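I'm trying to click on a map using selenium so that I can scrape the parcel id and owner name from the box-like containers. A box-like container shows up when the map is clicked, and I want to scrape the parcel id and owner name from each such container. I tried with requests but couldn't find any way to locate the information shown in these containers, so I'm now trying selenium. The script below neither clicks on the map nor throws any error.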
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "http://app01.cityofboston.gov/parcelviewer/"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 20)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "svg#mapDiv_gc"))):
    item.click()
driver.quit()
How can I get the parcel id and owner name from the different box-like containers on that map?
Try using the .move_to_element_with_offset(to_element, xoffset, yoffset) method of the ActionChains class to click the element at a specific x y position. This clicks at the x y positions specified in the lists.
The starting x is taken just past the width of the left nav:
left_nav = driver.find_element(By.ID, 'searchBox')
xstart = left_nav.size['width']
The starting y is taken just past the height of the top nav:
top_nav = driver.find_element(By.ID, 'headerFrame')
ystart = top_nav.size['height']
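As a rough sketch of how the two starting values combine with per-click increments, the helper below builds the list of (x, y) offsets up front, covering a small grid rather than a single row. `click_offsets` and its arguments are hypothetical names, and the nav sizes in the example are made-up numbers, not values read from the page.

```python
from itertools import product


def click_offsets(xstart, ystart, x_steps, y_steps):
    """Return every (x, y) offset on a grid, shifted past the navigation areas."""
    return [(xstart + dx, ystart + dy) for dy, dx in product(y_steps, x_steps)]


# Example with made-up nav sizes: a 300 px wide search box and an 80 px tall header.
offsets = click_offsets(300, 80, [100, 200, 300], [200, 300])
print(offsets)
```

Each tuple could then be passed to move_to_element_with_offset in place of the zipped lists.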
The following code clicks along a constant y position:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# add the following import
from selenium.webdriver import ActionChains

link = "http://app01.cityofboston.gov/parcelviewer/"

driver = webdriver.Chrome()
driver.get(link)
driver.maximize_window()
map_element = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'svg#mapDiv_gc')))

# find_element(By.ID, ...) is used because newer Selenium releases removed find_element_by_id
left_nav = driver.find_element(By.ID, 'searchBox')
xstart = left_nav.size['width']
top_nav = driver.find_element(By.ID, 'headerFrame')
ystart = top_nav.size['height']

# random x y here
xlist_increment = [100, 200, 300, 400, 500, 600, 700, 800, 900]
ylist_increment = [300, 300, 300, 300, 300, 300, 300, 300, 300]

wait = WebDriverWait(driver, 1)
for x, y in zip(xlist_increment, ylist_increment):
    xoffset = xstart + x
    yoffset = ystart + y
    # build a fresh chain for every click so earlier actions are not replayed
    action = ActionChains(driver)
    action.move_to_element_with_offset(map_element, xoffset, yoffset)
    action.click()
    action.perform()
    try:
        parcel_id = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='esriPopupWrapper']//b[contains(text(), 'Parcel ID')]//parent::div")))
        owner_name = wait.until(EC.presence_of_element_located((By.XPATH, "//div[@class='esriPopupWrapper']//b[contains(text(), 'Owner')]//parent::div")))
        print(parcel_id.text)
        print(owner_name.text)
        # close the popup before the next click
        driver.find_element(By.CSS_SELECTOR, 'div.close').click()
    except Exception:
        print("popup doesn't appear")
driver.quit()
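Because the x y positions are clicked at random, there is no guarantee that every click brings up the popup with the parcel id and owner name you mentioned, but I did get it more than once.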
Output:
Parcel ID: 0302895000
Owner: SIXTY3-65 COURT ST LLC
Land Use: C
popup doesn't appear
Parcel ID: 0302897000
Owner: SEARS CRESCENT BUILDING LLC
Land Use: C
popup doesn't appear
Parcel ID: 0303694000
Owner: TWENTY-8 STATE STREET LLC
Land Use: C
popup doesn't appear
Parcel ID: 0303685000
Owner: ANBECA 60 LLC
Land Use: C
popup doesn't appear
Parcel ID: 0303746000
Owner: STATE ENTERPRISES LIMITED PA
Land Use: C
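If the popup text is needed as separate fields rather than one printed block, it could be split on the first colon of each line. This is only a sketch: `parse_popup_text` is a hypothetical helper, and it assumes the popup text always has one "label: value" pair per line, as in the output above.

```python
def parse_popup_text(text):
    """Turn 'label: value' lines into a dict, skipping lines without a colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Sample text copied from the first popup in the output above.
sample = "Parcel ID: 0302895000\nOwner: SIXTY3-65 COURT ST LLC\nLand Use: C"
record = parse_popup_text(sample)
print(record["Parcel ID"], record["Owner"])
```

In the Selenium script this would be called as parse_popup_text(parcel_id.text).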
This data comes from an ArcGIS REST service.
I found the ArcGIS query call that returns the needed data:
GET https://services.arcgis.com/sFnw0xNflSi8J0uh/arcgis/rest/services/Parcels19WMFull/FeatureServer/0/query
I looked into what produces this url and found the following:
- http://app01.cityofboston.gov/parcelviewer/config/ParcelViewer.json returns a JSON object with the map id in values.webmap: 2da765769ee34446a396a9c9010f5631
- this value is used to build the following url: https://www.arcgis.com/sharing/rest/content/items/2da765769ee34446a396a9c9010f5631/data, which returns the query URL to look for
This query call is made when you search for data in the input at the top left. You can edit the url parameters to match all the data:
{
    "f": "json",
    "where": "1=1",
    "returnGeometry": "true",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "outSR": "102100"
}
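To see what the edited request looks like as a single URL, the parameters can be encoded onto the query endpoint quoted earlier. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from urllib.parse import urlencode

# Query endpoint quoted earlier in this answer.
QUERY_URL = (
    "https://services.arcgis.com/sFnw0xNflSi8J0uh/arcgis/rest/services/"
    "Parcels19WMFull/FeatureServer/0/query"
)

params = {
    "f": "json",
    "where": "1=1",  # matches every record
    "returnGeometry": "true",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",  # all attribute fields
    "outSR": "102100",
}

full_url = f"{QUERY_URL}?{urlencode(params)}"
print(full_url)
```

Passing the same dict as params= to requests.get produces an equivalent request.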
It returns at most 2000 items, so we need to iterate. To work out how to iterate, we can look at what is in the features array; a query restricted to a few fields gives something like:
{
    "attributes": {
        "FID": 1,
        "FULL_ADDRE": "104 A 104 PUTNAM ST, 02128",
        "PID": "0100001000"
    }
},
{
    "attributes": {
        "FID": 2,
        "FULL_ADDRE": "18 LEVERETT AV #10-B, 02128",
        "PID": "0101399120"
    }
},
{
    "attributes": {
        "FID": 3,
        "FULL_ADDRE": "197 LEXINGTON ST, 02128",
        "PID": "0100002000"
    }
}
....
So we can iterate over the FID field using where=FID > 2000. For each following iteration, we just store the last FID we received and edit the where clause to FID > {last_fid}.
So you can build a python script like this:
import requests

base_url = "http://app01.cityofboston.gov/parcelviewer"

# get map id
r = requests.get(f"{base_url}/config/ParcelViewer.json")
map_id = r.json()["values"]["webmap"]

# get the query url
r = requests.get(f"https://www.arcgis.com/sharing/rest/content/items/{map_id}/data", params={
    "f": "json"
})
url = r.json()["operationalLayers"][0]["url"]

params = {
    "f": "json",
    "where": "1=1",
    "returnGeometry": "true",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "outSR": "102100"
}

data = []
count = 1
finish = False
while not finish:
    print(f"[{count}] requesting...")
    r = requests.get(f"{url}/query", params=params)
    entries = r.json()["features"]
    if len(entries) < 2000:
        # a short page means this was the last one
        finish = True
    else:
        last_fid = entries[-1]["attributes"]["FID"]
        print(f"next fid : {last_fid}")
        params["where"] = f"FID > {last_fid}"
    data.extend(entries)
    print(f"[{count}] received {len(entries)} items - total received : {len(data)}")
    count += 1

print(f"TOTAL: {len(data)}")
# print the last element (just to check)
print(data[-1])
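The paging logic in the script above can also be separated from the HTTP call, which makes it possible to check offline. The sketch below is one way to do that, not the script's actual structure: `fetch_all`, `fetch_page` and `fake_fetch` are hypothetical names, and `fake_fetch` only imitates a service holding five records with a page limit of two.

```python
def fetch_all(fetch_page, page_size=2000, id_field="FID"):
    """Collect every record by repeatedly asking for ids above the last one seen."""
    records = []
    where = "1=1"
    while True:
        page = fetch_page(where)
        records.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return records
        last_id = page[-1]["attributes"][id_field]
        where = f"{id_field} > {last_id}"


def fake_fetch(where, total=5, page_size=2):
    """Offline stand-in for the query endpoint: FIDs 1..total, page_size per call."""
    start = 0 if where == "1=1" else int(where.split(">")[1])
    ids = range(start + 1, min(start + page_size, total) + 1)
    return [{"attributes": {"FID": i}} for i in ids]


rows = fetch_all(fake_fetch, page_size=2)
print(len(rows))
```

With the real service, fetch_page would wrap the requests.get call and return r.json()["features"].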
After a few minutes, the script had extracted 171922 records.
This is what an entry looks like:
{
    'attributes': {
        'FID': 171922,
        'PID_LONG': '2205670000',
        'PID': '2205670000',
        'GIS_ID': '2205670000',
        'FULL_ADDRE': '2203 COMMONWEALTH AV, 02135',
        'OWNER': 'COMMWLTH OF MASS',
        'LAND_USE': 'E',
        'LAND_SF': 34125,
        'LIVING_ARE': 7386,
        'AV_LAND': 1325400,
        'AV_BLDG': 841100,
        'AV_TOTAL': 2166500,
        'GROSS_TAX': 0,
        'ID': 0,
        'SHAPE_Leng': 1003.12908156,
        'SHAPE_Area': 33512.6220608,
        'Shape__Area': 5702.6640625,
        'Shape__Length': 414.046143349521
    },
    'geometry': {
        'rings': [
            [
                [-7922244.91043368, 5212145.61745703],
                [-7922247.98527419, 5212105.5446644],
                [-7922243.75007186, 5212106.29247827],
                [-7922235.83595224, 5212062.80771992],
                [-7922239.05526106, 5212062.68000813],
                [-7922327.54387782, 5212214.66112252],
                [-7922281.74795739, 5212208.62518937],
                [-7922266.82960043, 5212207.97287607],
                [-7922241.02937963, 5212204.61661323],
                [-7922244.0269726, 5212158.45234151],
                [-7922244.91043368, 5212145.61745703]
            ]
        ]
    }
}
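Since the question only asks for the parcel id and owner name, those two fields can be pulled out of each entry. This is a small sketch assuming every entry has the PID and OWNER attributes shown above; `parcel_owners` is a hypothetical helper name.

```python
def parcel_owners(features):
    """Return (parcel id, owner name) pairs from a list of feature entries."""
    return [(f["attributes"]["PID"], f["attributes"]["OWNER"]) for f in features]


# One entry shaped like the sample above, trimmed to the relevant fields.
sample = [
    {
        "attributes": {"PID": "2205670000", "OWNER": "COMMWLTH OF MASS"},
        "geometry": {"rings": []},
    }
]
pairs = parcel_owners(sample)
print(pairs)
```

If only these fields are needed, setting outFields to "FID,PID,OWNER" and returnGeometry to "false" in the query parameters should shrink the responses considerably; FID stays in the list because the paging loop reads it.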
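One last thing: to check the result count directly on the API, we can use the query parameters from the ArcGIS query UI (which is what the map on the website uses, by the way). When filtering by count only, it adds the field returnCountOnly=true. Doing the same on the query endpoint returns the matching total: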
{"count":171922}
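The count check can be scripted as well. The sketch below only builds the parameters and compares a count response with the number of records collected; `count_query_params` and `matches_count` are hypothetical helper names, and the response shape is assumed to be the {"count": ...} object shown above.

```python
def count_query_params(where="1=1"):
    """Parameters asking the query endpoint for a record count only."""
    return {"f": "json", "where": where, "returnCountOnly": "true"}


def matches_count(count_response, records):
    """True when the count reported by the API equals the records collected."""
    return count_response.get("count") == len(records)


params = count_query_params()
print(params)
print(matches_count({"count": 3}, ["a", "b", "c"]))
```

In the script above, the response would come from requests.get(f"{url}/query", params=count_query_params()).json().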
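Note that some variation of this script can be applied to any ArcGIS REST service of the query type. I made an example in this gist that gets data from a map (city). Also note that the maximum number of results returned by the API may differ from one service to another.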
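Because that limit differs between services, the page size does not have to be hard-coded. ArcGIS layer metadata (the layer URL requested with f=json) normally includes a maxRecordCount field; the sketch below reads it and falls back to 2000 when it is missing. `page_size_from_metadata` is a hypothetical helper, and the fallback value is simply the limit observed for this service.

```python
def page_size_from_metadata(layer_metadata, default=2000):
    """Read the service's page limit, falling back to a default if it is absent."""
    return int(layer_metadata.get("maxRecordCount", default))


# Stand-in metadata dicts; a real one would come from the layer URL with f=json.
print(page_size_from_metadata({"maxRecordCount": 1000}))
print(page_size_from_metadata({}))
```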