How to extract/scrape option values from dynamic dropdowns in Python?
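I'm trying to extract data from a webpage where the options in the dropdowns are loaded dynamically based on the previous selection. I'm using Selenium WebDriver to extract the data from the dropdowns. Please see the screenshots below.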
Dropdown 1 - State
Dropdown 2 - City
Dropdown 3 - Station
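The city dropdown options load after a state is selected, and the station dropdown options load after a city is selected.
So far, I have been able to extract the station names using this code.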
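import time
from selenium.webdriver.common.keys import Keys

# driver, cityDropdown and cityOptions are defined earlier (not shown here)
citiesList = []
stationNameList = []
siteIdList = []

# Collect the city names, skipping the placeholder first option
for city in cityOptions[1:]:
    citiesList.append(city.text)

stationDropDown = driver.find_element_by_xpath("//select[contains(@id,'stations')]")
stationOptions = stationDropDown.find_elements_by_tag_name('option')

# Select each city in turn and print the station names that load for it
for ele in citiesList:
    cityDropdown.send_keys(ele, Keys.RETURN)
    time.sleep(2)
    stationDropDown.click()
    print(stationDropDown.text)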
State Options
City Options
Option values from station dropdown
Can anyone help me extract the siteId for each state and city?
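Try the approach below using Python requests. It is simple, straightforward, reliable, and fast, and it needs less code. I got the API URL from the website itself by inspecting the Network tab in Google Chrome's developer tools.
What exactly the script below does:
- First, it takes the API URL and the payload (the payload is essential for the POST request), sends a POST request, and gets the data in return.
- After getting the data, the script parses the JSON using json.loads.
- Finally, it iterates over the whole list of stations one by one and prints the details: state name, city name, station name, and station ID.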
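A note on the payload mentioned in the first step: the string the script sends is Base64-encoded JSON. Decoding it gives a time field and a timeZoneOffset field, which look like an epoch timestamp in milliseconds and a timezone offset in minutes, though that reading is an inference from the values. The sketch below decodes the string and rebuilds it. decode_payload, encode_payload and fresh_payload are hypothetical helper names, and whether the server accepts a regenerated payload is an assumption that has not been checked here.

```python
import base64
import json
import time

# Payload string copied from the answer's script.
PAYLOAD = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='


def decode_payload(payload):
    """Base64-decode the payload and parse the JSON inside it."""
    return json.loads(base64.b64decode(payload).decode('utf-8'))


def encode_payload(fields):
    """Serialize fields as compact JSON and Base64-encode the result."""
    raw = json.dumps(fields, separators=(',', ':')).encode('utf-8')
    return base64.b64encode(raw).decode('ascii')


def fresh_payload(tz_offset_minutes=-330):
    """Hypothetical helper: same two fields, with the current time in ms."""
    fields = {'time': int(time.time() * 1000), 'timeZoneOffset': tz_offset_minutes}
    return encode_payload(fields)


decoded = decode_payload(PAYLOAD)
print(decoded)  # {'time': 1603104573463, 'timeZoneOffset': -330}
print(encode_payload(decoded) == PAYLOAD)  # True
```

If the hard-coded payload ever stops working, passing fresh_payload() as the data argument of requests.post is one thing to try.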
Network call tab
Output of below code.
import json

import requests


def scrape_aqi_site_id():
    URL = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'  # API URL
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='  # Unique payload fetched from the network request
    # POST the payload to the API; verify=False skips TLS certificate verification
    response = requests.post(URL, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON response body
    extracted_states = result['stations']
    for state in extracted_states:  # loop over each state and its stations
        print('=' * 120)
        print('Scraping station data for state : ' + state['stateID'])
        for station in state['stationsInCity']:  # loop over each station in the state
            print('-' * 100)
            print('Scraping data for city and its station : City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City :' + station['cityID'])
            print('Station Name : ' + station['name'])
            print('Station Site Id : ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state : (' + state['stateID'] + ') is completed, now going for another one...')
        print('=' * 120)


scrape_aqi_site_id()
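If the goal is to keep the site IDs rather than print them, the same traversal can fill a list. The following is a minimal sketch, assuming the response has the shape the script above reads (a stations list whose entries carry stateID and stationsInCity, with cityID, name and id on each station). extract_site_ids is a hypothetical helper, and the sample values are made up so the block runs without a network call.

```python
def extract_site_ids(result):
    """Flatten a parsed response into (state, city, station name, site id) rows."""
    rows = []
    for state in result.get('stations', []):
        for station in state.get('stationsInCity', []):
            rows.append((
                state['stateID'],
                station['cityID'],
                station['name'],
                station['id'],
            ))
    return rows


# Hand-made sample with made-up values, mirroring the keys the script reads.
sample = {
    'stations': [
        {
            'stateID': 'State A',
            'stationsInCity': [
                {'cityID': 'City X', 'name': 'Station 1', 'id': 'site_001'},
                {'cityID': 'City Y', 'name': 'Station 2', 'id': 'site_002'},
            ],
        },
        {
            'stateID': 'State B',
            'stationsInCity': [
                {'cityID': 'City Z', 'name': 'Station 3', 'id': 'site_003'},
            ],
        },
    ]
}

rows = extract_site_ids(sample)
for row in rows:
    print(row)
```

With a live response, pass the parsed result from the script to extract_site_ids instead of the sample.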