Crawl the content of a page whose pagination links are javascript:void()

I want to scrape the first ten pages of https://www.gotouniversity.com/course/index. So far, I have been able to get the content of the first page.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='/Users/xx/Desktop/chromedriver')
driver.get('https://www.gotouniversity.com/course/index')
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name, which Selenium 4 removed
elements = driver.find_elements(By.CLASS_NAME, "university-name")
university_name = [element.text for element in elements]

print(university_name)

['Loyola University Chicago',
 'Queens University',
  ...
 'Yale University']

The pagination links on the page are javascript:void(), so I don't know how to move from page to page and get the content of each one.


<div class="pagination"><div aria-live="polite" role="status" style="float:left; height:14px; padding:8px">Showing 1 to 20 of 143981 entries</div><div style="float:right;"><ul class="pagination" id="pagin_count"><li class="active" p="1"><a>1</a></li><li p="2"><a href="javascript:void()" onclick="pagingcustom(2);">2</a></li><li p="3"><a href="javascript:void()" onclick="pagingcustom(3);">3</a></li><li p="4"><a href="javascript:void()" onclick="pagingcustom(4);">4</a></li><li p="5"><a href="javascript:void()" onclick="pagingcustom(5);">5</a></li><li p="6"><a href="javascript:void()" onclick="pagingcustom(6);">6</a></li><li p="7"><a href="javascript:void()" onclick="pagingcustom(7);">7</a></li><li p="8"><a href="javascript:void()" onclick="pagingcustom(8);">8</a></li><li p="9"><a href="javascript:void()" onclick="pagingcustom(9);">9</a></li><li p="10"><a href="javascript:void()" onclick="pagingcustom(10);">10</a></li><li p="1"><a href="javascript:void()" onclick="pagingcustom(1);">Next</a></li></ul></div></div>
</div>
<script>
function fn_advcount(id){
    $.ajax({
            url: 'https://www.gotouniversity.com/site/advertisement-count',
            data: { id : id },
            success: function(result){
    }});
  }
</script>
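
The markup above shows why following the href attributes leads nowhere: every pagination link points to javascript:void() and the page change happens in the onclick handler, pagingcustom(n). As a minimal sketch, the snippet below pulls the link text and the target page number out of a shortened copy of that markup using only the standard library. The names PAGINATION_HTML and onclick_targets are made up for this illustration.

```python
import re

# Shortened copy of the pagination markup shown above.
PAGINATION_HTML = (
    '<ul class="pagination" id="pagin_count">'
    '<li class="active" p="1"><a>1</a></li>'
    '<li p="2"><a href="javascript:void()" onclick="pagingcustom(2);">2</a></li>'
    '<li p="3"><a href="javascript:void()" onclick="pagingcustom(3);">3</a></li>'
    '<li p="1"><a href="javascript:void()" onclick="pagingcustom(1);">Next</a></li>'
    '</ul>'
)


def onclick_targets(html):
    """Return (link text, page number) pairs for every pagingcustom(n) link."""
    pattern = re.compile(r'onclick="pagingcustom\((\d+)\);">([^<]*)</a>')
    return [(text, int(page)) for page, text in pattern.findall(html)]


targets = onclick_targets(PAGINATION_HTML)
print(targets)
```

The active page has no onclick handler, so it does not appear in the result.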

The content I want to get:

<a href="/university/loyola-university-chicago" target="_blank" title="University">
<p class="university-name" title="Loyola University Chicago">Loyola University Chicago</p>
</a>
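
For reference, the name and its link can be read from this markup in one pass. The following is only a sketch with the standard library's html.parser, run against the snippet above rather than the live page; the class name UniversityLinkParser is hypothetical, and BeautifulSoup would do the same job with less code.

```python
from html.parser import HTMLParser


class UniversityLinkParser(HTMLParser):
    """Collect (name, href) for each <a title="University"> that wraps a
    <p class="university-name"> element."""

    def __init__(self):
        super().__init__()
        self.rows = []         # collected (name, href) pairs
        self._href = None      # href of the enclosing <a title="University">, if any
        self._in_name = False  # True while inside <p class="university-name">
        self._text = []        # text fragments of the current name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('title') == 'University':
            self._href = attrs.get('href')
        elif (tag == 'p' and self._href is not None
              and attrs.get('class') == 'university-name'):
            self._in_name = True
            self._text = []

    def handle_data(self, data):
        if self._in_name:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'p' and self._in_name:
            self.rows.append((''.join(self._text).strip(), self._href))
            self._in_name = False
        elif tag == 'a':
            self._href = None


SNIPPET = '''<a href="/university/loyola-university-chicago" target="_blank" title="University">
<p class="university-name" title="Loyola University Chicago">Loyola University Chicago</p>
</a>'''

parser = UniversityLinkParser()
parser.feed(SNIPPET)
print(parser.rows)
```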

I have read some related questions, but I still cannot find a solution.


I also tested with bs4, which can scrape the content of the first page:

import bs4
import requests
bowl = requests.get('https://www.gotouniversity.com/course/index') 
soup = bs4.BeautifulSoup(bowl.text, 'html.parser')
UniversityName = [i.text for i in soup.find_all('p', attrs={'class': 'university-name'})]

Using BeautifulSoup, this prints the university names and links for the first 10 pages:

import requests
from bs4 import BeautifulSoup

url = 'https://www.gotouniversity.com/course/index'

params = {'page': 1}

for page in range(1, 11):
    print('Page no.{}...'.format(page))
    print('-' * 120)
    print()

    params['page'] = page
    soup = BeautifulSoup( requests.post(url, data=params).text, 'html.parser' )

    for a in soup.select('a[title="University"]'):
        print('{: <60}{}'.format(a.get_text(strip=True), a['href']))

    print()

Prints:

Page no.1...
------------------------------------------------------------------------------------------------------------------------

Loyola University Chicago                                   /university/loyola-university-chicago
Queens University                                           /university/queens-university
University of Wollongong                                    /university/university-of-wollongong
Nanyang Technological University                            /university/nanyang-technological-university
Kaunas University of Technology                             /university/kaunas-university-of-technology
University of Bristol                                       /university/university-of-bristol
University of Victoria                                      /university/university-of-victoria
National University of Singapore NUS                        /university/national-university-of-singapore-nus
Duke University                                             /university/duke-university
Queens University                                           /university/queens-university
New Jersey Institute of Technology                          /university/new-jersey-institute-of-technology
Swinburne University of Technology                          /university/swinburne-university-of-technology
University of Alberta                                       /university/university-of-alberta
Cardiff University                                          /university/cardiff-university
St Clair College                                            /university/st-clair-college
Stanford University                                         /university/stanford-university
McGill University                                           /university/mcgill-university
Arizona State University Tempe                              /university/arizona-state-university-tempe
University of North Carolina Greensboro                     /university/university-of-north-carolina-greensboro
Yale University                                             /university/yale-university

Page no.2...
------------------------------------------------------------------------------------------------------------------------

Cambrian College                                            /university/cambrian-college
Simon Fraser University Burnaby                             /university/simon-fraser-university-burnaby
University of Bologna                                       /university/university-of-bologna
Memorial University of Newfoundland                         /university/memorial-university-of-newfoundland
Centennial College                                          /university/centennial-college
University of Groningen                                     /university/university-of-groningen
Griffith University Gold Coast Campus                       /university/griffith-university-gold-coast-campus
Texas A and M University College Station                    /university/texas-a-and-m-university-college-station
University of Calgary                                       /university/university-of-calgary
University of Melbourne                                     /university/university-of-melbourne
Fanshawe College                                            /university/fanshawe-college
Zurich Swiss Federal Institute of Technology ETH            /university/zurich-swiss-federal-institute-of-technology-eth
Northeastern University                                     /university/northeastern-university
Adelphi University                                          /university/adelphi-university
Heriot Watt University Dubai                                /university/heriot-watt-university-dubai
University of Ottawa                                        /university/university-of-ottawa
University of Regina                                        /university/university-of-regina
University of Regina                                        /university/university-of-regina
Humber College North Campus                                 /university/humber-college-north-campus
Seneca College                                              /university/seneca-college

...and so on.
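
If the pages are to be processed rather than printed, the request loop can be separated from the parsing. Below is a rough sketch of that idea, assuming (as the code above does) that the server returns page N for a POST with form field page=N. The function name fetch_pages and the timeout default are illustrative choices; `post` is meant to be requests.post or anything with the same call signature.

```python
def fetch_pages(post, url, first=1, last=10, timeout=30):
    """Yield (page, html) pairs by POSTing the page number as form data.

    `post` is any callable with the signature of requests.post, which makes
    the loop easy to try out without touching the network.
    """
    for page in range(first, last + 1):
        response = post(url, data={'page': page}, timeout=timeout)
        # Stop on HTTP errors instead of silently parsing an error page.
        response.raise_for_status()
        yield page, response.text
```

With requests installed, usage would look like `for page, html in fetch_pages(requests.post, url): ...`, feeding each `html` to BeautifulSoup as in the loop above.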

Here is code for the first 10 pages using Selenium.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome(executable_path='/Users/xx/Desktop/chromedriver')
driver.get('https://www.gotouniversity.com/course/index')
page_number = 1
max_page = 10

while page_number <= max_page:
    # Wait until the university names are present, then read their text
    elements = WebDriverWait(driver, 20).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.university-name')))
    university_name = [element.text for element in elements]
    print(university_name)
    page_number += 1
    if page_number > max_page:
        break  # the last wanted page has been read; do not click on to page 11
    # Click the pagination link whose text is the next page number
    element = WebDriverWait(driver, 20).until(
        EC.element_to_be_clickable((By.XPATH, '//a[text()="' + str(page_number) + '"]')))
    driver.execute_script("arguments[0].click();", element)

Output:

['Loyola University Chicago', 'Queens University', 'University of Wollongong', 'Nanyang Technological University', 'Kaunas University of Technology', 'University of Bristol', 'University of Victoria', 'National University of Singapore NUS', 'Duke University', 'Queens University', 'New Jersey Institute of Technology', 'Swinburne University of Technology', 'University of Alberta', 'Cardiff University', 'St Clair College', 'Stanford University', 'McGill University', 'Arizona State University Tempe', 'University of North Carolina Greensboro', 'Yale University']
['Cambrian College', 'Simon Fraser University Burnaby', 'University of Bologna', 'Memorial University of Newfoundland', 'Centennial College', 'University of Groningen', 'Griffith University Gold Coast Campus', 'Texas A and M University College Station', 'University of Calgary', 'University of Melbourne', 'Fanshawe College', 'Zurich Swiss Federal Institute of Technology ETH', 'Northeastern University', 'Adelphi University', 'Heriot Watt University Dubai', 'University of Ottawa', 'University of Regina', 'University of Regina', 'Humber College North Campus', 'Seneca College']
['Central Queensland University Melbourne', 'Technical University of Munich', 'University of Groningen', 'Boston College Lincolnshire', 'Florida State University', 'Maryland Institute College of Art', 'Heriot Watt University Dubai', 'Hult International Business School Shanghai', 'University College Dublin', 'Bellerbys College Brighton', 'University of Ottawa', 'Queens University', 'RMIT University', 'Lakehead University Thunder Bay', 'University of Rhode Island', 'DLD College London', 'McGill University', 'University of Alberta', 'Algonquin College Ottawa', 'University of Fraser Valley']
['Mount Saint Vincent University', 'Fanshawe College', 'North Island College', 'Okanagan College Kelowna', 'St Clair College', 'Ryerson University', 'Northern College Timmins', 'Simon Fraser University Burnaby', 'Zurich Swiss Federal Institute of Technology ETH', 'Nanyang Technological University', 'Delft University of Technology', 'University of Munich LMU', 'University of Munich LMU', 'University of Freiburg', 'University of Bologna', 'University of Bologna', 'University of Windsor', 'University of Guelph', 'Harvard University', 'Emory University']
['Washington State University Pullman', 'San Diego State University', 'Heriot Watt University Dubai', 'New European College', 'The University of Northampton', 'Middlesex University Dubai', 'Middlesex University Dubai', 'University of Leeds', 'University of Hull', 'Martin College', 'University of Twente', 'University of Twente', 'Vrije Universiteit Amsterdam', 'University of Toronto St George', 'University of Hertfordshire', 'University of Wollongong', 'University of Melbourne', 'Humber College Lakeshore', 'Seneca College', 'Douglas College']
['Centennial College', 'Centennial College', 'Centennial College', 'Conestoga College', 'St Clair College', 'Ryerson University', 'Western Sydney University Sydney Campus', 'University of Zurich', 'University of Zurich', 'University of Bologna', 'University of Gottingen', 'Memorial University of Newfoundland', 'Concordia University', 'Carleton University', 'Neubrandenburg University of Applied Sciences', 'Harvard University', 'Yale University', 'Duke University', 'University of California San Diego', 'Southern Methodist University']
['University of New Hampshire', 'Oregon State University', 'Kansas State University', 'University of North Carolina Greensboro', 'Geneva Business School Geneva', 'University of Amsterdam', 'Bellerbys College London', 'Vrije Universiteit Amsterdam', 'University of Western Australia', 'University of Toronto Mississauga', 'McGill University', 'University of Montreal', 'Queens University', 'Queens University', 'University of Dundee', 'University of New South Wales', 'University of Melbourne', 'Griffith University Nathan Campus', 'University of Regina', 'British Columbia Institute of Technology Burnaby']
['University of Northern British Columbia', 'George Brown College', 'Conestoga College', 'Southern Alberta Institute of Technology', 'St Lawrence College Kingston', 'Ryerson University', 'Northern College Kirkland', 'Simon Fraser University Burnaby', 'Synergy University Dubai', 'University of Notre Dame Fremantle', 'Western Sydney University Sydney Campus', 'University of Tokyo Hongo Campus', 'Technical University of Munich', 'Queen Mary University of London', 'University of Windsor', 'Griffith University Gold Coast Campus', 'Concordia University', 'Carleton University', 'Carleton University', 'Carleton University']
['Carleton University', 'Neubrandenburg University of Applied Sciences', 'Stanford University', 'Massachusetts Institute of Technology', 'University of California Berkeley', 'Tufts University', 'University of California Santa Barbara', 'University of California Davis', 'Pennsylvania State University University Park', 'University of Georgia', 'University of Pittsburgh', 'SUNY College of Environmental Science and Forestry', 'Michigan Technological University', 'Colorado State University', 'Ohio University', 'Ohio University', 'Oregon State University', 'New Jersey Institute of Technology', 'Rutgers University Newark', 'Rutgers University Newark']
['Oklahoma State University', 'Mississippi State University', 'University of Idaho', 'University of Idaho', 'University of North Dakota', 'Heriot Watt University Dubai', 'Jacobs University', 'S P Jain School of Global Management', 'S P Jain School of Global Management', 'Istituto Marangoni Paris', 'DLD College London', 'Durham University', 'Keele University', 'Kingston University London', 'University College Dublin', 'University of Surrey', 'Royal Roads University', 'Royal Roads University', 'Royal Roads University', 'University of North Texas']
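
Since each pagination link only calls pagingcustom(n) in its onclick handler, another option might be to call that function directly through execute_script instead of locating and clicking the link. The sketch below is untested against the live site and assumes pagingcustom is a global function, as the onclick attributes suggest. The names collect_pages and read_names are hypothetical; read_names stands for whatever wait-and-read logic is used (for example the WebDriverWait call above), and it is responsible for waiting until the new page's content has loaded.

```python
def collect_pages(driver, read_names, max_page=10):
    """Return one list of names per page, for pages 1..max_page.

    `driver` is a Selenium WebDriver already on the course index page.
    `read_names` is a callable that takes the driver, waits for the
    university names of the current page, and returns them as a list.
    """
    pages = []
    for page in range(1, max_page + 1):
        if page > 1:
            # Same call the pagination link makes in its onclick handler.
            driver.execute_script("pagingcustom(arguments[0]);", page)
        pages.append(read_names(driver))
    return pages
```

Passing the page number through arguments[0] avoids building the JavaScript string by hand.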