Web scraping a site that contains JSON data

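I am working on a site to pull job listings from it. When I use BeautifulSoup, the response from the site does not contain the complete information, so I tried to do it with pandas instead. Still no luck. Can anyone help me?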

import pandas as pd
import requests
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'}
url = f'https://hirist.com'
# r = requests.get(url, headers, verify=False)

payload = {"pageNo": "1",
           "query": "software engineer",
           "loc": '17',
           "minexp": '0',
           "maxexp": '0',
           "range": '0',
           "boost": '0',
           "searchRange": '4',
           "searchOp": 'AND',
           "jobType": "1"
           }
jsonData = requests.post(url, headers=headers,
                         json=payload, verify=False).json()
df = pd.DataFrame(jsonData)

print(df)

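Try the following, which sends a GET request to the site's JSON API endpoint with the search terms as query parameters, along with Referer, Authorization, and Origin headers: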

import pandas as pd
import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
    'Referer' : 'https://www.hirist.com/',
    'Authorization' : 'Bearer undefined',
    'Origin' : 'https://www.hirist.com',
}

payload = {
    "pageNo" : "1",
    "query" : "software engineer",
    "loc" : '17',
    "minexp" : '0',
    "maxexp" : '0',
    "range" : '0',
    "boost" : '0',
    "searchRange" : '4',
    "searchOp" : 'AND',
    "jobType" : "1"
}

jsonData = requests.get("https://jobseeker-api.hirist.com/jobfeed/-1/search", headers=headers, params=payload, verify=False).json()

print(jsonData)
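
The response nests the listings under `jobs`, and each job in turn nests `tags`, `locations`, and `companyData`, so a DataFrame built straight from it would likely end up with dicts and lists in its cells. Here is a minimal sketch of flattening the response into one flat row per job first. The helper name `flatten_jobs` and the choice of columns are my own, not part of the API, and the sample is a trimmed copy of the first job this request returned, so the example runs without a network call.

```python
def flatten_jobs(json_data):
    """Turn the API response dict into a list of flat dicts, one per job."""
    rows = []
    for job in json_data.get("jobs", []):
        # companyData may be missing or None, so fall back to an empty dict.
        company = job.get("companyData") or {}
        rows.append({
            "id": job.get("id"),
            "title": job.get("title"),
            "company": company.get("companyName"),
            # 'min'/'max' are assumed to be years of experience,
            # since they match the "(1-4 yrs)" suffix in the title.
            "min_exp": job.get("min"),
            "max_exp": job.get("max"),
            # Collapse the nested lists of {'id', 'name'} dicts into strings.
            "tags": ", ".join(t["name"] for t in job.get("tags") or []),
            "locations": ", ".join(
                loc["name"] for loc in job.get("locations") or []
            ),
            "url": job.get("jobDetailUrl"),
        })
    return rows


# Trimmed sample taken from the first job in the response to this request.
sample = {
    "count": 58,
    "jobs": [{
        "id": 982486,
        "title": "Software Engineer - ASP/C# (1-4 yrs)",
        "min": 1,
        "max": 4,
        "jobDetailUrl": "https://www.hirist.com/j/software-engineer-aspc-1-4-yrs-982486.html?ref=ambitionbox",
        "tags": [{"id": 206, "name": "C#"}, {"id": 387, "name": "SQL Server"}],
        "locations": [{"id": 70, "name": "Cochin/Kochi"},
                      {"id": 17, "name": "Kerala"}],
        "companyData": {"companyId": 0, "companyName": "Sapwood Ventures"},
    }],
}

rows = flatten_jobs(sample)
print(rows[0]["title"], "|", rows[0]["tags"])
```

With the live response, `pd.DataFrame(flatten_jobs(jsonData))` should then give one row per job. If you would rather keep every field, `pd.json_normalize(jsonData['jobs'])` is an alternative, though the `tags` and `locations` lists stay as lists in that case.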

Printing `jsonData` gives output that begins:

{'count': 58, 'jobs': [{'id': 982486, 'title': 'Software Engineer - ASP/C# (1-4 yrs)', 'introText': '<p><p><b>Position : Software Engineer</b><br/><br/><b>Experience : 1- 4 Years</b><br/><br/><b>Job type : Permanent</b><br/><br/><b>Skills Required :</b><br/><br/>- Extensive knowledge in <b>Asp.net, C# and SQL.</b><br/><br/>- Ability to troubleshoot and solve complex technical problems.<br/><br/>- Great interpersonal and communication skills<br/><br/>- Must have good analytical and problem-solving skills.<br/><br/>- Good Time Management and Planning skills.<br/><br/><b>Roles & Responsibility :</b><br/><br/>- Producing clean, efficient code based on specifications.<br/><br/>- Fixing and improving existing software<br/><br/>- Integrate software components and third-party programs<br/><br/>- Verify and deploy programs and systems<br/><br/>- Troubleshoot, debug and upgrade existing software<br/><br/>- Gather and evaluate user feedback<br/><br/>- Recommend and execute improvements<br/><br/>- Create technical documentation for reference and reporting<br/><br/>- Prefer Immediate Joiners</p></p>', 'jobdesignation': 'Software Developer', 'min': 1, 'max': 4, 'createdBy': 93163, 'creatorDomainName': 'sapwood.net', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/software-engineer-aspc-1-4-yrs-982486.html?ref=ambitionbox', 'femaleCandidate': 0, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 0, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 42, 'createdTimeMs': 1643024958613, 'createdTime': 1642982400000, 'createdTimeNoMillis': None, 'tagIdString': '206 387 91 7', 'tags': [{'id': 206, 'name': 'C#'}, {'id': 387, 'name': 'SQL Server'}, {'id': 91, 'name': 'ASP'}, {'id': 7, 'name': '.Net'}], 'locations': [{'id': 70, 'name': 'Cochin/Kochi'}, {'id': 17, 'name': 'Kerala'}], 'showcase': None, 'diversity': None, 'companyStatus': 1, 'createdByAlias': 'Cochin/Kochi/Kerala', 'applyUrl': '', 'videoUrl': '', 
'assessmentFlags': 0, 'mediaResume': 0, 'industry': '', 'functionalArea': 18, 'minSal': 1, 'maxSal': 6, 'hits': 373, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'Sapwood Ventures', 'companyNameNotAnalyzed': 'Sapwood Ventures', 'companyStatus': 1, 'logoPath': None}, 'recruiter': {'recruiterId': 93163, 'recruiterName': 'Hemaa R', 'designation': 'Senior Manager - Team & Key Accounts', 'profilePicUrl': '', 'logoPath': '', 'recruiterActions': 34}, 'jobStatusInfo': None, 'location': [{'id': 70, 'name': 'Cochin/Kochi'}, {'id': 17, 'name': 'Kerala'}], 'saved': 0, 'applied': 0}, {'id': 997211, 'title': 'Tetherfi Technologies - Software Engineer - Java/J2EE (3-10 yrs)', 'introText': "<p>The Right Individual :<br/><br/>The ideal candidate will have a passion for technology and software building. Attention to detail and an analytical mind are essential qualities in this role. You will have to work on both technical and design aspects of software projects. A proactive approach to problem-solving as well as a detailed understanding of coding is essential. If finding issues and fixing them with beautiful, meticulous code are among the talents that make you tick, we'd like to hear from you.<br/><br/>Required Functional Skill :<br/><br/>1. 4+ years of experience in java and familiarity in Spring boot, JPA.<br/><br/>2. Extensive Hands-on experience in JAVA Java SE.<br/><br/>3. Well versed with Object Oriented Programming Concepts.<br/><br/>4. Prior experience on JAVA Spring / Spring boot framework.<br/><br/>5. Familiarity with java application servers JBoss, WebLogic.<br/><br/>6. Have in-depth knowledge and self-driven interest to work with JAVA Servlets.<br/><br/>7. Experience in deploying solutions for cross integrations among OEMs in CC or UC environment is preferred.<br/><br/>Role and Responsibilities :<br/><br/>1. 
Candidate will be part of our Global Delivery center team liaising with Product Strategist and Product Owner to enhance Tetherfi's Products based on Web chat, CC & UC Product Streams.<br/><br/>2. Will develop, enhance and support Tetherfi's existing projects and future projects.<br/><br/>Required Professional & Interpersonal Qualities :<br/><br/>- Bachelor's Degree in appropriate field of study or equivalent work experience.<br/><br/>- Experienced with all ancillary technologies necessary for Internet applications: HTTP, TCP/IP, POP/SMTP, etc.</p>", 'jobdesignation': 'Software Engineer', 'min': 3, 'max': 10, 'createdBy': 72249, 'creatorDomainName': 'tetherfi.com', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/tetherfi-technologies-software-engineer-javaj2ee-997211.html?ref=ambitionbox', 'femaleCandidate': 0, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 0, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 4, 'createdTimeMs': 1645156975704, 'createdTime': 1645142400000, 'createdTimeNoMillis': None, 'tagIdString': '5 2850 25 279 87 237 11100 19', 'tags': [{'id': 5, 'name': 'Java'}, {'id': 2850, 'name': 'Spring Boot'}, {'id': 25, 'name': 'J2EE'}, {'id': 279, 'name': 'Servlets'}, {'id': 87, 'name': 'JBOSS'}, {'id': 237, 'name': 'WebLogic'}, {'id': 11100, 'name': 'Application Server'}, {'id': 19, 'name': 'OOPS'}], 'locations': [{'id': 88, 'name': 'Anywhere in India/Multiple Locations'}, {'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 7, 'name': 'Pune'}, {'id': 17, 'name': 'Kerala'}, {'id': 31, 'name': 'Karnataka'}], 'showcase': None, 'diversity': None, 'companyStatus': 1, 'createdByAlias': 'Anywhere in India/Multiple Locations/Bangalore/Chennai/Pune/Kerala/Karnataka', 'applyUrl': '', 'videoUrl': '', 'assessmentFlags': 0, 'mediaResume': 0, 'industry': '', 'functionalArea': 16, 'minSal': 5, 'maxSal': 14, 'hits': 15, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 
'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'Tetherfi Technologies Pvt Ltd', 'companyNameNotAnalyzed': 'Tetherfi Technologies Pvt Ltd', 'companyStatus': 1, 'logoPath': None}, 'recruiter': {'recruiterId': 72249, 'recruiterName': 'Laxman Shenoy', 'designation': 'Deputy Manager HR', 'profilePicUrl': '', 'logoPath': '', 'recruiterActions': 2}, 'jobStatusInfo': None, 'location': [{'id': 88, 'name': 'Anywhere in India/Multiple Locations'}, {'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 7, 'name': 'Pune'}, {'id': 17, 'name': 'Kerala'}, {'id': 31, 'name': 'Karnataka'}], 'saved': 0, 'applied': 0}, {'id': 1003219, 'title': 'Senior Software Engineer - Python/Django (3-8 yrs)', 'introText': "<p><p><p><b>Position / Designation :</b> Software Engineer /Senior Software Engineer<br/><br/><b>Location</b> <b>: </b>Chennai<br/><br/><b>Experience</b> <b>: </b>0-3 years for SE, 3+ years for SSE, <br/><br/><b>CTC : <br/></b><br/>SE -  4 to 6 L.P.A<br/><br/>SSE- 7-11 L.P.A<br/><br/>The ideal candidate is a self-motivated, multi-tasker, and demonstrated team player. You will be a lead developer responsible for the development of new software products and enhancements to existing products. You should excel in working with large-scale applications and frameworks and have outstanding communication and leadership skills. 
<br/><br/><b>Responsibilities : <br/></b><br/>- Writing clean, high-quality, high-performance, maintainable code<br/><br/>- Develop and support software including applications, database integration, interfaces, and new functionality enhancements.<br/><br/>- Coordinate cross-functionally to ensure the project meets business objectives and compliance standards.<br/><br/>- Support test and deployment of new products and features.<br/><br/>- Participate in code reviews.<br/><br/><b>Qualifications : <br/></b><br/>- Bachelor's degree in Computer Science (or related field)<br/><br/>- 3+ years of work experience in Python, Django.<br/><br/>- Expertise in Object-Oriented Design, Database Design, and XML Schema<br/><br/>- Experience with Agile or Scrum software development methodologies<br/><br/>- Ability to multi-task, organize and prioritize work.</p></p></p>", 'jobdesignation': None, 'min': 3, 'max': 8, 'createdBy': 98899, 'creatorDomainName': 'gmail.com', 'categoryId': 1, 'jobDetailUrl': 'https://www.hirist.com/j/senior-software-engineer-pythondjango-3-8-yrs-1003219.html?ref=ambitionbox', 'femaleCandidate': 1, 'differentlyAbled': 0, 'exDefence': 0, 'workFromHome': 1, 'femaleBackWorkForce': 0, 'confidential': 0, 'premium': 0, 'star': 0, 'applyStatus': 1, 'applyCount': 98, 'createdTimeMs': 1646059583336, 'createdTime': 1646006400000, 'createdTimeNoMillis': None, 'tagIdString': '9 592 50 280 97 30357 3429 4422 11 2339 2807', 'tags': [{'id': 9, 'name': 'Python'}, {'id': 592, 'name': 'Agile'}, {'id': 50, 'name': 'Django'}, {'id': 280, 'name': 'Scrum'}, {'id': 97, 'name': 'XML'}, {'id': 30357, 'name': 'Object Modeling'}, {'id': 3429, 'name': 'Database Schema'}, {'id': 4422, 'name': 'Database Architecture'}, {'id': 11, 'name': 'MySQL'}, {'id': 2339, 'name': 'Python Architect'}, {'id': 2807, 'name': 'PySpark'}], 'locations': [{'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 84, 'name': 'Coimbatore'}, {'id': 17, 'name': 'Kerala'}], 'showcase': None, 
'diversity': None, 'companyStatus': 2, 'createdByAlias': 'Bangalore/Chennai/Coimbatore/Kerala', 'applyUrl': '', 'videoUrl': '', 'assessmentFlags': 0, 'mediaResume': 0, 'industry': '0', 'functionalArea': 16, 'minSal': 16, 'maxSal': 31, 'hits': 538, 'otherLocation': '', 'minBatch': None, 'maxBatch': None, 'brandJobFlag': 0, 'companyDomain': None, 'lableId': None, 'companyData': {'companyId': 0, 'companyName': 'AR Consultant', 'companyNameNotAnalyzed': 'AR Consultant', 'companyStatus': 2, 'logoPath': None}, 'recruiter': {'recruiterId': 98899, 'recruiterName': 'Afzal', 'designation': 'Recruiter', 'profilePicUrl': 'https://edgar.hirist.com/media/recruiterpics/2022/01/25/2022-01-25-19-12-23-98899.jpg', 'logoPath': '', 'recruiterActions': 11}, 'jobStatusInfo': None, 'location': [{'id': 3, 'name': 'Bangalore'}, {'id': 6, 'name': 'Chennai'}, {'id': 84, 'name': 'Coimbatore'}, {'id': 17, 'name': 'Kerala'}], 'saved': 0, 'applied': 0}, {'id': 967513, 'title': 'Software Test Engineer - Java/Selenium (0-2 yrs)', 'introText': "<p>Immediate joiners required for a reputed client <br/><br/>Only Male Kerala candidates <br/><br/>Position : Software Test Engineer<br/><br/>Experience : 0-2 years<br/><br/>Job