Using LXML with Html, Requests, and ETree, it gives links, but won't let me search links for specific text

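I'm trying to extract specific data from the link provided below. When I run the code, it gives me all the href links as expected, but when I try to test the same string further using the contains syntax, it comes back empty.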

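I've read the documentation and DevHints, and everywhere I look, the "contains" syntax is the recommended way to capture what I'm looking for. All I know is that I need to use it; I just don't know where or how.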

I'm trying to build a scraper to help the many people who were recently laid off find new jobs, so any help is greatly appreciated.

Code:

from lxml import html, etree
import requests

page = requests.get('https://ea.gr8people.com/index.gp?method=cappportal.showPortalSearch&sysLayoutID=123')

# print(page.content)

tree = html.fromstring(page.content)

print(tree)
# Select All Nodes

AllNodes = tree.xpath("//*")

# Select Only hyperlink nodes

AllHyperLinkNodes = tree.xpath("//*/a")

# Iterate through all Node Links

for node in AllHyperLinkNodes:
    print(node.values())

print("======================================================================================================================")

# select using a condition 'contains'
# NodeThatContains = tree.xpath('//td[@class="search-results-column-left"]/text()')
NodeThatContains = tree.xpath('//*/a[contains(text(),"opportunityid")]')

for node in NodeThatContains:
    print(node.values())

# Print the link that 'contains' the text
# print(NodeThatContains[0].values())
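The empty result is consistent with the output further down: "opportunityid" appears only inside each link's href attribute, while text() selects the link's visible text (the job title). A minimal lxml sketch contrasting the two predicates follows. It parses a hypothetical inline sample modeled on that output instead of making a live request, so the markup is an assumption, not the real page.

```python
from lxml import html

# Hypothetical sample markup modeled on the links printed in the output;
# in the real scraper this would be page.content from requests.get(...).
SAMPLE = b"""
<table>
  <tr><td class="search-results-column-left">
    <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154761&amp;opportunityid=154761">
        2D Capture Artist - 6 month contract
    </a>
  </td></tr>
  <tr><td><a href="index.gp?method=cappportal.showPortalSearch">Search</a></td></tr>
</table>
"""

tree = html.fromstring(SAMPLE)

# text() is the visible link text (the job title), so this matches nothing:
by_text = tree.xpath('//a[contains(text(), "opportunityid")]')

# @href is the link target, which is where "opportunityid" actually appears:
by_href = tree.xpath('//a[contains(@href, "opportunityid")]')

for a in by_href:
    # text_content() gathers all text inside <a>; strip() drops the layout whitespace
    print(a.text_content().strip(), '->', a.get('href'))
```

The only change relative to the question's query is text() becoming @href; lxml also unescapes &amp; in attribute values, so a.get('href') returns plain & separators.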

BeautifulSoup-based solution

from bs4 import BeautifulSoup
import requests

page = requests.get('https://ea.gr8people.com/index.gp?method=cappportal.showPortalSearch&sysLayoutID=123').content

soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('a')
links = [a for a in links if a.attrs.get('href') and 'opportunityid' in a.attrs.get('href')]
print('-- opportunities --')
for idx, link in enumerate(links):
    print('{}) {}'.format(idx, link))

Output

-- opportunities --
0) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154761&amp;opportunityid=154761">
                                        2D Capture Artist - 6 month contract
                                    </a>
1) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154426&amp;opportunityid=154426">
                                        Accounting Supervisor
                                    </a>
2) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=152147&amp;opportunityid=152147">
                                        Advanced Analyst
                                    </a>
3) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=153395&amp;opportunityid=153395">
                                        Advanced UX Researcher
                                    </a>
4) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=151309&amp;opportunityid=151309">
                                        AI Engineer
                                    </a>
5) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=150468&amp;opportunityid=150468">
                                        AI Scientist
                                    </a>
6) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=151310&amp;opportunityid=151310">
                                        AI Scientist - NLP Focus
                                    </a>
7) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=153351&amp;opportunityid=153351">
                                        AI Software Engineer (Apex Legends)
                                    </a>
8) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=152737&amp;opportunityid=152737">
                                        AI Software Engineer (Frostbite)
                                    </a>
9) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154764&amp;opportunityid=154764">
                                        Analyste Qualité Sénior / Senior Quality Analyst
                                    </a>
10) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=153948&amp;opportunityid=153948">
                                        Animator 1
                                    </a>
11) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=151353&amp;opportunityid=151353">
                                        Applications Agreement Analyst
                                    </a>
12) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154668&amp;opportunityid=154668">
                                        AR Analyst I
                                    </a>
13) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=153609&amp;opportunityid=153609">
                                        AR Specialist
                                    </a>
14) <a href="index.gp?method=cappportal.showJob&amp;layoutid=2092&amp;inp1541=&amp;inp1375=154773&amp;opportunityid=154773">
                                        Artiste Audio / Audio Artist
                                    </a>
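For a job scraper, the relative hrefs above are more useful as absolute URLs with the job ID pulled out. A standard-library sketch follows. It assumes the links resolve against the search-page URL from the question and that opportunityid identifies the job (inferred from the hrefs, not documented by the site); job_record is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Search-page URL from the question; the hrefs on that page are relative to it.
BASE_URL = 'https://ea.gr8people.com/index.gp?method=cappportal.showPortalSearch&sysLayoutID=123'

# An href as BeautifulSoup returns it via a.attrs['href'] (already unescaped: &amp; -> &).
href = 'index.gp?method=cappportal.showJob&layoutid=2092&inp1541=&inp1375=154761&opportunityid=154761'


def job_record(href, base=BASE_URL):
    """Return (opportunityid, absolute_url) for one job link, or None if it has no id."""
    absolute = urljoin(base, href)
    # parse_qs maps each query key to a list of values (blank values are dropped)
    params = parse_qs(urlparse(absolute).query)
    ids = params.get('opportunityid')
    if not ids:
        return None
    return ids[0], absolute


print(job_record(href))
```

Applied to the BeautifulSoup result, [job_record(a.attrs['href']) for a in links] would give one (id, URL) pair per opportunity, and the ID could serve as a key for de-duplicating jobs across runs.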