XPath Attribute Regex

Here is my code so far:

from lxml import html
import requests

page = requests.get('http://web.international.ucla.edu/institute/events')
tree = html.fromstring(page.text)

event_title = tree.xpath('//a[@href="/institute/event/<<<REGEX>>>"]/text()')

print("Event Title: ", event_title)

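I am scraping the page http://web.international.ucla.edu/institute/events to get event data, and I want to capture the titles. The event links are identified by 5-digit numbers. How can I do this?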

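You cannot use regular expressions in XPath 1.0 (even though they would certainly be useful there!). In XPath 2.0 (which lxml does not support), regular expressions are available in some functions, for example matches() and replace().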
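That said, lxml does expose the EXSLT regular-expression extension functions to XPath 1.0, which may be close enough to what the question asks for. The following is a minimal sketch, not a definitive solution: it runs against a small inline HTML sample (the "archive" and "about" links are made up for illustration) instead of the live page, and it assumes the event links have exactly the form /institute/event/ followed by five digits.

```python
from lxml import html

# EXSLT regular-expressions namespace, supported by lxml's XPath engine
EXSLT_RE = "http://exslt.org/regular-expressions"

# Inline sample standing in for the real page; only the first link
# comes from the question, the other two are hypothetical.
sample = """
<div>
  <a href="/institute/event/11147">Papel Picado Workshop Series: Session 5</a>
  <a href="/institute/event/archive">Archive</a>
  <a href="/institute/about">About</a>
</div>
"""

tree = html.fromstring(sample)

# re:test() returns true when the attribute value matches the pattern
titles = tree.xpath(
    r'//a[re:test(@href, "^/institute/event/\d{5}$")]/text()',
    namespaces={"re": EXSLT_RE},
)

print(titles)
```

Note that this is an lxml extension rather than standard XPath 1.0, so the expression is not portable to other XPath engines.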

If I understand correctly, this is the kind of data you are looking for:

<a href='/institute/event/11147'>Papel Picado Workshop Series: Session 5</a>

You can find those a elements with

//a[starts-with(@href,'/institute/event/')]

But note that this returns a list of elements, whereas you seem to expect a single item. Please explain more clearly what exactly you need as a result.
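To illustrate the point about the return type, here is a small sketch using an inline HTML sample rather than the live page. The second link and its ID are made up for the example. It shows that xpath() hands back a list, and one possible way to pick a single item from it safely.

```python
from lxml import html

# Inline sample; the second link (ID 11148) is hypothetical.
sample = """
<ul>
  <li><a href="/institute/event/11147">Papel Picado Workshop Series: Session 5</a></li>
  <li><a href="/institute/event/11148">Korean Culture Night</a></li>
</ul>
"""

tree = html.fromstring(sample)

# xpath() returns a list, even when only one node matches
titles = tree.xpath('//a[starts-with(@href, "/institute/event/")]/text()')

# Take the first match if there is one, otherwise fall back to None
first_title = titles[0] if titles else None

print(len(titles), first_title)
```

Guarding with `if titles` avoids an IndexError when the page contains no matching links.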

As a suggestion, how about this:

from lxml import html
import requests

page = requests.get('http://web.international.ucla.edu/institute/events')
tree = html.fromstring(page.text)

event_titles = tree.xpath('//a[starts-with(@href,"/institute/event/")]/text()')

for event_title in event_titles:
    print("Event Title: ", event_title)

The result will be

Event Title:  Papel Picado Workshop Series: Session 5
Event Title:  Cacahuatl: The Origins and Global Impact of Chocolate
Event Title:  “Institutionalizing Numbers in Post-Colonial Africa”
Event Title:  The Daniel Pearl Memorial Lecture presents A Conversation with Leon Panetta, part of the Luskin Lecture Series
Event Title:  Persian Women and Other Lies: Story-telling as Historical Retrieval
Event Title:  UCLA EVENT: Making Micronesia
Event Title:  Teach-In: Out of Nowhere? Some Questions, Answers, and Discussion about ISIS
Event Title:  Impossible Testimonies:  Literature and Aesthetics in the Aftermath of the Armenian Genocide
Event Title:  “Casa Grande” Film Screening
Event Title:  The Headscarf Debates: Conflicts of National Belonging
Event Title:  Rethinking History in Chinese Central Asia
Event Title:  Screening: "REBEL: Loreta Velazquez, Civil War Soldier and Spy"
Event Title:  "How Terrorism is Designed to Work"
Event Title:  Matthäus Rest Talk - Dreaming of Pipes: The politics of in/visibility around Nepal’s spectral infrastructures
Event Title:  The Barber of Damascus: Nouveau Literacy in the Eighteenth-Century Levant
Event Title:  Representation of "Apology": a Comparative Study on Narratives by Korean and Japanese Media
Event Title:  "They Can Live in the Desert but Nowhere Else": A History of the Armenian Genocide
Event Title:  Colloquium: Towards a contents-platform conglomerate?
Event Title:  Picturing Political Abstractions in Song/Jin Painting
Event Title:  ISIS and the Enslavement and Trafficking of Women: An Evening with Dr. Khaled Abou El Fadi
Event Title:  Korean Culture Night
Event Title:  Genocide and Global History: A Conference on the 100th Anniversary of the Armenian Genocide
Event Title:  U.S.-China: Economic Ties, Growth Strategies and Investment Opportunities
Event Title:  Human Rights and the Armenian Genocide
Event Title:  Gerschenkron Redux? New Evidence on Shanghai's Pre-War Stock Exchange and Its Implications for the Chinese Economy at Present
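
If the five-digit requirement from the question needs to be enforced and not just the path prefix, it can be approximated in plain XPath 1.0 with string functions. This is only a sketch under the assumption that an event ID is exactly five ASCII digits following /institute/event/. The sample HTML is inline, and the second and third links are invented to show what gets rejected.

```python
from lxml import html

# Inline sample; only the first link comes from the question.
sample = """
<div>
  <a href="/institute/event/11147">Papel Picado Workshop Series: Session 5</a>
  <a href="/institute/event/archive">Archive</a>
  <a href="/institute/event/1234a">Malformed</a>
</div>
"""

PREFIX = "/institute/event/"

# substring-after() isolates the ID, string-length() checks its size,
# and translate() deletes all digits: an empty remainder means digits only.
expr = (
    '//a[starts-with(@href, "{p}")'
    ' and string-length(substring-after(@href, "{p}")) = 5'
    ' and translate(substring-after(@href, "{p}"), "0123456789", "") = ""]'
    '/text()'
).format(p=PREFIX)

tree = html.fromstring(sample)
event_titles = tree.xpath(expr)

print(event_titles)
```

The translate() trick stands in for a `\d{5}` pattern: it strips every digit from the ID, so anything left over signals a non-digit character.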