How do I scrape this tag?
<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
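This is the tag I want to scrape. The text I want to print is "1/2 and 2/1 are reciprocals."
I would print it with get_text(), but I don't know how to grab the tag in the first place.
I can do it for a single tag:
find_all({"class":"hide-editing-3453658"})
But there are more tags to scrape, each has a different number after 'hide-editing-', and I can't find any pattern in the numbers.
Can anyone help me?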
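The attribute is id, not class, and you need to pass the tag name you are looking for ("div") to the find_all method. You can use a regex to find all elements whose id matches a pattern: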
In [61]: import re
...: from bs4 import BeautifulSoup
In [62]: a = """ <div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
...: <div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>
...: <div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>
...: """
In [63]: soup = BeautifulSoup(a, "html.parser")
In [64]: all_divs = soup.find_all("div", {"id": re.compile(r"^hide-editing-")})
In [65]: all_divs
Out[65]:
[<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>,
<div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>,
<div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>]
In [66]: [i.text.strip() for i in all_divs]
Out[66]:
['1/2 and 2/1 are reciprocals.',
'1/4 and 2/1 are reciprocals.',
'1/5 and 2/1 are reciprocals.']
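A regex is not the only filter Beautiful Soup accepts. As a sketch using the same sample divs (plus one hypothetical unrelated div, id="other", added to show that it is skipped), the two lookups below should select the same elements: a CSS attribute selector passed to select(), where [id^="hide-editing-"] means "id starts with hide-editing-", and a plain function passed as the id filter to find_all(). The function receives the attribute value (or None if a tag has no id). get_text(strip=True) does the same job as .text.strip() here.

```python
from bs4 import BeautifulSoup

html = """
<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
<div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>
<div id="hide-editing-346258">1/5 and 2/1 are reciprocals.</div>
<div id="other">not wanted</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: ^= matches ids that start with the given prefix
by_css = [d.get_text(strip=True) for d in soup.select('div[id^="hide-editing-"]')]


# Function filter: called with the id value (None if the tag has no id)
def has_prefix(value):
    return value is not None and value.startswith("hide-editing-")


by_func = [d.get_text(strip=True) for d in soup.find_all("div", id=has_prefix)]

print(by_css)
print(by_func)
```

The selector form is compact when you already think in CSS; the function form is handy when the rule is more than a prefix check.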
Maybe you could try a regular expression?
import re
text = '<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>'
# Capture the run of non-"<" characters that follows a ">", i.e. the div's text
parsedText = re.findall(r'>([^<]+)', text)
print(parsedText[0])
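If you do go the plain-regex route, anchoring the pattern on the hide-editing- id lets you skip unrelated tags and collect every match, not just the first. This is a minimal sketch that assumes the markup always looks exactly like the sample (double-quoted id, digits after the prefix, no nested tags inside the div); the id="other" div is a hypothetical extra added only to show it is ignored. Regexes break easily on real-world HTML, which is why a parser such as Beautiful Soup is usually the safer choice. html.unescape converts entities such as &amp; back to plain characters.

```python
import html
import re

page = """
<div id="hide-editing-34536258">1/2 and 2/1 are reciprocals.</div>
<div id="hide-editing-345258">1/4 and 2/1 are reciprocals.</div>
<div id="other">not wanted</div>
"""

# Only divs whose id is "hide-editing-" followed by digits; capture the inner text
pattern = re.compile(r'<div id="hide-editing-\d+">([^<]*)</div>')

texts = [html.unescape(t).strip() for t in pattern.findall(page)]
for t in texts:
    print(t)
```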