Fetching a list from a website using BeautifulSoup in a dataframe column
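I am trying to get the keywords from an article website. The keywords appear on the site as a list of tags.
This is the link: https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980
I am using this to get the keywords: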
Article_Keyword = bs.find('div', {'class':'ListTags'}).get_text()
This is the result I get:
Themen Bundesgerichtshof Amazon Verband Sozialer Wettbewerb Kundenbewertung Tape dpa
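I need the keywords separated by commas. I could do this with a regular expression, but some keywords consist of more than one word, and each of those has to stay a single keyword.
Is there a way to get each keyword separated by commas?
Try this: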
import requests
from bs4 import BeautifulSoup
url = 'https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980'
page = requests.get(url)
soup1 = BeautifulSoup(page.content, "lxml")
Article_Keyword = soup1.find('div',{'class':'ListTags'}).find_all("a")
Article_Keyword = ", ".join([keyword.text.strip() for keyword in Article_Keyword])
print(Article_Keyword)
Try this:
Article_Keyword = bs.find('div', {'class':'ListTags'})
aes_Article_Keyword = Article_Keyword.find_all("a")
s_Article_Keyword = ", ".join([x.text for x in aes_Article_Keyword])
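Both answers above rely on the same idea: calling get_text() on the whole div flattens every text node into one string, while iterating over the a elements keeps each keyword intact, including multi-word ones. Here is a minimal offline sketch of that idea. The markup in SAMPLE_HTML is an assumption modelled on the output in the question (only the class names ListTags and PageArticle_keyword and the keyword texts come from this thread; the ListTags_label class and the helper name extract_keywords are made up for illustration), so the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure is not shown in the thread.
SAMPLE_HTML = """
<div class="ListTags">
  <span class="ListTags_label">Themen</span>
  <a class="PageArticle_keyword" href="#">Bundesgerichtshof</a>
  <a class="PageArticle_keyword" href="#">Amazon</a>
  <a class="PageArticle_keyword" href="#">Verband Sozialer Wettbewerb</a>
  <a class="PageArticle_keyword" href="#">Kundenbewertung</a>
  <a class="PageArticle_keyword" href="#">Tape</a>
  <a class="PageArticle_keyword" href="#">dpa</a>
</div>
"""


def extract_keywords(html):
    """Return one string per <a> inside the ListTags div (empty list if absent)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"class": "ListTags"})
    if container is None:
        # find() returns None when the div is missing; avoid an AttributeError.
        return []
    return [a.get_text(strip=True) for a in container.find_all("a")]


# Flattened text of the whole div: keyword boundaries are lost.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
flat = " ".join(soup.find("div", {"class": "ListTags"}).get_text().split())
print(flat)

# One entry per link: multi-word keywords stay together.
keywords = extract_keywords(SAMPLE_HTML)
print(", ".join(keywords))
```

The None check matters when the same code runs over many pages, since some articles may have no tag list at all.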
I used the class of the child elements to identify each keyword separately. I hope the code below helps.
from bs4 import BeautifulSoup as soup
from requests import get
url = "https://www.horizont.net/marketing/nachrichten/bgh-haendler-haftet-nicht-fuer-kundenbewertungen-auf-amazon-180980"
clnt = get(url)
page = soup(clnt.text, "html.parser")
data = page.find('div', attrs={'class': 'ListTags'})
data1 = [ele.text for ele in data.find_all('a', attrs={'class': 'PageArticle_keyword'})]
print(data1)
print(",".join(data1))
Output:
>> ['Bundesgerichtshof', 'Amazon', 'Verband Sozialer Wettbewerb', 'Kundenbewertung', 'Tape', 'dpa']
>> Bundesgerichtshof,Amazon,Verband Sozialer Wettbewerb,Kundenbewertung,Tape,dpa
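The same class-based filter can also be written as a single CSS selector with select(), which might be easier to read than chaining find() and find_all(). This is only a sketch on made-up markup: the HTML below is an assumption, and the extra link with class ListTags_more is hypothetical, added just to show that the class filter skips links that are not keywords.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only ListTags and PageArticle_keyword come from the answer above.
SAMPLE_HTML = """
<div class="ListTags">
  <a class="PageArticle_keyword" href="#">Amazon</a>
  <a class="PageArticle_keyword" href="#">Verband Sozialer Wettbewerb</a>
  <a class="PageArticle_keyword" href="#">dpa</a>
  <a class="ListTags_more" href="#">Alle Themen</a>
</div>
"""

page = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "div.ListTags a.PageArticle_keyword" matches only keyword links inside the tag list.
links = page.select("div.ListTags a.PageArticle_keyword")
data1 = [link.get_text(strip=True) for link in links]

print(data1)
print(",".join(data1))
```

Unlike find(), select() returns an empty list when nothing matches, so this variant does not raise an error on pages without a tag list.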
If this helps, please make sure you accept the answer.