It's tricky: can you solve this issue?
Output I want
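I want to collect the list of authors for each book into a single cell. I can get that list, but not every author has a role listed on the site, so in a second column I want only the authors that have a role, together with that role. The output I want is attached above. See the link. This is tricky for me; can someone solve it? Any help would be appreciated. Thanks.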
from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome()
site = 'https://www.goodreads.com/search?q=chughtai&qid=WzdWh5nG8z'
driver.get(site)
driver.maximize_window()

roles = []
authors = []
main = driver.find_elements_by_tag_name('tr')
for i in main:
    role = []
    author = []
    con = i.find_elements_by_xpath('.//div[@class="authorName__container"]')
    try:
        for n in con:
            auth = n.find_element_by_xpath('.//a[@class="authorName"]/span').text
            rol = n.find_element_by_xpath('.//span[@class="authorName greyText smallText role"]').text
            author.append(auth)
            if rol:
                role.append(rol)
                one = ', '.join(role)
                roles.append(auth + ' ' + rol)
            else:
                continue
        one_cell = ', '.join(author)
        authors.append(one_cell)
    except:
        pass

a = {'Author Name': authors,'Role': roles}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
df.to_csv("only_roles.csv", index=False)
print(df)
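Somehow I couldn't get all the books by running your code, so I modified it. Please carry the useful parts of my version over into yours. My explanations are in the code comments.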
# Note: this uses the Selenium 3 API. Selenium 4.3+ removed the find_element(s)_by_*
# helpers; there, use find_element(s)(By.<STRATEGY>, value) from selenium.webdriver.common.by.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
import pandas as pd

driver = webdriver.Chrome('...')
site = 'https://www.goodreads.com/search?q=chughtai&qid=WzdWh5nG8z'
driver.get(site)
driver.maximize_window()

data = []  # pandas can convert a list of dictionaries to a dataframe. Dictionary keys are column names.
for tr in driver.find_elements_by_tag_name('tr'):
    # one tr for one book
    # I chose the following as check for a book because it worked for the webpage
    if tr.get_attribute('itemtype') != 'http://schema.org/Book':
        continue  # Not a book
    temp = {'Author Names': [], 'Role': []}
    for con in tr.find_elements_by_class_name('authorName__container'):
        # one container for one author
        try:
            authorName = con.find_element_by_class_name('authorName').find_element_by_tag_name('span').text
            temp['Author Names'].append(authorName)
            # raises NoSuchElementException when the author has no role,
            # so the name is recorded but nothing is added to 'Role'
            authorRole = con.find_element_by_class_name('role').text
            temp['Role'].append(f'{authorName} {authorRole}')
        except NoSuchElementException:
            pass  # ignore this one
        except Exception as e:
            print(e)  # print this one for inspection
    # convert lists to strings
    data.append({k: ','.join(v) for k, v in temp.items()})

df = pd.DataFrame(data)
print(df)
                                       Author Names  \
0                       Ismat Chughtai,M. Asaduddin
1                                    Ismat Chughtai
2   Muhammad Umar Memon,M. Asaduddin,Ismat Chughtai
3                       Ismat Chughtai,Tahira Naqvi
4                        Ismat Chughtai,Amar Shahid
5       Ismat Chughtai,Tahira Naqvi,Syeda S. Hameed
6                                    Ismat Chughtai
7                                  Hephaestus Books
8                       Ismat Chughtai,Tahira Naqvi
9                                  Rakhshanda Jalil
10                                   Ismat Chughtai
11                                   Ismat Chughtai
12                                   Ismat Chughtai
13                              Azeem Baig Chughtai
14                                   Ismat Chughtai
15                                   Ismat Chughtai
16                                   Ismat Chughtai
17                                   Ismat Chughtai
18                      Ismat Chughtai,Tahira Naqvi
19                                 Hephaestus Books

                                                 Role
0                           M. Asaduddin (Translator)
1
2   Muhammad Umar Memon (Translator),M. Asaduddin ...
3                           Tahira Naqvi (Translator)
4                              Amar Shahid (Compiler)
5   Tahira Naqvi (Translator),Syeda S. Hameed (Tra...
6
7
8                           Tahira Naqvi (Translator)
9                           Rakhshanda Jalil (Editor)
10
11
12
13
14
15
16
17
18                          Tahira Naqvi (Translator)
19
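The per-book aggregation can also be tried without a browser. What follows is only a minimal sketch, assuming the scraping step has already produced, for each book, a list of (name, role) pairs with the role set to None when the page shows none. The names `books` and `summarize` are hypothetical, and the sample values are copied from a few rows of the output above. It also writes the rows with the standard `csv` module to illustrate that a comma-joined string still lands in one cell, because the writer quotes it.

```python
import csv
import io

# Hypothetical pre-scraped data: one list per book, one (name, role) pair per author.
# role is None when the page shows no role for that author.
books = [
    [("Ismat Chughtai", None), ("M. Asaduddin", "(Translator)")],
    [("Ismat Chughtai", None)],
    [("Ismat Chughtai", None), ("Amar Shahid", "(Compiler)")],
]


def summarize(book):
    """Build one row: every author name, plus only the authors that have a role."""
    names = [name for name, _ in book]
    with_role = [f"{name} {role}" for name, role in book if role]
    return {"Author Names": ",".join(names), "Role": ",".join(with_role)}


# One row per book, even when no author has a role, so both columns stay aligned.
rows = [summarize(book) for book in books]

# Write to an in-memory CSV. The default dialect quotes any value that contains
# a comma, so each joined string stays in a single cell.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Author Names", "Role"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Appending a row for every book, including books with an empty Role, is what keeps the two columns lined up; appending to a separate roles list only when a role exists shifts the roles up against the wrong books.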