How to convert results (urls) from string to DataFrame for Pandas to_excel?
My code:
from bs4 import BeautifulSoup as soup
import requests
import pandas as pd
# Raw strings (r"...") keep the Windows backslashes literal; in a normal string "\U" starts an escape sequence and raises a SyntaxError.
Scraper2Excel = r"C:\Users\Ashley\FromPython3.xlsx"
writer = pd.ExcelWriter(Scraper2Excel, engine='xlsxwriter')
READ = r"C:\Users\Ashley\URLs List.xlsx"
Tickers1 = pd.read_excel(READ, sheet_name='Tickers', header=None)
Tickers = Tickers1.values.ravel()
print(Tickers)
UniformResourceLocators = pd.read_excel(READ, sheet_name='URLs', header=None, skiprows=1)
UniformResourceLocatorsTitles = pd.read_excel(READ, sheet_name='URLs', header=None, nrows=1).values[0]
UniformResourceLocators.columns = UniformResourceLocatorsTitles
URLs = UniformResourceLocators['Company News URL']
tick = UniformResourceLocators['Tickers']
startrow = 0
for i in Tickers:
    s = Tickers1.loc[(Tickers1[0] == i)]
    print(s)
    s.to_excel(writer, sheet_name='Sheet1', startrow=startrow, startcol=0, header=False, index=False)
    startrow += 1
    url = URLs.loc[(tick == i)]
    print(url)
    for i in url:
        html_text = requests.get(i).text
        chickennoodle = soup(html_text, 'html.parser')
        for link in chickennoodle.find_all('a'):
            my_links = (link.get('href'))
            print(my_links)
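I'm stuck here. my_links prints a bunch of URLs as strings, and I want to output them to an Excel file. I haven't been able to find a way to convert them into a DataFrame so that pandas will let me use to_excel. I'm new to this, so I appreciate any help.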
            #df = my_links??
            df.to_excel(writer, sheet_name='Sheet2', startrow=startrow, startcol=0, header=False, index=False)
            startrow += 1
writer.close()  # ExcelWriter.save() was removed in pandas 2.0; close() saves the file
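The immediate problem is that to_excel is a method of DataFrame (and Series), while my_links is a plain str. A minimal sketch, assuming only pandas and a made-up placeholder URL, of how strings turn into a DataFrame:

```python
import pandas as pd

# A single href string, like one value of `my_links` in the loop above.
# The URL is a made-up placeholder.
my_links = "https://example.com/news/1"

# to_excel is a DataFrame method, so a bare string must be wrapped first:
# a list holding one string becomes a 1-row, 1-column DataFrame.
single = pd.DataFrame([my_links])

# A list of strings becomes one row per URL, all in a single column.
many = pd.DataFrame(["https://example.com/news/1", "https://example.com/news/2"])

print(single.shape)
print(many.shape)
```

Inside the loop, df = pd.DataFrame([my_links]) would let the existing df.to_excel(...) line run, writing one URL per call. Note that the same startrow counter is also advanced by the Sheet1 writes, so both sheets would end up with skipped rows; collecting the links first and writing them once avoids that.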
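What I would do is create an empty list before the for loop, then append 'my_links' to that list inside the loop.
Then, at the end of your code, you can turn that list into a column of a df before exporting it to Excel.
Something like: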
mylinksList = []
df = pd.DataFrame()
for i in url:
    html_text = requests.get(i).text
    chickennoodle = soup(html_text, 'html.parser')
    for link in chickennoodle.find_all('a'):
        my_links = link.get('href')
        mylinksList.append(my_links)
df['links'] = pd.Series(mylinksList)
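The same idea can be extended to the question's ticker loop. The sketch below is hypothetical: the scraped dict stands in for the requests/BeautifulSoup results so it runs offline, and the ticker column and save_links helper are illustrative additions, not part of the answer above. It also skips None, which link.get('href') returns for an <a> tag that has no href attribute.

```python
import pandas as pd

# Hypothetical scrape results: ticker -> href values found on that ticker's news page.
# In the real script these would come from link.get('href') inside the loops;
# None stands for an <a> tag without an href attribute.
scraped = {
    "AAA": ["https://example.com/aaa/1", None, "https://example.com/aaa/2"],
    "BBB": ["https://example.com/bbb/1"],
}

rows = []  # one (ticker, link) pair per href, collected before building the frame
for ticker, hrefs in scraped.items():
    for href in hrefs:
        if href is not None:  # skip anchors that had no href
            rows.append((ticker, href))

# Build the DataFrame once, after the loops, instead of once per link.
df = pd.DataFrame(rows, columns=["ticker", "links"])
print(df)


def save_links(frame, path):
    """Write all links to Sheet2 in one call (requires xlsxwriter to be installed)."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        frame.to_excel(writer, sheet_name="Sheet2", index=False)
```

Because the frame is written in a single call, no startrow bookkeeping is needed. The helper opens its own ExcelWriter, so if Sheet1 must go in the same workbook, call frame.to_excel(writer, sheet_name='Sheet2', index=False) on the script's existing writer instead, before writer.close().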