Scraping a table from a website using pandas and saving it to a CSV file
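I'm new to Python. I used pandas to scrape a table from a website and save it as a CSV file, and I run the code in a loop every 60 seconds.
I want the file name to be different (or numbered) each time the loop runs. I tried the following: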
import pandas as pd
import time
starttime = time.time()
i=1
while True:
    url = 'https://www.moneycontrol.com/india/indexfutures/nifty/9/2021-05-27/OPTIDX/CE/12800.00/true'
    optionchain = pd.read_html(url,attrs = {'class' : 'tblopt'})
    chaindata = pd.DataFrame(optionchain[1])
    chaindata1 = chaindata.rename(columns={0:'LTPcall',1:'Net Change',2:'Volume',3:'Open Interest',
                                           4:'Change In Open Int',5:'StrikePrice',6:'LTPput',7:'Net Change',
                                           8:'Volume',9:'Open Interest',10:'Change In Open Int'})
    s = 'file'
    x = (s+str(i))
    chaindata1.to_csv(r'C:\Users\dell\Desktop\data\%x.csv')
    i=+1
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))
When I run it, the first file I get is file1, then file(x), and after that it keeps overwriting file(x). I want the output to go to file1, file2, file3, and so on.
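The two suspect lines can be checked on their own, without any scraping. The snippet below is a hypothetical standalone check, not part of the original script: it runs the counter update i=+1 a few times next to i += 1, and prints the raw-string path containing %x to show what actually ends up in it.

```python
# Hypothetical standalone check of two lines from the question's loop.

# 1) "i=+1" is parsed as "i = +1": it assigns the value +1 every time.
i = 1
seen = []
for _ in range(3):
    i=+1  # same as i = 1, so the counter never advances
    seen.append(i)
print(seen)  # [1, 1, 1]

# "i += 1" is the augmented assignment that actually increments.
j = 1
seen_fixed = []
for _ in range(3):
    j += 1
    seen_fixed.append(j)
print(seen_fixed)  # [2, 3, 4]

# 2) "%x" inside a string literal is just text; nothing substitutes x into it.
x = "file" + str(1)
path = r"C:\Users\dell\Desktop\data\%x.csv"
print(path)  # C:\Users\dell\Desktop\data\%x.csv
```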
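i=+1 doesn't increment anything: Python parses it as i = +1, so it just assigns +1 (that is, 1) to i on every iteration, and the counter never gets past 1. Use i += 1 instead. Also, the %x inside the path string is not a placeholder that Python fills in, so the variable x is never used. You can use str.format to build the file name instead. For example: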
import pandas as pd
import time

starttime = time.time()
i = 1
while True:
    url = "https://www.moneycontrol.com/india/indexfutures/nifty/9/2021-05-27/OPTIDX/CE/12800.00/true"
    optionchain = pd.read_html(url, attrs={"class": "tblopt"})
    chaindata = pd.DataFrame(optionchain[1])
    chaindata1 = chaindata.rename(
        columns={
            0: "LTPcall",
            1: "Net Change",
            2: "Volume",
            3: "Open Interest",
            4: "Change In Open Int",
            5: "StrikePrice",
            6: "LTPput",
            7: "Net Change",
            8: "Volume",
            9: "Open Interest",
            10: "Change In Open Int",
        }
    )
    chaindata1.to_csv(
        r"C:\Users\dell\Desktop\data\file{}.csv".format(i)
    )  # <-- use str.format here
    i += 1  # <-- use i += 1 instead of i = +1
    time.sleep(60.0 - ((time.time() - starttime) % 60.0))
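One limitation of a plain counter is that it starts again at 1 whenever the script is restarted, so file1.csv, file2.csv and so on get overwritten. A minimal sketch of one way around this, assuming all output files live in one folder and are named prefix + number + suffix, is to scan the folder and pick the next unused number. The helper next_numbered_path and its parameters are hypothetical, made up for this example; it uses only pathlib, re and tempfile from the standard library.

```python
from pathlib import Path
import re
import tempfile


def next_numbered_path(directory, prefix="file", suffix=".csv"):
    """Return directory/<prefix><n><suffix>, where n is one more than the
    highest number already used by matching files (or 1 if there are none)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    # Matches e.g. "file7.csv" and captures the digits; "notes.txt" is ignored.
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    highest = 0
    for entry in directory.iterdir():
        match = pattern.fullmatch(entry.name)
        if match:
            highest = max(highest, int(match.group(1)))
    return directory / f"{prefix}{highest + 1}{suffix}"


# Demo on a throwaway directory rather than the real data folder.
with tempfile.TemporaryDirectory() as tmp:
    first = next_numbered_path(tmp)
    first.write_text("LTPcall\n1\n")  # stand-in for chaindata1.to_csv(first)
    second = next_numbered_path(tmp)
    demo_names = [first.name, second.name]

print(demo_names)  # ['file1.csv', 'file2.csv']
```

Inside the loop above, the write would become chaindata1.to_csv(next_numbered_path(r"C:\Users\dell\Desktop\data")) and the manual i counter is no longer needed, since to_csv accepts a pathlib.Path. Re-scanning the folder once per iteration is cheap at one file per minute. If any unique name will do, a timestamp such as time.strftime("file-%Y%m%d-%H%M%S.csv") is another option.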