Using loop to write in files
I'm trying to download the raw code from all the tutorials I've been watching, so I made this:
import requests
from urllib import request
from bs4 import BeautifulSoup

page_url = 'https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python'

def page(main_url):
    code = requests.get(main_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'class': 'js-navigation-open'}):
        code_url = 'https://github.com' + link.get('href')
        codelist(code_url)

def codelist(sec_url):
    code = requests.get(sec_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'id': 'raw-url'}):
        raw_url = 'https://github.com' + link.get('href')
        rawcode(raw_url)

def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read()
    lines = txt.split("\n")
    dest_url = r'go.py'
    fx = open(dest_url, "w")
    for line in lines:
        fx.write(line + "\n")
    fx.close()

page(page_url)
When I run this code, I expect it to create 40 .py files holding the 40 different programs from https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python, but it doesn't work. It randomly picks just one of the 40 files and downloads it twice.

The first two functions work fine together until the third one is called, but the third function works fine on its own.

I started learning Python 4 days ago; any help would be appreciated. Thank you all!
[Following up on the comments] rawcode() always writes to the same file, go.py, in "w" mode, so every call overwrites the previous download and you end up with a single file. To change the filename easily on each call, you can add a global counter (here cp) as follows:
def rawcode(third_url):
    global cp
    dest_url = r'go_%s.py' % cp
    cp += 1
    print(dest_url)

cp = 0
page(page_url)
The filenames will be "go_X.py", with X running from 0 up to the number of files.
EDIT

Using your code:
def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read()
    lines = txt.split("\n")
    global cp  # We say that we will use the global variable cp, not a local one
    dest_url = r'go_%s.py' % cp
    cp += 1  # We increment it for further calls
    fx = open(dest_url, "w")  # We can keep 'w' since we generate a new file on each call
    for line in lines:
        fx.write(line + "\n")
    fx.close()

cp = 0  # Initialisation
page(page_url)
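As a side note, here is a sketch of an alternative (my own suggestion, not part of the code above): under Python 3, response.read() returns bytes, so splitting them on the string "\n" raises a TypeError; decoding first avoids that. You can also skip the global counter entirely by naming each file after the last segment of the raw URL:

from urllib import request

def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read().decode('utf-8')  # urlopen() returns bytes under Python 3
    # Name the file after the last path segment of the raw URL,
    # e.g. '.../raw/master/Python/bacon.py' -> 'bacon.py'
    dest_url = third_url.rsplit('/', 1)[-1]
    with open(dest_url, 'w') as fx:  # 'with' closes the file even if writing fails
        fx.write(txt)

With this version, each of the 40 downloads keeps its original filename, so there is no cp to initialise before calling page(page_url).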