Is there a better way than emptying my lists after each loop iteration?
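I'm new to Python. I just wrote a script to export some data from several Google Analytics profiles. It works fine, but I'm sure it is badly written.
I don't really know where to start improving it, so here is my first question.
I'm looping over a list of profile IDs. For each profile ID I perform several operations that use the append method, so I build up some lists step by step, and at the end I have to reset them. As a result I have something like this at both the beginning and the end of my code: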
fullurllist = []
urllist = []
share = []
sharelist = []
sharelist1 = []
end_list = []
I think I should avoid this. Do I need to change the whole logic of my code? Is there anything else I can do to improve this?
Here is the code:
# Loop through the profiles_list and get the best pages for each profile
for profile in profiles_list:
    response = service.data().ga().get(
        ids='ga:' + profile,
        start_date='1daysAgo',
        end_date='today',
        metrics='ga:sessions',
        dimensions='ga:pagePath',
        sort='-ga:sessions',
        filters='ga:sessions>400').execute()
    # Catch response.
    rawdata = response.get('rows', [])
    # Flatten response (which is a list of lists)
    for row in rawdata:
        urllist.append(row[0])
    # Building a list of full url (Hostname + Page path)
    fullurllist = [urljoin(base, h) for h in urllist]
    # Scraping some data from the url list
    for url in fullurllist:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404:  # check the return code
                continue
        soup = BeautifulSoup(page, 'html.parser')
        # Take out the <div> of name and get its value
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()  # strip() removes leading and trailing whitespace
        # save the data in tuple
        sharelist.append(url)
        sharelist1.append(share)
    # Format the data scraped
    end_list = [int(1000*float(x.replace('k', ''))) if 'k' in x else int(x) for x in sharelist1]
    # export in csv
    csv_out = open(response.get('profileInfo').get('profileName') + '.csv', 'wb')
    mywriter = csv.writer(csv_out)
    for row in zip(sharelist, end_list):
        mywriter.writerow([row])
    csv_out.close()
    # reset list
    fullurllist = []
    urllist = []
    share = []
    sharelist = []
    sharelist1 = []
    end_list = []
Thanks a lot!
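It would be more appropriate to declare the lists at the top of the for loop body rather than outside the loop (fullurllist = [], and so on). That way each profile starts with fresh, empty lists and the reset block at the end is no longer needed.
They should only exist inside the loop anyway.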
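As a rough sketch of that idea (not the asker's actual script): the lists are created inside the loop body, so every profile begins empty and nothing has to be reset. The names parse_share_count and collect_rows are hypothetical, and the dict of already-scraped (url, share) pairs is an assumption standing in for the Analytics query and the scraping step, so the example runs without network access.

```python
def parse_share_count(text):
    """Convert a share label such as '1.5k' or '340' to an integer."""
    if 'k' in text:
        return int(1000 * float(text.replace('k', '')))
    return int(text)


def collect_rows(profiles):
    """Return {profile_name: [(url, shares), ...]}.

    `profiles` is a hypothetical stand-in for the per-profile scraping
    results: {profile_name: [(url, share_label), ...]}.
    """
    results = {}
    for name, scraped in profiles.items():
        # Created at the top of the loop body: each iteration starts
        # with new, empty lists, so no reset is needed at the end.
        sharelist = []
        sharelist1 = []
        for url, share in scraped:
            sharelist.append(url)
            sharelist1.append(share)
        end_list = [parse_share_count(x) for x in sharelist1]
        results[name] = list(zip(sharelist, end_list))
    return results


if __name__ == '__main__':
    demo = {'site_a': [('http://a/1', '1.5k'), ('http://a/2', '340')]}
    print(collect_rows(demo))
```

In the original script the same change would amount to moving the six assignments from before the loop to the first lines inside for profile in profiles_list: and deleting the reset block at the bottom.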