After writing data from a list to a CSV file, some column cells are empty
I have code that scrapes the top 100 movies from the Rotten Tomatoes website. After parsing, the data is put into a list. Here is the code:
# create and write headers to a list
rows = []
rows.append(['Rank', 'Rating', 'Title', 'No. of Reviews'])
print(rows)
# loop over results
for result in results:
    # find all columns per result
    data = result.find_all('td')
    # check that columns have data
    if len(data) == 0:
        continue
    # write columns to variables
    rank = data[0].getText()
    rating = data[1].getText()
    title = data[2].getText()
    reviews = data[3].getText()
    # write each result to rows
    rows.append([rank, rating, title, reviews])
print(rows)
The output looks like this:
[['Rank', 'Rating', 'Title', 'No. of Reviews'], ['1.', '\n\n\n\xa096%\n\n', '\n\n Black Panther (2018)\n', '503'], ['2.', '\n\n\n\xa094%\n\n', '\n\n Avengers: Endgame (2019)\n', '514'], ['3.', '\n\n\n\xa093%\n\n', '\n\n Us (2019)\n', '520'], ['4.', '\n\n\n\xa097%\n\n', '\n\n Toy Story 4 (2019)\n', '433'], ['5.', '\n\n\n\xa098%\n\n', '\n\n The Wizard of Oz (1939)\n', '117'], ['6.', '\n\n\n\xa099%\n\n', '\n\n Lady Bird (2017)\n', '388']...
Then I write the data to a CSV file.
# Create csv and write rows to output file
with open('rottentomato.csv','w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)
But only the 'Rank' and 'No. of Reviews' columns have data. The 'Rating' and 'Title' columns are empty.
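I tried to reproduce your problem, but the only issue I found was the whitespace characters (newlines and non-breaking spaces) surrounding the values. You can clean those up with strip: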
import csv
rows = [['Rank', 'Rating', 'Title', 'No. of Reviews'], ['1.', '\n\n\n\xa096%\n\n', '\n\nBlack Panther (2018)\n', '503'], ['2.', '\n\n\n\xa094%\n\n', '\n\nAvengers: Endgame (2019)\n', '514'], ['3.', '\n\n\n\xa093%\n\n', '\n\nUs (2019)\n', '520'], ['4.', '\n\n\n\xa097%\n\n', '\n\nToy Story 4 (2019)\n', '433'], ['5.', '\n\n\n\xa098%\n\n', '\n\nThe Wizard of Oz (1939)\n', '117'], ['6.', '\n\n\n\xa099%\n\n', '\n\nLady Bird (2017)\n', '388']]
# strip leading/trailing whitespace from every cell, in place
for i, row in enumerate(rows):
    for j, data in enumerate(row):
        rows[i][j] = data.strip()
with open('rottentomato.csv','w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)
This is the output I get:
Rank,Rating,Title,No. of Reviews
1.,96%,Black Panther (2018),503
2.,94%,Avengers: Endgame (2019),514
3.,93%,Us (2019),520
4.,97%,Toy Story 4 (2019),433
5.,98%,The Wizard of Oz (1939),117
6.,99%,Lady Bird (2017),388
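As a rough illustration of why the cells can look empty, here is a small sketch that writes one row both before and after stripping. It writes to an in-memory buffer instead of a file so the two results can be compared directly. The helper names `clean_row` and `to_csv_text` are made up for this example. My guess, not something confirmed in the question, is that a spreadsheet viewer shows only the blank first line of each multi-line quoted cell.

```python
import csv
import io


def clean_row(row):
    """Return a copy of the row with surrounding whitespace removed from each cell."""
    # str.strip() with no arguments also removes newlines and '\xa0' (non-breaking space)
    return [cell.strip() for cell in row]


def to_csv_text(rows):
    """Write rows with csv.writer into an in-memory buffer and return the text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# One header row and one data row taken from the question's printed output
raw_rows = [
    ['Rank', 'Rating', 'Title', 'No. of Reviews'],
    ['1.', '\n\n\n\xa096%\n\n', '\n\n Black Panther (2018)\n', '503'],
]

raw_text = to_csv_text(raw_rows)
clean_text = to_csv_text([clean_row(row) for row in raw_rows])

# The unstripped version spreads a single record over many physical lines,
# with the Rating and Title fields quoted and starting with blank lines.
print(repr(raw_text))
# The stripped version has one physical line per record.
print(clean_text)
```

Building a new list with `clean_row` leaves the scraped data untouched, which makes it easier to compare the two versions while debugging. The nested loop above that modifies `rows` in place works just as well.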
You can use pandas to do most of the heavy lifting.
import pandas as pd
pd.read_html(
'https://www.rottentomatoes.com/top/bestofrt/'
)[2].to_csv(
'rottentomatoes.csv',
index=False
)