How can I create a Python CSV file after scraping information from a web page?
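I'm writing code (yes, I'm a beginner) to extract information from a Facebook page. I'm using facebook-scraper to get the information. I need to create a CSV file to store it, but I keep coming up empty.
Original code: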
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=66):
    print(post['time'])  # does not work
    print(post['post_id'])
    print(post['text'])
    print(post['image'])
    print(post['video'])
    print(post['likes'])
    print(post['comments'])
    print(post['shares'])
    print(post['link'])
Code to store it in a CSV file:
import csv
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=10):
    data = [print(post['post_id']), print(post['text']), print(post['image'])]
    with open("facebook.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(data)

with open('facebook.csv', newline='') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    for row in data:
        print(', '.join(row))
Hey, thank you very much. That makes sense now. However, it still doesn't work: it now retrieves only one request instead of 10 pages.
import csv
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=10):
    data = [post['post_id'], post['text'], post['image']]
with open("facebook.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(data)

with open('facebook.csv', newline='') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    for row in data:
        print(', '.join(row))
Third attempt. Still only one post.
import csv
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=10):
    data = [post['post_id'], post['text'], post['image']]
with open("facebook.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(data)

with open('facebook.csv', newline='') as csvfile:
    data = csv.reader(csvfile, delimiter=' ')
    for row in data:
        print(', '.join(row))
Fourth attempt.
import csv
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=10):
    data = [post['post_id'], post['text'], post['image']]
    with open("facebook.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(data)
Returns
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-46-b4f7f9df1e02> in <module>
      5     with open("facebook.csv", "a", newline="") as f:
      6         writer = csv.writer(f)
----> 7         writer.writerow(data)

~\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 76-77: character maps to <undefined>
Your code has two problems.
The first problem is how you build data.
Wrong: [print(post['post_id']), print(post['text']), print(post['image'])]
Why: this line prints each value while fetching it. The return value of print is None, so None is what gets stored in the list.
Old value of data on every iteration: [None, None, None]
Correct: [post['post_id'], post['text'], post['image']]
New value of data: ['2092819824183367', 'Biblioteca da Unesp em Bauru ganha nova identidade visual ❤️\n\nhttps://youtu.be/dTCGp1eGmtM\n\nYOUTUBE.COM\nBiblioteca da Unesp em Bauru ganha nova identidade visual', None]
(PS: I don't know what the text means.)
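To see why the list ended up holding only None, here is a minimal sketch. The `post` dictionary is a hypothetical stand-in for one item yielded by `get_posts`, with made-up values, so it runs without any network access.

```python
# Hypothetical stand-in for one item yielded by get_posts(); values are made up.
post = {"post_id": "123", "text": "hello", "image": None}

# print() writes to the screen and returns None, so this list holds only None.
wrong = [print(post["post_id"]), print(post["text"])]

# Indexing the dictionary directly keeps the actual values.
right = [post["post_id"], post["text"]]
```

After running this, `wrong` is a list of two None values, while `right` contains the post ID and the text.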
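The second problem is how you write to the file.
Wrong: open("facebook.csv", "w", newline="")
Correct: open("facebook.csv", "a", newline="")
Note the a: it opens the file in append mode. Opening the file in w mode (your old code) overwrites it on every pass through the loop, so each iteration starts again from a blank file and only the last row survives. That is not the behavior you need.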
Putting together all the changes, including the indentation, this is the complete code you are after:
import csv
from facebook_scraper import get_posts

for post in get_posts('bibliotecaunespbauru', pages=10):
    data = [post['post_id'], post['text'], post['image']]
    with open("facebook.csv", "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(data)
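About the Unicode error:
The traceback shows Python encoding the file with cp1252, which has no mapping for some characters in the post text (such as the ❤️ in the output above). When opening the file you can pass an explicit encoding: open("facebook.csv", "a", newline="", encoding="utf-8")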
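As one possible way to combine these fixes, the sketch below opens the file once, outside the loop, with an explicit UTF-8 encoding. The helper name `write_posts`, the header row, and the sample posts are assumptions added for illustration; they are not part of facebook-scraper. Opening the file once in "w" mode is a different choice from the append-mode code above: it avoids reopening the file for every post and avoids duplicate rows on re-runs.

```python
import csv
import os
import tempfile


def write_posts(posts, path):
    """Write post_id, text and image for each post to a UTF-8 CSV file."""
    # Hypothetical helper: the file is opened once, so "w" does not lose rows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["post_id", "text", "image"])  # header row (an addition)
        for post in posts:
            writer.writerow([post["post_id"], post["text"], post["image"]])


# Made-up posts standing in for what get_posts() yields.
sample_posts = [
    {"post_id": "1", "text": "nova identidade visual ❤️", "image": None},
    {"post_id": "2", "text": "line one\nline two", "image": "https://example.com/a.jpg"},
]
out_path = os.path.join(tempfile.mkdtemp(), "facebook.csv")
write_posts(sample_posts, out_path)

# Read the file back with the default comma delimiter and the same encoding.
with open(out_path, newline="", encoding="utf-8") as f:
    rows_back = list(csv.reader(f))
```

With the real scraper, the call would presumably look like write_posts(get_posts('bibliotecaunespbauru', pages=10), "facebook.csv"). Note that csv.writer stores None as an empty string, and that the file should be read back with the default comma delimiter rather than delimiter=' '.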