Writing list of objects to csv file

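I'm writing a Python program that loops over reddit submissions, extracts data, and stores it as objects in a list. However, I can't get that list written to a csv file. The file is created, but it only contains some kind of id label for each object. How should I change the csv code?

Code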

import praw
from datetime import datetime
import pandas as pd

class Submission:
    def __init__(self, time, score, title, text, ofReddit, serious):
        self.time = time
        self.score = score
        self.title = title
        self.text = text
        self.ofReddit = ofReddit
        self.serious = serious
data = []

reddit = praw.Reddit(client_id=id, client_secret=secret,
                     user_agent='testscript by /u/SilentButtDeadlies')
subreddit = reddit.subreddit('AskReddit')
for submission in subreddit.new(limit=50):
    time = datetime.utcfromtimestamp(submission.created_utc).hour
    score = submission.score
    title = len(submission.title)
    text = len(submission.selftext)
    if 'of reddit' in submission.title.lower():
        ofReddit = 1
    else:
        ofReddit = 0
    if '[serious]' in submission.title.lower():
        serious = 1
    else:
        serious = 0
    data.append(Submission(time, score, title, text, ofReddit, serious))
df = pd.DataFrame(data)
filename = 'AskRedditData' + str(datetime.now()) + '.csv'
df.to_csv(filename, index=False, encoding='utf-8')

CSV file

0
<__main__.Submission instance at 0x1118f6ef0>
<__main__.Submission instance at 0x1118f68c0>
<__main__.Submission instance at 0x1118f6950>
<__main__.Submission instance at 0x1118c3758>
<__main__.Submission instance at 0x11239c638>
<__main__.Submission instance at 0x11239c5f0>
<__main__.Submission instance at 0x112398908>
<__main__.Submission instance at 0x112398998>
<__main__.Submission instance at 0x112398878>
<__main__.Submission instance at 0x1123989e0>
<__main__.Submission instance at 0x112398c68>
<__main__.Submission instance at 0x11239fe18>
<__main__.Submission instance at 0x11239fe60>
<__main__.Submission instance at 0x11239fea8>
<__main__.Submission instance at 0x11239fef0>
<__main__.Submission instance at 0x11239ff38>
<__main__.Submission instance at 0x11239ff80>
<__main__.Submission instance at 0x11239ffc8>
<__main__.Submission instance at 0x112404050>
<__main__.Submission instance at 0x112404098>
<__main__.Submission instance at 0x1124040e0>
<__main__.Submission instance at 0x112404128>
<__main__.Submission instance at 0x112404170>
<__main__.Submission instance at 0x1124041b8>
<__main__.Submission instance at 0x112404200>
<__main__.Submission instance at 0x112404248>
<__main__.Submission instance at 0x112404290>
<__main__.Submission instance at 0x1124042d8>
<__main__.Submission instance at 0x112404320>
<__main__.Submission instance at 0x112404368>
<__main__.Submission instance at 0x1124043b0>
<__main__.Submission instance at 0x1124043f8>
<__main__.Submission instance at 0x112404440>
<__main__.Submission instance at 0x112404488>
<__main__.Submission instance at 0x1124044d0>
<__main__.Submission instance at 0x112404518>
<__main__.Submission instance at 0x112404560>
<__main__.Submission instance at 0x1124045a8>
<__main__.Submission instance at 0x1124045f0>
<__main__.Submission instance at 0x112404638>
<__main__.Submission instance at 0x112404680>
<__main__.Submission instance at 0x1124046c8>
<__main__.Submission instance at 0x112404710>
<__main__.Submission instance at 0x112404758>
<__main__.Submission instance at 0x1124047a0>
<__main__.Submission instance at 0x1124047e8>
<__main__.Submission instance at 0x112404830>
<__main__.Submission instance at 0x112404878>
<__main__.Submission instance at 0x1124048c0>
<__main__.Submission instance at 0x112404908>

Your Submission class appears to serve only as a record type. You could just use a namedtuple instead. So replace your class definition with:

from collections import namedtuple
Submission = namedtuple('Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])

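Now the rest of your code should work as is. pandas doesn't know how to interpret the Submission class you originally wrote, so it simply creates a single column of Submission objects, and when it writes them out it uses the default object __str__ (that is, str(Submission(...))), because you haven't defined another __str__. Really, you want to use a sequence. The namedtuple function is actually a class factory: it creates a record type derived from tuple, so it has everything a tuple has plus a very handy constructor.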
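To make the difference concrete, here is a minimal sketch comparing a plain class with a namedtuple. The name PlainSubmission and the sample values are made up for illustration, and the record is cut down to two fields to keep it short.

```python
from collections import namedtuple


class PlainSubmission:
    """Hypothetical stand-in for the original class, trimmed to two fields."""

    def __init__(self, time, score):
        self.time = time
        self.score = score


# Same two fields, but generated by the namedtuple class factory.
Submission = namedtuple('Submission', ['time', 'score'])

plain = PlainSubmission(13, 42)   # sample values, not real reddit data
record = Submission(13, 42)

# No __str__ defined, so this falls back to the default object representation.
print(str(plain))

# A namedtuple is a real tuple, so sequence-aware code can unpack it.
print(record)
print(isinstance(record, tuple))
print(list(record))
print(Submission._fields)
```

On Python 3 the first print shows "object at 0x..." rather than the "instance at 0x..." seen in the question, but the cause is the same: no sequence behaviour and no custom __str__.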

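Now, since you're on Python 2, I didn't bother changing your use of pandas, although using it only to write a csv seems like overkill. That said, getting the Python 2 csv module to play well with unicode is a pain, so you may be better off keeping it. If you can switch to Python 3, you can simply replace pandas with: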

import csv
with open(filename, 'w', newline='', encoding='utf8') as f:
    writer = csv.writer(f)
    writer.writerow(Submission._fields)  # header row; _fields is public API despite the leading underscore
    writer.writerows(data)
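If you want to try the csv approach without touching reddit or the filesystem, the following sketch writes to an in-memory buffer instead of a file. The two rows are invented sample data, and io.StringIO is only a stand-in for the open(...) call above.

```python
import csv
import io
from collections import namedtuple

Submission = namedtuple(
    'Submission', ['time', 'score', 'title', 'text', 'ofReddit', 'serious'])

# Invented sample rows standing in for the data collected from reddit.
data = [
    Submission(13, 42, 57, 0, 1, 0),
    Submission(21, 7, 80, 120, 0, 1),
]

# newline='' lets the csv module control line endings, as with open(...).
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(Submission._fields)  # header row taken from the field names
writer.writerows(data)               # each namedtuple is written as one row

output = buffer.getvalue()
print(output)
```

Each namedtuple is an ordinary sequence, so writerows needs no conversion step, and the header comes straight from the same definition as the rows.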