Python read from a file, and only do work if a string isn't found

Question

So I am trying to make a reddit bot that will exec the code from a submission. I have my own subreddit for controlling these clients.

while __name__ == '__main__':
    string = open('config.txt').read()
    for submission in subreddit.get_new(limit = 1):
        if submission.url not in string:
            f.write(submission.url + "\n")
            f.close()
            f = open('config.txt', "a")
            string = open('config.txt').read()

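So what this should do is read from the config file, and only do work if the submission URL is not already in config.txt. However, it always picks up the most recent post and does its work anyway. f is opened like this: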

if not os.path.exists('file'):
    open('config.txt', 'w').close()
f = open('config.txt', "a")

Answer 1

First, a critique of your existing code (in the comments):

# the next two lines are not needed; open('config.txt', "a")
# will create the file if it doesn't exist. (They also check for a
# file named 'file', not 'config.txt'.)
if not os.path.exists('file'):
    open('config.txt', 'w').close()
f = open('config.txt', "a")

# this is an unusual condition which will confuse readers
while __name__ == '__main__':
    # the next line will open a file handle and never explicitly close it
    # (it will probably get closed automatically when it goes out of scope,
    # but it's not good form)
    string = open('config.txt').read()
    for submission in subreddit.get_new(limit = 1):
        # the next line should check for a full-line match; as written, it 
        # will match "http://www.test.com" if "http://www.test.com/level2"
        # is in config.txt
        if submission.url not in string:
            f.write(submission.url + "\n")
            # the next two lines could be replaced with f.flush()
            f.close()
            f = open('config.txt', "a")
            # this is a cumbersome way to keep your string synced with the file,
            # and it never explicitly releases the new file handle
            string = open('config.txt').read()
    # If subreddit.get_new() doesn't return any results, this will act as
    # a busy loop, repeatedly requesting new results as fast as possible.
    # If that is undesirable, you might want to sleep here.
# file handle f should get closed after the loop
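Two of the points in the comments above, that mode "a" creates a missing file and that `in` on the whole file text is a substring test rather than a line match, can be checked directly. This is a minimal sketch assuming Python 3, with a throwaway temporary file standing in for config.txt; `load_seen` is a hypothetical helper name, not part of praw or the original code.

```python
import os
import tempfile


def load_seen(path):
    """Return the set of non-empty lines stored in path (hypothetical helper).

    Opening with mode "a" creates the file if it is missing, so no
    os.path.exists() check is needed; each with-block closes its handle.
    """
    with open(path, "a"):
        pass
    with open(path) as fh:
        return {line for line in fh.read().splitlines() if line}


if __name__ == "__main__":
    # Throwaway file in a temporary directory, standing in for config.txt.
    path = os.path.join(tempfile.mkdtemp(), "config.txt")
    with open(path, "w") as fh:
        fh.write("http://www.test.com/level2\n")

    with open(path) as fh:
        text = fh.read()
    url = "http://www.test.com"

    # Substring test: True, because url is a prefix of the stored line.
    print(url in text)
    # Whole-line test: False, because no stored line equals url exactly.
    print(url in load_seen(path))
```

Comparing against a set of whole lines also makes each membership check a constant-time lookup instead of a scan of the full file text.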

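None of the issues noted above should keep your code from working (except perhaps the imprecise matching). But simpler code may be easier to debug. Here is some code that does the same thing. Note: I am assuming there is no chance that another process writes to config.txt at the same time. You can step through this code (or yours) line by line with pdb to see whether it works as expected.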

import time
import praw
r = praw.Reddit(...)
subreddit = r.get_subreddit(...)

if __name__ == '__main__':
    # open config.txt for reading and writing without truncating. 
    # moves pointer to end of file; closes file at end of block
    with open('config.txt', "a+") as f:
        # move pointer to start of file
        f.seek(0) 
        # make a set of existing lines; also moves pointer to end of file
        lines = set(f.read().splitlines())

        while True:
            got_one = False
            for submission in subreddit.get_new(limit=1):
                got_one = True
                if submission.url not in lines:
                    lines.add(submission.url)
                    f.write(submission.url + "\n")
                    # write data to disk immediately
                    f.flush()
                    ...
            if not got_one:
                # wait a little while before trying again
                time.sleep(10)
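The deduplication part of the code above does not depend on reddit, so it can be pulled out and tested on its own. The sketch below is one way to do that, assuming Python 3 and the same single-writer assumption (no other process writes the file at the same time); `SeenFile` and `add_if_new` are hypothetical names, not praw APIs.

```python
import os
import tempfile


class SeenFile:
    """Append-only file of URLs plus an in-memory set of them (hypothetical).

    Mirrors the a+ / seek(0) / flush() approach above, minus the reddit parts.
    """

    def __init__(self, path):
        # "a+" opens for reading and appending, creating the file if needed.
        self._f = open(path, "a+")
        self._f.seek(0)  # the pointer starts at the end in "a+" mode
        self.seen = set(self._f.read().splitlines())

    def add_if_new(self, url):
        """Append url and return True if it was not seen before; else False."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self._f.write(url + "\n")  # append mode always writes at the end
        self._f.flush()  # push the new line to the OS right away
        return True

    def close(self):
        self._f.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "config.txt")
    with SeenFile(path) as seen:
        print(seen.add_if_new("http://example.com/a"))  # True
        print(seen.add_if_new("http://example.com/a"))  # False
    with SeenFile(path) as seen:  # reopening reloads the set from disk
        print(seen.add_if_new("http://example.com/a"))  # False
```

Inside the bot loop, `if seen.add_if_new(submission.url):` would replace the membership check, the write and the flush. To step through a script line by line, `python -m pdb yourscript.py` (script name hypothetical) starts it under the debugger.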

python

praw