Editing a large Python file consumes system resources

Question

I'm creating a file that contains a list variable with 2,000 string entries.

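Mind you, I'm not talking about when the code runs: my computer starts using a lot of memory while I'm typing to create the file. As soon as I delete all those words in line 6, the memory is freed.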

I'm editing in IDLE. The list is here: http://pastebin.com/uwpKriZ3

FYI, here is a simplified version of the code:

import random

# The following line is unwrapped and, in the actual script,
# contains 26431 characters comprising 2000 words:
list1 = ['aback', 'abaft', 'abandoned', 'abashed', 'aberrant', 'abhorrent', 'abiding']

rndword = random.choice(list1)
brokenword = list(rndword)

Answer 1

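That variable assignment line is 26431 characters long, which is enough to make emacs run slowly. Try a find/replace that turns each comma into a comma followed by a newline. Python allows line breaks inside brackets, so the list stays valid.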
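
If the editor struggles even with the find/replace, the same rewrite can be done with a short script. This is only a minimal sketch: wrap_list_literal is a hypothetical helper name, and it assumes no list item itself contains a comma followed by a space (true for plain words).

```python
def wrap_list_literal(line):
    """Break a one-line list literal so each item sits on its own line."""
    # Align continuation lines just after the opening bracket.
    indent = " " * (line.index("[") + 1)
    # Assumption: ", " only ever appears between items, never inside one.
    return line.replace(", ", ",\n" + indent)


long_line = "list1 = ['aback', 'abaft', 'abandoned', 'abashed']"
print(wrap_list_literal(long_line))
```

The wrapped text is still a single Python statement, so it can be pasted back in place of the original line.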

Often, when writing a real program that needs to load a lot of data, you read the data from a file instead. To do this in Python:

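#!/usr/bin/env python
import io
import random

# Read one word per line; 'utf-8-sig' drops a leading BOM if present.
with io.open('data.txt', 'r', encoding='utf-8-sig') as f:
    list1 = [line.strip() for line in f]

rndword = random.choice(list1)  # pick one word at random
brokenword = list(rndword)      # split it into single characters
print(brokenword)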
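
To get the existing words into such a file in the first place, a one-off conversion script along these lines might help. It is a sketch under assumptions: write_words is a hypothetical helper, the three-word list stands in for the real 2000-word one, and the file goes to a temporary directory so the example does not overwrite anything.

```python
import io
import os
import tempfile


def write_words(words, path):
    """Write one word per line as UTF-8 text."""
    with io.open(path, 'w', encoding='utf-8') as f:
        for word in words:
            f.write(word + '\n')


# Stand-in for the real list; paste the full list here once.
list1 = ['aback', 'abaft', 'abandoned']

# Hypothetical location; in practice this would be 'data.txt' next to the script.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
write_words(list1, path)
print('wrote', len(list1), 'words to', path)
```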

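Reading data from an external source acknowledges that data and code are different things. It also encourages code reuse and generalization. For example, you may come up with a useful algorithm that applies to different data sets. Why put the data set directly in the source code when a polished Python script can take different data as input without being modified? Keep code and data separate and you end up with cleaner, more reusable code.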
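
One way to picture that separation is a script that takes the data file as a command-line argument, so the same code runs unchanged on any word list. The names load_words, pick_broken_word and main are hypothetical, and skipping blank lines is an added assumption rather than part of the original code.

```python
import io
import random
import sys


def load_words(path):
    """Return the non-empty, stripped lines of a UTF-8 text file."""
    with io.open(path, 'r', encoding='utf-8-sig') as f:
        return [line.strip() for line in f if line.strip()]


def pick_broken_word(words):
    """Pick a random word and return it as a list of characters."""
    return list(random.choice(words))


def main(argv):
    if len(argv) != 1:
        print('usage: script.py WORDFILE')
        return
    print(pick_broken_word(load_words(argv[0])))


if __name__ == '__main__':
    main(sys.argv[1:])
```

Running it against a different word list then only requires passing a different file name; the source itself is never edited.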

python

editor