Writing to files from Jupyter notebook

I tried running this code:

from tqdm.auto import tqdm
import os
from datasets import load_dataset

dataset = load_dataset('oscar', 'unshuffled_deduplicated_ar', split='train[:25%]')

text_data = []
file_count = 0

for sample in tqdm(dataset['train']):
    sample = sample['text'].replace('\n', ' ')
    text_data.append(sample)
    if len(text_data) == 10_000:
        # once we hit the 10K mark, save to file
        filename = f'/data/text/oscar_ar/text_{file_count}.txt'
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(text_data))
        text_data = []
        file_count += 1
# after saving in 10K chunks, we will have ~2082 leftover samples, we save those now too
with open(f'data/text/oscar_ar/text_{file_count}.txt', 'w', encoding='utf-8') as fp:
    fp.write('\n'.join(text_data))

I get the following PermissionError:

[Screenshot: Permission Error]

I tried changing the permissions of this directory and running Jupyter with sudo privileges, but it still does not work.

You are opening:

with open(f'data/text/oscar_ar/text_{file_count}.txt')

But you wrote:

filename = f'/Dane/text/oscar_ar/text_{file_count}.txt'

And your screenshot says:

filename = f'/date/text/oscar_ar/text_{file_count}.txt'

You have to choose between data/date/Dane :)
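One way to avoid this kind of mismatch is to build every filename from a single directory variable, so the chunk writes and the final leftover write cannot drift apart. The following is only a minimal sketch under that assumption: `write_chunks`, `flush`, `out_dir` and `chunk_size` are hypothetical names that do not appear in the question, and a small in-memory list and a temporary directory stand in for the real dataset and output folder.

```python
import os
import tempfile


def write_chunks(samples, out_dir, chunk_size=10_000):
    """Write samples to out_dir/text_<n>.txt, at most chunk_size lines per file."""
    # The directory is named in exactly one place, so every file lands in it.
    os.makedirs(out_dir, exist_ok=True)
    buffer, file_count = [], 0

    def flush():
        nonlocal buffer, file_count
        path = os.path.join(out_dir, f'text_{file_count}.txt')
        with open(path, 'w', encoding='utf-8') as fp:
            fp.write('\n'.join(buffer))
        buffer = []
        file_count += 1

    for sample in samples:
        buffer.append(sample.replace('\n', ' '))
        if len(buffer) == chunk_size:
            flush()
    if buffer:  # leftover samples that did not fill a whole chunk
        flush()
    return file_count


# Tiny demonstration with made-up samples in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, 'oscar_ar')
    n_files = write_chunks(['a\nb', 'c', 'd'], out_dir, chunk_size=2)
    print(n_files, sorted(os.listdir(out_dir)))
```

With the code from the question, the call would presumably look something like `write_chunks((s['text'] for s in dataset), 'data/text/oscar_ar')`, but that depends on how the dataset is loaded.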


Also, it looks like you should remove the first / in /data/text/oscar_ar/text_{file_count}.txt

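Explanation: when you put a slash (/) at the beginning of a path, the lookup starts from the root of the file system, i.e. the top level, where a regular user usually has no write permission. Without the leading slash, the lookup starts from your current working directory.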
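The difference can be checked directly from a notebook cell. This is a small sketch assuming a POSIX system (Linux or macOS); the variable names `absolute` and `relative` are only illustrative.

```python
import os

absolute = '/data/text/oscar_ar/text_0.txt'  # leading slash: resolved from the filesystem root
relative = 'data/text/oscar_ar/text_0.txt'   # no leading slash: resolved from the current directory

print(os.path.isabs(absolute))    # whether the first path is absolute
print(os.path.isabs(relative))    # whether the second path is absolute
print(os.getcwd())                # the directory that relative paths start from
print(os.path.abspath(relative))  # where the relative path actually points
```

Printing `os.getcwd()` in the notebook shows which directory the relative version will write into.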