Accessing '.pickle' file in Google Colab

Question

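I am fairly new to using Google's Colab as my go-to tool for ML.

In my experiments I have to use the 'notMNIST' dataset, and I have stored the 'notMNIST' data as notMNIST.pickle in my Google Drive, under a folder named Data.

Having said that, I want to access this '.pickle' file in my Google Colab notebook so that I can use the data.

Is there a way I can access it?

I have read the documentation and some questions on Stack Overflow, but they talk about uploading and downloading files and/or dealing with 'Sheets'.

However, what I want is to load the notMNIST.pickle file into the environment and use it for further processing.

Any help will be appreciated.

Thanks!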

Answer 1

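Data in Google Drive resides in the cloud, while Colaboratory gives you a personal Linux virtual machine on which your notebooks run. So you need to download the file from Google Drive to your Colaboratory virtual machine and use it there. You can follow this download tutorial.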

Answer 2

You can try the following:

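import pickle
from google.colab import drive

# Mount Google Drive; the contents of "My Drive" appear under /content/drive/MyDrive.
drive.mount('/content/drive')
DATA_PATH = "/content/drive/MyDrive/Data"  # assumes the file sits in a Drive folder named Data
with open(DATA_PATH + '/notMNIST.pickle', 'rb') as infile:
    data = pickle.load(infile)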
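
Once Drive is mounted, the file is just an ordinary path on the VM, so most failures come down to a wrong folder name. Below is a minimal sketch of a loader that reports what the parent folder actually contains when the file is missing. The helper name `load_pickle` is hypothetical and not part of Colab or the answer above.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Load a pickle file, failing with a descriptive message if the path is wrong."""
    path = Path(path)
    if not path.is_file():
        parent = path.parent
        # Show the parent folder's contents so a misspelled name is easy to spot.
        listing = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found; {parent} contains: {listing}")
    with path.open("rb") as infile:
        return pickle.load(infile)
```

In a notebook this might be called as `data = load_pickle("/content/drive/MyDrive/Data/notMNIST.pickle")`, assuming the Drive layout described in the question. Only unpickle files you trust, since loading a pickle can execute arbitrary code.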

Answer 3

You can use PyDrive for this. First, you need to find the ID of your file.

# Install the PyDrive wrapper & import libraries.
# This only needs to be done once per notebook.
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
# This only needs to be done once per notebook.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

# List matching files to find the file ID.
#
# A file ID looks like: laggVyWshwcyP6kEI-y_W3P8D26sz
listed = drive.ListFile({'q': "title = 'notMNIST.pickle' and trashed = false"}).GetList()
for file in listed:
    print('title {}, id {}'.format(file['title'], file['id']))
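
Rather than copying the ID by hand from the printed output, the listing can be searched in code. This is a small sketch under the assumption that each entry supports `['title']` and `['id']` lookups, as the loop above does; the helper name `find_file_id` is hypothetical, and plain dicts stand in for PyDrive file objects here.

```python
def find_file_id(listed, title):
    """Return the ID of the single entry whose title matches exactly."""
    matches = [entry["id"] for entry in listed if entry["title"] == title]
    if len(matches) != 1:
        # Drive allows duplicate titles, so refuse to guess between candidates.
        raise LookupError(
            "expected exactly one file titled {!r}, found {}".format(title, len(matches))
        )
    return matches[0]
```

With the `listed` result from above, `file_id = find_file_id(listed, 'notMNIST.pickle')` would then supply the ID for the download step.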

Then you can load the file with the following code:

from googleapiclient.discovery import build
drive_service = build('drive', 'v3')

import io
import pickle
from googleapiclient.http import MediaIoBaseDownload

file_id = 'laggVyWshwcyP6kEI-y_W3P8D26sz'

request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()

downloaded.seek(0)
f = pickle.load(downloaded)
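
The download above lives only in memory. If later cells should be able to reopen the data with a plain `open()`, the bytes can also be written to the VM's disk first. The following is a sketch only; the helper name `save_and_load` and any local path passed to it are assumptions, not part of the answer.

```python
import io
import pickle


def save_and_load(downloaded, local_path):
    """Write an in-memory download to disk, then unpickle it from that file."""
    with open(local_path, "wb") as out:
        # getvalue() returns the whole buffer regardless of the current position.
        out.write(downloaded.getvalue())
    with open(local_path, "rb") as infile:
        return pickle.load(infile)
```

For example, `data = save_and_load(downloaded, '/content/notMNIST.pickle')` would keep a copy on the VM for the rest of the session.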

Answer 4

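Thanks, everyone, for your answers. Google Colab has quickly grown into a more mature development environment, and my favorite feature is the 'Files' tab.

We can easily upload the model to the folder we want and access it as if it were on a local machine.

This solves the issue.

Thanks.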
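
A file uploaded through the 'Files' tab sits on the VM's local filesystem, so it can be opened like any local file. Here is a minimal sketch that loads such a file and summarizes its contents. It assumes the pickle holds a dict (the thread does not say what notMNIST.pickle contains), and the helper names `summarize` and `load_and_summarize` are hypothetical.

```python
import pickle


def summarize(obj):
    """Map each dict key to its value's shape (if it has one) or type name."""
    if isinstance(obj, dict):
        return {key: getattr(value, "shape", type(value).__name__)
                for key, value in obj.items()}
    return type(obj).__name__


def load_and_summarize(path):
    """Unpickle a local file and describe what it contains."""
    with open(path, "rb") as infile:
        return summarize(pickle.load(infile))
```

Calling `load_and_summarize('/content/notMNIST.pickle')` would work if the file was uploaded to `/content`, which is an assumed location.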

Tags: python, google-data-api, tensorflow, google-colaboratory