Gmail API only returning 1 MB of data

I've filtered all the emails I want to fetch into a single Gmail label, and by using this code in Google's quickstart.py script I was able to retrieve them:

# My Code
results = service.users().messages().list(userId='me',labelIds = '{Label_id}', maxResults='10000000').execute()
messages = results.get('messages', [])

for message in messages:
    msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
    print(msg['snippet'].encode('utf-8').strip())

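In an earlier request I listed all of my labels and their IDs, and I substitute the right ID for {Label_id}. I then ask only for the Subject metadata field. The problem is that the response returns exactly 1 MB of data. I know this because I redirect the output to a file and run ls -latr --block-size=MB. I can also tell from the dates that the label contains many more (older) messages than are returned. The request always stops at exactly the same message. None of the messages have attachments.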
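Instead of inferring the label's size from dates or file size, you can ask the API how many messages it holds. A minimal sketch, assuming an authorized `service` object built as in quickstart.py: `labels().get()` returns a Label resource whose `messagesTotal` field counts the messages carrying that label. The helper name is hypothetical.

```python
def label_message_count(service, label_id):
    """Return the message count the Gmail API reports for one label.

    `service` is assumed to be the object returned by
    googleapiclient.discovery.build('gmail', 'v1', credentials=creds).
    """
    label = service.users().labels().get(userId='me', id=label_id).execute()
    # messagesTotal is a field of the Label resource returned by labels.get
    return label.get('messagesTotal', 0)
```

If this number is larger than the number of messages your loop printed, the loop stopped early; the label itself is not smaller.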

According to their API reference, I should be allowed:

Daily Usage 1,000,000,000 quota units per day

Per User Rate Limit 250 quota units per user per second

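I don't think that's what I'm running into, but maybe I'm wrong: each message has 1-3 replies, which I can see, so maybe each of those counts as 5 quota units? I'm not sure. I've tried changing the maxResults parameter, but it doesn't seem to make any difference.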
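The quota question can be settled with simple arithmetic. A minimal sketch, assuming the per-method costs from Google's Gmail API quota table (5 units each for `messages.list` and `messages.get`; check the current documentation) and the two limits quoted above. Replies are separate messages, so each one that carries the label costs its own `messages.get` call.

```python
import math

# Assumed per-call costs in quota units (verify against Google's quota table).
LIST_COST = 5   # users.messages.list, one call per page
GET_COST = 5    # users.messages.get, one call per message

DAILY_LIMIT = 1_000_000_000   # quota units per day (quoted in the question)
PER_USER_PER_SECOND = 250     # quota units per user per second


def quota_for_label(num_messages, page_size=500):
    """Units needed to list `num_messages` and then fetch each one individually."""
    pages = max(1, math.ceil(num_messages / page_size))
    return pages * LIST_COST + num_messages * GET_COST


def min_seconds(num_messages, page_size=500):
    """Lower bound on run time imposed by the per-user rate limit."""
    return quota_for_label(num_messages, page_size) / PER_USER_PER_SECOND


units = quota_for_label(10_000)
print(units, units <= DAILY_LIMIT, min_seconds(10_000))  # 50100 True 200.4
```

Even 10,000 messages come to about 50,100 units, far below the daily limit, so quota does not explain a fixed 1 MB cut-off. Exceeding the per-user rate would typically show up as an error response, not as a silently shortened list.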

Am I hitting a limit here, or is the problem in my request logic?

Edit 1

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Fetches the Subject metadata of every message with the given label
    and prints the collected responses.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    messageArray = []
    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            messageArray.append(msg)
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            print('[%s]' % ', '.join(map(str, messageArray)))
            break


if __name__ == '__main__':
    main()

Edit 2

This is the final script I used. It prints a cleaner, more readable format that I just redirect to a file, and the result is easy to parse.

from __future__ import print_function
import pickle
import os.path
import base64
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

## If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://mail.google.com/']

def main():
    """Prints the snippet of every message with the given label,
    following nextPageToken until all pages have been read.
    """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server()
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('gmail', 'v1', credentials=creds)

    pageToken = None
    while True:
        results = service.users().messages().list(userId='me',labelIds = '{Label_ID}', maxResults=500, pageToken=pageToken).execute()
        messages = results.get('messages', [])
        for message in messages:
            msg = service.users().messages().get(userId='me', id=message['id'], format='metadata', metadataHeaders=['subject']).execute()
            print(msg['snippet'].encode('utf-8').strip())
        pageToken = results.get('nextPageToken', None)
        if not pageToken:
            break


if __name__ == '__main__':
    main()
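The script above requests only the Subject header but prints `snippet`, which is the start of the message body. With `format='metadata'`, the requested headers come back under `msg['payload']['headers']` as a list of `{'name': ..., 'value': ...}` dicts. A small helper (the name is hypothetical) that pulls one out, matching the header name case-insensitively:

```python
def get_header(msg, name):
    """Return the first header called `name` from a messages.get response.

    Assumes `msg` came from format='metadata' (or 'full'), where headers
    appear under msg['payload']['headers'] as {'name': ..., 'value': ...}.
    Returns None if the header is missing.
    """
    wanted = name.lower()
    for header in msg.get('payload', {}).get('headers', []):
        if header.get('name', '').lower() == wanted:
            return header.get('value')
    return None
```

Replacing the `snippet` line with `print(get_header(msg, 'Subject'))` prints the subject line itself.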

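The maximum value of maxResults is 500. If you set it higher, you still receive at most 500 messages per response. You can confirm this by checking len(messages). This is also why your output always stopped at the same message: without paging, every run returns the same first batch.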

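You need to implement pagination. Each response includes a nextPageToken while more results remain; pass it back as pageToken until it is absent: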

messages = []
pageToken = None
while True:
    results = service.users().messages().list(userId='me', labelIds='{Label_id}', maxResults=500, pageToken=pageToken).execute()
    # extend (not append) so messages stays a flat list of message stubs
    messages.extend(results.get('messages', []))
    # nextPageToken is absent on the last page
    pageToken = results.get('nextPageToken', None)
    if not pageToken:
        break
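The same loop can be packaged as a generator, so callers process one page at a time while still visiting every message. This is a sketch, not part of the client library: `iter_messages` and `fetch_page` are hypothetical names, and `fetch_page` stands in for a call such as `service.users().messages().list(userId='me', labelIds=[label_id], maxResults=500, pageToken=token).execute()`.

```python
def iter_messages(fetch_page):
    """Yield every message stub across all pages of a messages.list result.

    `fetch_page(page_token)` must return a dict shaped like the Gmail
    messages.list response: an optional 'messages' list and an optional
    'nextPageToken' string. The first call receives page_token=None.
    """
    page_token = None
    while True:
        response = fetch_page(page_token)
        # An empty result omits the 'messages' key entirely.
        yield from response.get('messages', [])
        page_token = response.get('nextPageToken')
        if not page_token:
            break
```

Usage: `for stub in iter_messages(lambda token: service.users().messages().list(userId='me', labelIds=[label_id], maxResults=500, pageToken=token).execute()):` then call `messages().get()` with `stub['id']` as before.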

If you just want the raw, unparsed email message, try:

# at top of file
from base64 import urlsafe_b64decode

msg = service.users().messages().get(userId='me', id=message['id'], format='raw').execute()
print(urlsafe_b64decode(msg['raw']))
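`msg['raw']` is the whole RFC 2822 message encoded as base64url, so after decoding you can hand the bytes to Python's standard `email` package instead of parsing them by hand. A minimal sketch; `parse_raw_message` is a hypothetical helper, and restoring the `=` padding is a defensive assumption in case the string arrives unpadded:

```python
from base64 import urlsafe_b64decode
from email import message_from_bytes
from email.policy import default


def parse_raw_message(raw):
    """Decode a Gmail format='raw' string into an email.message.EmailMessage."""
    # base64 input length must be a multiple of 4; add padding if it is missing.
    padded = raw + '=' * (-len(raw) % 4)
    return message_from_bytes(urlsafe_b64decode(padded), policy=default)
```

With the result, `email_msg['Subject']` gives the subject, and `email_msg.get_body(preferencelist=('plain',))` finds the plain-text part of a multipart message.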