How to filter API response data based on a particular time range using Python

I am using a Python Lambda function to fetch email logs from Mailgun through the Mailgun logs API. Here is my function:

import json
import requests
resp = requests.get("https://api.eu.mailgun.net/v3/domain/events",
                    auth=("api","key-api"))
def jprint(obj):
    # create a formatted string of the Python JSON object
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(resp.json())

This function prints formatted JSON output of the email logs fetched from the Mailgun API.

Sample response from the API:
{
    "items": [
        {
            "campaigns": [],
            "delivery-status": {
                "attempt-no": 1,
                "certificate-verified": true,
                "code": 250,
                "description": "",
                "message": "OK",
                "mx-host": "host",
                "session-seconds": 1.5093050003051758,
                "tls": true
            },
            "envelope": {
                "sender": "postmaster@domain.com",
                "sending-ip": "ip",
                "targets": "id@mail.com",
                "transport": "smtp"
            },
            "event": "delivered",
            "flags": {
                "is-authenticated": true,
                "is-routed": false,
                "is-system-test": false,
                "is-test-mode": false
            },
            "id": "id",
            "log-level": "info",
            "message": {
                "attachments": [],
                "headers": {
                    "from": "NAME <noreply@name.com>",
                    "message-id": "20220223075827.de300265fad746e9@domain.com",
                    "subject": "Client due diligence information has been submitted by one of your customers.",
                    "to": "id@mail.com"
                },
                "size": 1990
            },
            "recipient": "id@mail.com",
            "recipient-domain": "domain.com",
            "storage": {
                "key": "key",
                "url": "https://storage.eu.mailgun.net/v3/domains/domain/messages/id"
            },
            "tags": [],
            "timestamp": 1645603109.434181,
            "user-variables": {}
        },
        {
            "envelope": {
                "sender": "postmaster@domain.com",
                "targets": "id@mail.com.com",
                "transport": "smtp"
            },
            "event": "accepted",
            "flags": {
                "is-authenticated": true,
                "is-test-mode": false
            },
            "id": "id",
            "log-level": "info",
            "message": {
                "headers": {
                    "from": "NAME <noreply@name.com>",
                    "message-id": "20220223075827.de300265fad746e9@domain.com",
                    "subject": "Client due diligence information has been submitted by one of your customers.",
                    "to": "id@mail.com.com"
                },
                "size": 1990
            },
            "method": "HTTP",
            "recipient": "id@mail.com",
            "recipient-domain": "domain",
            "storage": {
                "key": "key",
                "url": "https://storage.eu.mailgun.net/v3/domains/domain/messages/key"
            },
            "tags": null,
            "timestamp": 1645603107.282775,
            "user-variables": {}
        },

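The timestamp here is not human-readable.

I need to set up an AWS Lambda Python script, triggered by a scheduled event, that calls the Mailgun API periodically and sends the logs to CloudWatch. I am familiar with the setup but not with the scripting.

For now I only need to dynamically filter the API data for the past hour.

From what I have read, this can be done with the pandas library, but I could not find a proper answer on how to fetch logs for a dynamic time range on a schedule.

I have gone through a lot of documentation on this without finding a suitable answer, and Python is completely new to me.

Can anyone guide me on how to dynamically get the logs for the last N units of time?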

According to the Mailgun documentation, you can specify a time range, so your result can already be filtered using the begin and end parameters.
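
As a rough sketch of that approach, the query parameters for the last N hours could be built as below. This assumes the Events API accepts begin and end as epoch seconds, which is my reading of the Mailgun docs and worth checking against the current version. last_hours_params and fetch_events are hypothetical helper names, and the domain and API key are the placeholders from the question.

```python
import time


def last_hours_params(hours, now=None):
    """Build query parameters covering the last `hours` hours."""
    # `now` can be injected for testing; it defaults to the current epoch time.
    end = time.time() if now is None else now
    begin = end - hours * 3600
    # Assumption: the API accepts begin/end as epoch seconds.
    return {"begin": int(begin), "end": int(end)}


def fetch_events(hours=1):
    """Hypothetical wrapper around the request from the question."""
    import requests  # imported lazily so the helper above has no third-party dependency

    resp = requests.get(
        "https://api.eu.mailgun.net/v3/domain/events",  # placeholder domain, as in the question
        auth=("api", "key-api"),                        # placeholder key, as in the question
        params=last_hours_params(hours),
    )
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()
```

Filtering on the server side like this means less data comes back, which matters if the function runs every hour.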

After that, you can use pd.json_normalize to reshape your JSON response.
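
A minimal sketch of that reshaping step, assuming pandas is available in the Lambda environment. The data dictionary is a trimmed-down stand-in for the sample response in the question, and the time column name is my own choice, not something the API returns.

```python
import pandas as pd

# Trimmed-down stand-in for the API response shown in the question.
data = {
    "items": [
        {"event": "delivered", "recipient": "id@mail.com",
         "delivery-status": {"code": 250}, "timestamp": 1645603109.434181},
        {"event": "accepted", "recipient": "id@mail.com",
         "timestamp": 1645603107.282775},
    ]
}

# Flatten nested objects into columns such as "delivery-status.code".
df = pd.json_normalize(data["items"])

# Convert epoch seconds into timezone-aware, human-readable datetimes.
df["time"] = pd.to_datetime(df["timestamp"], unit="s", utc=True)

# Keep only the rows from the last hour.
cutoff = pd.Timestamp.now(tz="UTC") - pd.Timedelta(hours=1)
recent = df[df["time"] > cutoff]
print(recent[["time", "event", "recipient"]])
```

With the 2022 sample timestamps the last-hour filter returns no rows; with a live response it would keep whatever arrived in the past hour.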

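In addition to what @Corralien said about the documentation (which I personally prefer), you can also use a pure Python approach with a list comprehension to select the last hour's data. In the code below, I assume you have named the API response data, and that it is a dictionary: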

from time import time

# Epoch time one hour ago
last_hour = time() - 3600
# Keep only the events newer than that
recent = [x for x in data["items"] if x["timestamp"] > last_hour]

This keeps only the items whose timestamp is later than one hour ago (time() - 3600).
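
Since the question also mentions that the timestamps are not human-readable, here is one way the same idea might be wrapped into a function that takes the number of hours and adds a readable UTC time to each item. recent_items and the added "time" key are names I made up for illustration; they are not part of the Mailgun response.

```python
from datetime import datetime, timezone
from time import time


def recent_items(data, hours=1, now=None):
    """Return items from the last `hours` hours, each with a readable UTC time."""
    now = time() if now is None else now
    cutoff = now - hours * 3600
    result = []
    for item in data.get("items", []):
        if item["timestamp"] > cutoff:
            readable = datetime.fromtimestamp(item["timestamp"], tz=timezone.utc)
            # Copy the item so the original response is left untouched.
            result.append({**item, "time": readable.isoformat()})
    return result


# Usage with the two timestamps from the sample response and a fixed "now":
data = {"items": [{"event": "delivered", "timestamp": 1645603109.434181},
                  {"event": "accepted", "timestamp": 1645603107.282775}]}
for item in recent_items(data, hours=1, now=1645603200):
    print(item["time"], item["event"])
```

In a Lambda function, whatever is printed to standard output ends up in CloudWatch Logs, so printing the filtered items may already cover the logging part.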

In addition to the answers above, to filter between two dates and times using only Python, you can use datetime. This uses the same list comprehension as @Amirhossein Kiani:

import datetime

start = datetime.datetime(year, month, day, hour, minute, second).timestamp()
stop = datetime.datetime(year, month, day, hour, minute, second).timestamp()

[x for x in data["items"] if start < x["timestamp"] < stop]

For a one-hour difference, you can also use timedelta:

start = (datetime.datetime.now() - datetime.timedelta(hours=1)).timestamp()
stop = datetime.datetime.now().timestamp()
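
To cover the "last N" part of the question more generally, the timedelta idea could be sketched as a helper that takes an amount and a unit. window and filter_window are hypothetical names. I use timezone-aware UTC datetimes here so the result does not depend on the machine's local time zone.

```python
from datetime import datetime, timedelta, timezone


def window(amount, unit="hours", now=None):
    """Return (start, stop) epoch timestamps for the last `amount` of `unit`."""
    # `unit` must be a timedelta keyword such as "minutes", "hours", or "days".
    stop = datetime.now(timezone.utc) if now is None else now
    start = stop - timedelta(**{unit: amount})
    return start.timestamp(), stop.timestamp()


def filter_window(data, amount, unit="hours", now=None):
    """Keep the items whose timestamp falls inside the window."""
    start, stop = window(amount, unit, now)
    return [x for x in data["items"] if start < x["timestamp"] <= stop]


# Usage: last 30 minutes relative to a fixed point in time.
data = {"items": [{"event": "delivered", "timestamp": 1645603109.434181},
                  {"event": "accepted", "timestamp": 1645603107.282775}]}
fixed_now = datetime(2022, 2, 23, 8, 0, 0, tzinfo=timezone.utc)
print(filter_window(data, 30, "minutes", now=fixed_now))
```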