Writing a Twisted client to send looping GET requests to multiple API endpoints and record the responses
It's been a while since I've done any Twisted programming, so I'm trying to get back into it for a new project. I'm trying to set up a Twisted client that can take a list of servers as an argument and, for each server, send an API GET call and write the return message to a file. This API GET call should be repeated every 60 seconds.
I've managed to do it for a single server using Twisted's Agent class:
from StringIO import StringIO
from twisted.internet import reactor
from twisted.internet.protocol import Protocol
from twisted.web.client import Agent
from twisted.web.http_headers import Headers
from twisted.internet.defer import Deferred
import datetime
from datetime import timedelta
import time
import optparse

count = 1
filename = "test.csv"

class server_response(Protocol):
    def __init__(self, finished):
        print "init server response"
        self.finished = finished
        self.remaining = 1024 * 10

    def dataReceived(self, bytes):
        if self.remaining:
            display = bytes[:self.remaining]
            print 'Some data received:'
            print display
            with open(filename, "a") as myfile:
                myfile.write(display)
            self.remaining -= len(display)

    def connectionLost(self, reason):
        print 'Finished receiving body:', reason.getErrorMessage()
        self.finished.callback(None)

def capture_response(response):
    print "Capturing response"
    finished = Deferred()
    response.deliverBody(server_response(finished))
    print "Done capturing:", finished
    return finished

def responseFail(err):
    print "error:", err
    reactor.stop()

def cl(ignored):
    print "sending req"
    agent = Agent(reactor)
    headers = {
        'authorization': [<snipped>],
        'cache-control': [<snipped>],
        'postman-token': [<snipped>]
    }
    URL = <snipped>
    print URL
    a = agent.request(
        'GET',
        URL,
        Headers(headers),
        None)
    a.addCallback(capture_response)
    reactor.callLater(60, cl, None)
    #a.addBoth(cbShutdown, count)

def cbShutdown(ignored, count):
    print "reactor stop"
    reactor.stop()

def parse_args():
    usage = """usage: %prog [options] [hostname]:port ...
    Run it like this:
    python test.py hostname1:instanceName1 hostname2:instancename2 ...
    """
    parser = optparse.OptionParser(usage)
    _, addresses = parser.parse_args()
    if not addresses:
        print parser.format_help()
        parser.exit()

    def parse_address(addr):
        if ':' not in addr:
            hostName = '127.0.0.1'
            instanceName = addr
        else:
            hostName, instanceName = addr.split(':', 1)
        return hostName, instanceName

    return map(parse_address, addresses)

if __name__ == '__main__':
    d = Deferred()
    d.addCallbacks(cl, responseFail)
    reactor.callWhenRunning(d.callback, None)
    reactor.run()
But I'm having trouble figuring out how to get multiple agents sending calls. With this code I'm relying on the tail of cl() --- reactor.callLater(60, cl, None) --- to create the call loop. So how do I create multiple calling agent protocols (server_response(Protocol)) and keep looping through a GET for each of them once my reactor starts?
Look what the cat dragged in!
So how do I create multiple call agent
Use treq. You rarely want to get tangled up with the Agent class.
This API GET call should be repeated every 60 seconds
Use LoopingCall instead of callLater; it's easier in this case and you'll run into fewer problems later.
import treq
from twisted.internet import task, reactor

filename = 'test.csv'

def writeToFile(content):
    with open(filename, 'ab') as f:
        f.write(content)

def everyMinute(*urls):
    for url in urls:
        d = treq.get(url)
        d.addCallback(treq.content)
        d.addCallback(writeToFile)

#----- Main -----#
sites = [
    'https://www.google.com',
    'https://www.amazon.com',
    'https://www.facebook.com']
repeating = task.LoopingCall(everyMinute, *sites)
repeating.start(60)
reactor.run()
It starts the everyMinute() function, which runs every 60 seconds. Within that function, each endpoint is queried, and once the contents of the response become available, the treq.content function takes the response and returns its contents. Finally the contents are written to a file.
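If you also need the authorization headers and some error handling from your original Agent version, treq.get accepts a headers argument, and you can attach an errback so one failing endpoint doesn't stop the logging for the others. Here's a rough sketch along those lines; the header values are placeholders, and reading the URL list from sys.argv (rather than hard-coding sites) is just my guess at how you'd wire in your server-list argument:
import sys
import treq
from twisted.internet import task, reactor

filename = 'test.csv'

# Placeholder header values -- substitute your real tokens.
headers = {
    'authorization': ['<your token>'],
    'cache-control': ['no-cache'],
}

def writeToFile(content):
    # Append the raw response body, same as above.
    with open(filename, 'ab') as f:
        f.write(content)

def logFailure(failure, url):
    # Report the error but keep the loop going for the other URLs.
    print('Request to %s failed: %s' % (url, failure.getErrorMessage()))

def everyMinute(*urls):
    for url in urls:
        d = treq.get(url, headers=headers)
        d.addCallback(treq.content)
        d.addCallback(writeToFile)
        d.addErrback(logFailure, url)

if __name__ == '__main__':
    # e.g. python poll.py https://host1/api/status https://host2/api/status
    sites = sys.argv[1:]
    repeating = task.LoopingCall(everyMinute, *sites)
    repeating.start(60)
    reactor.run()
Note that repeating.start(60) fires the first round immediately and then every 60 seconds; pass now=False if you'd rather wait a minute before the first request.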
PS
Are you scraping or trying to extract content from those sites? If so, scrapy might be a good option for you.