Using Python Cassandra Driver for large no. of Queries

Question

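We have a script that talks to Scylla (a drop-in replacement for Cassandra). The script is supposed to run for a few thousand systems, and it runs a few thousand queries to get the data it requires. However, after some time the script crashes with this error: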

2021-09-29 12:13:48 Could not execute query because of : errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x

2021-09-29 12:13:48 Trying for : 4th time

Traceback (most recent call last):
  File ".../db_base.py", line 92, in db_base
    ret_val = SESSION.execute(query)
  File "cassandra/cluster.py", line 2171, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 4062, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x

DB connection code:

def db_base(current_keyspace, query, try_for_times, current_IPs, port):

    global SESSION

    if SESSION is None:

        # This logic to ensure given number of retrying runs on failure of connecting to the Cluster
        for i in range(try_for_times):
            try:

                cluster = Cluster(contact_points = current_IPs, port=port)
                session = cluster.connect() # error can be encountered in this command
                break

            except NoHostAvailable:
                print("No Host Available! Trying for : " + str(i) + "th time")
                if i == try_for_times - 1:

                    # shutting down cluster
                    cluster.shutdown()

                    raise db_connection_error("Could not connect to the cluster even in " + str(try_for_times) + " tries! Exiting")

        SESSION = session

    # This logic to ensure given number of retrying runs in the case of failing the actual query
    for i in range(try_for_times):

        try:

            # setting keyspace
            SESSION.set_keyspace(current_keyspace)
            # execute actual query - error can be encountered in this
            ret_val = SESSION.execute(query)
            break

        except Exception as e:

            print("Could not execute query because of : " + str(e))
            print("Trying for : " + str(i) + "th time")

            if i == (try_for_times -1):

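                # Shutting down the cluster also closes its sessions. Use the
                # global SESSION here: the local names cluster and session only
                # exist on the call that created the connection.
                SESSION.cluster.shutdown()
                SESSION = None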
                raise db_connection_error("Could not execute query even in " + str(try_for_times) + " tries! Exiting")

    return ret_val

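How can I improve this code so that it can sustain this large number of queries? Or should we look into other tools or approaches to help us get this data? Thank you.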

Answer 1

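The client request timeout indicates that the driver is timing out before the server does, or, if the server is overloaded, that Scylla has not reported its own timeout back to the driver. There are a couple of ways to figure out which: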

1 - Ensure that your default_timeout is higher than the timeouts Scylla enforces in /etc/scylla/scylla.yaml

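2 - Check the Scylla logs for any sign of overload. If there is one, consider throttling your requests until you find a rate at which they no longer fail. If the problem persists, consider resizing your instances.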
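The two suggestions above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the driver's own throttling mechanism: `run_throttled`, `max_in_flight` and the 30-second value are hypothetical names and numbers chosen for illustration. It passes an explicit per-request `timeout` to `Session.execute_async` (the driver's default request timeout is 10 seconds) and uses a semaphore so that only a bounded number of requests are pending at any time.

```python
import threading


def run_throttled(session, statement, params_list, max_in_flight=32, timeout=30.0):
    """Run one statement for many parameter sets with a cap on pending requests.

    `session` is expected to behave like a cassandra-driver Session.
    Returns (results, errors): results[i] holds the rows for params_list[i]
    (None if that request failed) and errors maps index -> exception.
    """
    slots = threading.BoundedSemaphore(max_in_flight)
    results = [None] * len(params_list)
    errors = {}

    def on_done(rows, idx):
        # The driver passes the rows of the (first) result page here.
        results[idx] = rows
        slots.release()

    def on_error(exc, idx):
        errors[idx] = exc
        slots.release()

    for idx, params in enumerate(params_list):
        slots.acquire()  # blocks while max_in_flight requests are pending
        future = session.execute_async(statement, params, timeout=timeout)
        future.add_callbacks(
            callback=on_done, callback_args=(idx,),
            errback=on_error, errback_args=(idx,),
        )

    # Taking every slot back means every pending request has completed.
    for _ in range(max_in_flight):
        slots.acquire()
    return results, errors
```

The driver also ships `cassandra.concurrent.execute_concurrent_with_args`, whose `concurrency` argument caps in-flight requests in a similar way. Either way, it is probably best to start with a low limit and raise it while watching the Scylla logs.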

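Apart from that, it is worth mentioning that your sample code is not using PreparedStatements, TokenAwareness and the other best practices mentioned under https://docs.datastax.com/en/developer/python-driver/3.19/api/cassandra/policies/, all of which should improve your overall throughput.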
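As a rough illustration of those practices, here is a minimal sketch, assuming a single local datacenter. `build_session`, `PreparedExecutor`, `local_dc` and the 30-second timeout are hypothetical names and values, not something taken from the question. `build_session` sets a token-aware load balancing policy and a request timeout through the default execution profile; `PreparedExecutor` prepares each distinct CQL string once and reuses it.

```python
def build_session(contact_points, port, keyspace, local_dc, request_timeout=30.0):
    """Connect with a token-aware policy and a longer default request timeout."""
    # Driver imports are kept inside the function so that PreparedExecutor
    # below can be imported and exercised without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        # Route each request to a replica that owns the partition.
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)  # local_dc is an assumption
        ),
        request_timeout=request_timeout,
    )
    cluster = Cluster(
        contact_points=contact_points,
        port=port,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    # Connecting with the keyspace avoids calling set_keyspace per query.
    return cluster.connect(keyspace)


class PreparedExecutor:
    """Prepare each distinct CQL string once, then bind and execute it."""

    def __init__(self, session, timeout=30.0):
        self._session = session
        self._timeout = timeout
        self._statements = {}

    def execute(self, cql, params=()):
        stmt = self._statements.get(cql)
        if stmt is None:
            stmt = self._session.prepare(cql)
            self._statements[cql] = stmt
        return self._session.execute(stmt, params, timeout=self._timeout)
```

With this in place, a query would be written with `?` placeholders and its values passed separately, for example `executor.execute("SELECT * FROM t WHERE id = ?", (some_id,))`, where the table and column names are placeholders. Token-aware routing relies on the partition key being bound this way.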

You can find further information in the Scylla docs: https://docs.scylladb.com/using-scylla/drivers/cql-drivers/scylla-python-driver/ and at Scylla University: https://university.scylladb.com/courses/using-scylla-drivers/lessons/coding-with-python/

Tags: python, connection, optimization, scylla