Using the Python Cassandra Driver for a large number of queries
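We have a script that talks to Scylla (a drop-in replacement for Cassandra). The script is supposed to run for a few thousand systems, and it runs thousands of queries to get the data it needs. However, after a while the script crashes with this error: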
```
2021-09-29 12:13:48 Could not execute query because of : errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
2021-09-29 12:13:48 Trying for : 4th time
Traceback (most recent call last):
  File ".../db_base.py", line 92, in db_base
    ret_val = SESSION.execute(query)
  File "cassandra/cluster.py", line 2171, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 4062, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'x.x.x.x': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=x.x.x.x
```
Database connection code:
```python
from cassandra.cluster import Cluster, NoHostAvailable

SESSION = None  # shared session, created on the first call and reused afterwards


class db_connection_error(Exception):
    """Raised when connecting or querying still fails after all retries."""


def db_base(current_keyspace, query, try_for_times, current_IPs, port):
    global SESSION
    if SESSION is None:
        # Retry connecting to the cluster up to try_for_times times
        for i in range(try_for_times):
            cluster = Cluster(contact_points=current_IPs, port=port)
            try:
                session = cluster.connect()  # error can be encountered in this command
                break
            except NoHostAvailable:
                print("No Host Available! Trying for : " + str(i) + "th time")
                # release this attempt's connections before retrying
                cluster.shutdown()
                if i == try_for_times - 1:
                    raise db_connection_error("Could not connect to the cluster even in " + str(try_for_times) + " tries! Exiting")
        SESSION = session
    # Retry the actual query up to try_for_times times
    for i in range(try_for_times):
        try:
            # setting keyspace
            SESSION.set_keyspace(current_keyspace)
            # execute actual query - error can be encountered in this
            ret_val = SESSION.execute(query)
            break
        except Exception as e:
            print("Could not execute query because of : " + str(e))
            print("Trying for : " + str(i) + "th time")
            if i == try_for_times - 1:
                # shut down the cluster (and its sessions) through the shared session:
                # the local `cluster` only exists on the call that created SESSION
                SESSION.cluster.shutdown()
                SESSION = None
                raise db_connection_error("Could not execute query even in " + str(try_for_times) + " tries! Exiting")
    return ret_val
```
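How can we improve this code so that it holds up and can run this large number of queries? Or should we look into other tools or approaches to fetch this data? Thanks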
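A client-side session timeout means either that the driver timed out before the server did or, if Scylla is overloaded, that Scylla never sent the timeout back to the driver. There are a few ways to address this: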
1 - Make sure your default_timeout is higher than the timeouts Scylla enforces in /etc/scylla/scylla.yaml.
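2 - Check the Scylla logs for any signs of overload. If you find any, consider throttling your requests until you reach a balanced sweet spot where they no longer fail. If the problem persists, consider resizing your instances.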
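As a rough illustration of point 1, the sketch below reads the server-side `*_timeout_in_ms` settings from the text of a scylla.yaml file and derives a client timeout slightly above the largest of them, then applies it through an execution profile. The line-based parser (not a real YAML parser), the helper names, the selected keys and the 2-second margin are all assumptions; `ExecutionProfile(request_timeout=...)` is the execution-profile counterpart of `Session.default_timeout` in the Python driver.

```python
import re

# Hypothetical choice of server-side keys to compare against; adjust to your workload.
TIMEOUT_KEYS = (
    "read_request_timeout_in_ms",
    "range_request_timeout_in_ms",
    "write_request_timeout_in_ms",
    "request_timeout_in_ms",
)

# Matches an uncommented "key: <integer>" line, optionally followed by a "#" comment.
_TIMEOUT_LINE = re.compile(r"^\s*([a-z_]+_timeout_in_ms)\s*:\s*(\d+)\s*(?:#.*)?$")


def server_timeouts_ms(yaml_text, keys=TIMEOUT_KEYS):
    """Return {key: milliseconds} for the selected timeout keys found in yaml_text."""
    found = {}
    for line in yaml_text.splitlines():
        match = _TIMEOUT_LINE.match(line)
        if match and match.group(1) in keys:
            found[match.group(1)] = int(match.group(2))
    return found


def client_timeout_s(yaml_text, margin_s=2.0, keys=TIMEOUT_KEYS):
    """Client timeout in seconds: the largest server timeout plus a safety margin."""
    found = server_timeouts_ms(yaml_text, keys)
    if not found:
        raise ValueError("none of the timeout keys are set in the given scylla.yaml text")
    return max(found.values()) / 1000.0 + margin_s


def make_session(contact_points, port, request_timeout_s):
    """Connect with a default execution profile whose timeout is request_timeout_s."""
    # Imported here so the helpers above can be used without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

    profile = ExecutionProfile(request_timeout=request_timeout_s)
    cluster = Cluster(contact_points=contact_points, port=port,
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster.connect()
```

Assuming a copy of scylla.yaml is readable from the client, usage would look like `make_session(current_IPs, port, client_timeout_s(open("/etc/scylla/scylla.yaml").read()))`. An individual slow query can also be given its own limit with `session.execute(query, timeout=...)`.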
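Beyond that, it is worth mentioning that your sample code does not use PreparedStatements, TokenAwareness, or the other best practices described at https://docs.datastax.com/en/developer/python-driver/3.19/api/cassandra/policies/, all of which would certainly improve your overall throughput.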
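A minimal sketch combining the throttling advice with these best practices: the statement is prepared once, the load-balancing policy is token aware, and at most `max_in_flight` queries run at a time. `run_throttled`, `fetch_rows`, the table `system_metrics` and its `system_id` partition key are hypothetical names; `TokenAwarePolicy`, `DCAwareRoundRobinPolicy`, `ExecutionProfile` and `Session.prepare` are driver APIs. `max_in_flight` is the knob to tune when looking for the sweet spot from point 2.

```python
from concurrent.futures import ThreadPoolExecutor


def run_throttled(execute, params_list, max_in_flight=32):
    """Call execute(params) for every item, with at most max_in_flight calls in flight.

    Returns (success, result_or_exception) tuples in input order, so one failed
    query does not abort the whole run.
    """
    def attempt(params):
        try:
            return True, execute(params)
        except Exception as exc:
            return False, exc

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(attempt, params_list))


def fetch_rows(contact_points, port, keyspace, system_ids, max_in_flight=32):
    """Fetch rows for many systems through one prepared, token-aware statement."""
    # Driver imports kept local so run_throttled works without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()))
    cluster = Cluster(contact_points=contact_points, port=port,
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    try:
        session = cluster.connect(keyspace)  # keyspace set once, not before every query
        # Prepared once; token-aware routing needs the partition key bound as a parameter.
        statement = session.prepare(
            "SELECT * FROM system_metrics WHERE system_id = ?")  # hypothetical table
        return run_throttled(
            lambda params: list(session.execute(statement, params)),
            [(system_id,) for system_id in system_ids],
            max_in_flight)
    finally:
        cluster.shutdown()
```

The driver also ships `cassandra.concurrent.execute_concurrent_with_args(session, statement, parameters, concurrency=...)`, which performs the same bounded fan-out on the driver's own event loop without extra threads and returns similar (success, result) pairs; it is likely the better choice in production. The thread-pool version is shown only because it makes the throttling explicit.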
You can find more information in the Scylla documentation:
https://docs.scylladb.com/using-scylla/drivers/cql-drivers/scylla-python-driver/
and at Scylla University:
https://university.scylladb.com/courses/using-scylla-drivers/lessons/coding-with-python/