如何检索 Dask-YARN 作业的工作日志?
How to retrieve worker logs for a Dask-YARN job?
我有一个简单的 Dask-YARN 脚本,它只执行一项任务:从 HDFS 加载文件,如下所示。但是,我 运行 遇到了代码中的一个错误,所以我在函数中添加了一个 print
语句,但是我没有在使用 yarn logs -applicationId {application_id}
。我什至尝试了 Client.get_worker_logs()
方法,但是它也没有显示 stdout
,只是显示了一些关于工人的 INFO
。代码执行完成后如何获取worker日志?
import sys
import numpy as np
import scipy.signal
import json
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster
@dask.delayed
def load(input_file):
print("In call of Load...")
with open(input_file, "r") as fo:
data = json.load(fo)
return data
# Process input args
(_, filename) = sys.argv
dag_1 = {
'load-1': (load, filename)
}
print("Building tasks...")
tasks = dask.get(dag_1, 'load-1')
print("Creating YARN cluster now...")
cluster = YarnCluster()
print("Scaling YARN cluster now...")
cluster.scale(1)
print("Creating Client now...")
client = Client(cluster)
print("Getting logs..1")
print(client.get_worker_logs())
print("Doing Dask computations now...")
dask.compute(tasks)
print("Getting logs..2")
print(client.get_worker_logs())
print("Shutting down cluster now...")
cluster.shutdown()
我不确定这里发生了什么,打印语句应该(而且通常会)最终出现在 yarn 存储的日志文件中。
如果您希望调试语句出现在 get_worker_logs
的工作日志中,您可以直接使用工作日志:
from distributed.worker import logger
logger.info("This will show up in the worker logs")
我有一个简单的 Dask-YARN 脚本,它只执行一项任务:从 HDFS 加载文件,如下所示。但是,我 运行 遇到了代码中的一个错误,所以我在函数中添加了一个 print
语句,但是我没有在使用 yarn logs -applicationId {application_id}
。我什至尝试了 Client.get_worker_logs()
方法,但是它也没有显示 stdout
,只是显示了一些关于工人的 INFO
。代码执行完成后如何获取worker日志?
import sys
import numpy as np
import scipy.signal
import json
import dask
from dask.distributed import Client
from dask_yarn import YarnCluster
@dask.delayed
def load(input_file):
print("In call of Load...")
with open(input_file, "r") as fo:
data = json.load(fo)
return data
# Process input args
(_, filename) = sys.argv
dag_1 = {
'load-1': (load, filename)
}
print("Building tasks...")
tasks = dask.get(dag_1, 'load-1')
print("Creating YARN cluster now...")
cluster = YarnCluster()
print("Scaling YARN cluster now...")
cluster.scale(1)
print("Creating Client now...")
client = Client(cluster)
print("Getting logs..1")
print(client.get_worker_logs())
print("Doing Dask computations now...")
dask.compute(tasks)
print("Getting logs..2")
print(client.get_worker_logs())
print("Shutting down cluster now...")
cluster.shutdown()
我不确定这里发生了什么,打印语句应该(而且通常会)最终出现在 yarn 存储的日志文件中。
如果您希望调试语句出现在 get_worker_logs
的工作日志中,您可以直接使用工作日志:
from distributed.worker import logger
logger.info("This will show up in the worker logs")