Spark Structured Streaming: what are the possible usages of the queryName() setting?

Question

According to the Structured Streaming Programming Guide,

queryName("myTableName") is used to define the name of the in-memory table when the output sink is format("memory"):

aggDF
  .writeStream
  .queryName("aggregates") // this query name will be the table name
  .outputMode("complete")
  .format("memory")
  .start()

spark.sql("select * from aggregates").show() // interactively query in-memory table

The Scaladoc for queryName() in the Spark source file DataStreamWriter.scala reads:

Specifies the name of the [[StreamingQuery]] that can be started with start(). This name must be unique among all the currently active queries in the associated SQLContext.
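To make the uniqueness requirement concrete, here is a minimal sketch, assuming a local SparkSession and the built-in "rate" source as a stand-in for real input (both are my assumptions, not part of the guide). It shows the name being used to look the query up again through spark.streams.active, and what I would expect to happen when a second active query reuses the same name.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("query-name-lookup")
  .config("spark.sql.shuffle.partitions", "2")
  .getOrCreate()

// The "rate" source emits (timestamp, value) rows; it only stands in for real input here.
val counts = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .groupBy()
  .count()

val query = counts.writeStream
  .queryName("aggregates")
  .outputMode("complete")
  .format("memory")
  .start()

// The name is exposed on the handle and can be used to find the query again later.
val found = spark.streams.active.find(_.name == "aggregates")
println(found.map(_.id))

// Starting a second query under the same name while the first is still active
// is expected to fail, since names must be unique among active queries.
val duplicate = Try {
  counts.writeStream
    .queryName("aggregates")
    .outputMode("complete")
    .format("memory")
    .start()
}
println(duplicate.isFailure)

query.stop()
```

The snippet is written in spark-shell style (top-level statements), matching the example from the guide above.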

Question: are there any other possible usages of the queryName() setting? In the Spark job logs? In the details of a query's progress monitoring?

Answer 1

I have come across the following three usages of queryName:

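1. As mentioned by the OP and documented in the Structured Streaming guide, it defines the name of the in-memory table when the output sink has the format "memory".
2. queryName defines the value of event.progress.name, where event is a QueryProgressEvent within a StreamingQueryListener.
3. It is also shown in the Description column of the Spark Web UI (in my case I had set queryName("StackOverflowTest")).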
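The second usage can be sketched as follows. This is only an illustration under my own assumptions: a local SparkSession, the built-in "rate" source, the console sink, and a hypothetical query name "rateDemo". The listener records every name it sees in progress events so that the effect of queryName is visible.

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("query-name-listener")
  .getOrCreate()

// Names observed in progress events. A concurrent set is used because
// listener callbacks run on a different thread than this script.
val seenNames = ConcurrentHashMap.newKeySet[String]()

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    println(s"started: ${event.name}")

  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // progress.name is null for unnamed queries, hence the Option wrapper.
    Option(event.progress.name).foreach(n => seenNames.add(n))
    println(s"progress for '${event.progress.name}', batch ${event.progress.batchId}")
  }

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    println(s"terminated: ${event.id}")
})

// "rateDemo" is a hypothetical name chosen for this sketch.
val query = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .writeStream
  .queryName("rateDemo")
  .format("console")
  .start()

// Let a few micro-batches run so that progress events are delivered, then stop.
query.awaitTermination(10000)
query.stop()

println(seenNames)
```

Because listener events are delivered asynchronously, the wait of ten seconds is a rough allowance rather than a guarantee.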

Answer 2

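Adding to @mike's answer, I would like to mention that in Databricks (which uses Spark at its core) you can use the query name you defined in combination with the function untilStreamIsReady().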

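For example, if you have defined the streaming query StackOverflowTest, you can call untilStreamIsReady('StackOverflowTest') to wait until the query is ready and has started (sorry for being Captain Obvious).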

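I must say that I could not find a direct reference to this function in the official documentation, but I did find it at the following link:

Usage example: https://youtu.be/KLD10xn4sX8?t=1219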
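Since I could not find the implementation of that function, the following is only a rough sketch of how such a helper could work with the public API. It is not the Databricks function. The helper name waitUntilStreamIsReady, its timeout parameter, and the definition of "ready" (the named query is active and has reported at least one progress update) are all assumptions of mine, as are the "rate" source and the query name "rateDemo".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper, NOT the Databricks implementation.
// Polls the active queries until one with the given name has reported progress,
// or until the timeout expires.
def waitUntilStreamIsReady(
    spark: SparkSession,
    name: String,
    timeoutMs: Long = 30000L): Option[StreamingQuery] = {
  val deadline = System.currentTimeMillis() + timeoutMs
  var ready: Option[StreamingQuery] = None
  while (ready.isEmpty && System.currentTimeMillis() < deadline) {
    // "Ready" is taken to mean: found by name among the active queries,
    // with at least one progress update available.
    ready = spark.streams.active.find(q => q.name == name && q.recentProgress.nonEmpty)
    if (ready.isEmpty) Thread.sleep(200)
  }
  ready
}

// Assumption: a local session; in spark-shell, getOrCreate() reuses the existing one.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("query-name-wait")
  .getOrCreate()

val query = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .writeStream
  .queryName("rateDemo")
  .format("console")
  .start()

val ready = waitUntilStreamIsReady(spark, "rateDemo")
println(ready.map(_.name))

// A name that no active query uses simply times out and yields None.
val missing = waitUntilStreamIsReady(spark, "noSuchQuery", 500L)
println(missing.isEmpty)

query.stop()
```

Returning an Option instead of blocking forever is a design choice of this sketch, so that a misspelled query name does not hang the caller.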

apache-spark

spark-structured-streaming