When will Spark clean the cached RDDs automatically?

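RDDs cached with the rdd.cache() method from the Scala shell are stored in memory.

This means they consume part of the RAM that would otherwise be available to the Spark process itself.

Given that, if memory is limited and more and more RDDs are cached, when does Spark automatically free the memory occupied by the RDD cache?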

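Spark will clean cached RDDs and Datasets / DataFrames:

- When explicitly asked to, by calling the RDD.unpersist (see "How to uncache RDD?") / Dataset.unpersist methods or Catalog.clearCache.
- Automatically, when storage memory is needed, by evicting least-recently-used partitions. From the Spark documentation:

  Spark automatically monitors cache usage on each node and drops out old data partitions in a least-recently-used (LRU) fashion. If you would like to manually remove an RDD instead of waiting for it to fall out of the cache, use the RDD.unpersist() method.
- When the corresponding distributed data structure is garbage collected on the driver.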
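A minimal sketch of the explicit triggers from the list above, assuming Spark 2.1 or later is on the classpath (Dataset.storageLevel was added in 2.1) and a local master; the object name, app name and data sizes are illustrative only. The garbage-collection trigger is handled on the driver by Spark's ContextCleaner and depends on when the JVM actually collects the object, so the sketch does not try to assert it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; in spark-shell, `spark` and `sc` already exist.
object ExplicitUncacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explicit-uncache").getOrCreate()
    val sc = spark.sparkContext

    // RDD.unpersist: drops the cached blocks and resets the storage level.
    val rdd = sc.parallelize(1 to 1000).cache()
    rdd.count() // the first action materializes the cache
    assert(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)
    rdd.unpersist(blocking = true)
    assert(rdd.getStorageLevel == StorageLevel.NONE)

    // Dataset.unpersist: the same idea for Datasets / DataFrames.
    val ds = spark.range(1000).cache()
    ds.count()
    assert(ds.storageLevel != StorageLevel.NONE)
    ds.unpersist(blocking = true)
    assert(ds.storageLevel == StorageLevel.NONE)

    // Catalog.clearCache: drops every cached Dataset / table in this session at once.
    val a = spark.range(10).cache()
    val b = spark.range(20).cache()
    spark.catalog.clearCache()
    assert(a.storageLevel == StorageLevel.NONE && b.storageLevel == StorageLevel.NONE)

    spark.stop()
  }
}
```

Passing blocking = true makes unpersist wait until the blocks are removed, which keeps the sketch deterministic; in interactive use the non-blocking default is usually fine.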

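Once an RDD or DataFrame is no longer referenced by your program, Spark will automatically unpersist/clean it after it is garbage collected. To check whether an RDD is cached, open the Spark UI, go to the "Storage" tab, and look at its memory details.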
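Besides the Storage tab, cache state can be inspected from code. The sketch below is a hedged illustration, assuming Spark 2.x/3.x on the classpath and a local master; the object name and the RDD name "numbers" are hypothetical. SparkContext.getPersistentRDDs and RDD.getStorageLevel are checked directly, while SparkContext.getRDDStorageInfo (a developer API) is only printed, because in recent Spark versions it reads from the same asynchronously updated status store as the UI and may briefly lag.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; in spark-shell, `spark` and `sc` already exist.
object InspectCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("inspect-cache").getOrCreate()
    val sc = spark.sparkContext

    val rdd = sc.parallelize(1 to 1000).setName("numbers").cache()
    // cache() registers the RDD as persistent immediately, before any action runs.
    assert(sc.getPersistentRDDs.contains(rdd.id))
    assert(rdd.getStorageLevel == StorageLevel.MEMORY_ONLY)
    rdd.count() // the first action actually stores the partitions

    // Roughly what the Storage tab shows; may be empty if the status store has not caught up.
    sc.getRDDStorageInfo.filter(_.id == rdd.id).foreach { info =>
      println(s"${info.name}: ${info.numCachedPartitions}/${info.numPartitions} partitions cached, " +
        s"${info.memSize} bytes in memory")
    }

    // After unpersist, the RDD is no longer tracked as persistent.
    rdd.unpersist(blocking = true)
    assert(!sc.getPersistentRDDs.contains(rdd.id))
    spark.stop()
  }
}
```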

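From the shell, we can use 'rdd.unpersist()' or 'sqlContext.uncacheTable("sparktable")' to remove an RDD or a table from memory. Spark is built for lazy evaluation: it does not load or process any data into an RDD or DataFrame until you call an action. The same applies to caching: cache() only marks the data, which is actually stored when the first action runs.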
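A small sketch of the lazy behaviour and of uncacheTable, assuming Spark 2.x/3.x on the classpath and a local master; the object name, the accumulator name "mapCalls" and the data are hypothetical, while "sparktable" is the table name used above. A LongAccumulator counts how often the map function runs, which shows that nothing executes before the first action, that a second action reuses the cache, and that unpersist forces recomputation from the lineage.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo object; in spark-shell, `spark` and `sc` already exist.
object LazyCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lazy-cache").getOrCreate()
    val sc = spark.sparkContext

    // Counts how many times the map function actually runs.
    val calls = sc.longAccumulator("mapCalls")
    val doubled = sc.parallelize(1 to 100, 4).map { x => calls.add(1); x * 2 }.cache()
    assert(calls.sum == 0L) // transformations and cache() are lazy: nothing has run yet

    doubled.count()
    assert(calls.sum == 100L) // the first action computes every element and fills the cache

    doubled.reduce(_ + _)
    assert(calls.sum == 100L) // the second action reads cached partitions, no recomputation

    doubled.unpersist(blocking = true)
    doubled.count()
    assert(calls.sum == 200L) // after unpersist, Spark recomputes from the lineage

    // Tables: uncacheTable removes a cached table by name.
    spark.range(1000).createOrReplaceTempView("sparktable")
    spark.catalog.cacheTable("sparktable")
    spark.table("sparktable").count() // materialize the table cache
    assert(spark.catalog.isCached("sparktable"))
    spark.sqlContext.uncacheTable("sparktable")
    assert(!spark.catalog.isCached("sparktable"))

    spark.stop()
  }
}
```

In Spark 2.0 and later, spark.catalog.uncacheTable("sparktable") is the SparkSession-based counterpart of the sqlContext call.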