Does caching a new table with an existing table name remove old contents from memory?
Using Spark 1.5.2:
dfOld.registerTempTable("oldTableName")
hiveContext.cacheTable("oldTableName")
// ....
// do something
// ....
dfNew.registerTempTable("oldTableName")
hiveContext.cacheTable("oldTableName")
Now, when I query the "oldTableName" table I do get the latest contents from dfNew, but have the contents of dfOld been removed from memory?
Or is the correct usage for this:
dfOld.registerTempTable("oldTableName")
hiveContext.cacheTable("oldTableName")
// ....
// do something
// ....
dfNew.registerTempTable("oldTableName")
hiveContext.uncacheTable("oldTableName") <========== un-cache the old contents first
hiveContext.cacheTable("oldTableName")
No, the contents are not un-cached until you explicitly ask the Spark CacheManager to do so, either with hiveContext.uncacheTable("tableName")
or with hiveContext.clearCache() [warning: the latter un-caches all tables].
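So the second variant from the question, with an explicit un-cache, is the safe pattern. A minimal sketch, assuming a Spark 1.5-style hiveContext and the dfOld/dfNew DataFrames from the question, with the un-cache moved before the re-registration so that the name still resolves to the old cached plan:
dfOld.registerTempTable("oldTableName")
hiveContext.cacheTable("oldTableName")      // cache the old contents under this name
// ....
// do something
// ....
hiveContext.uncacheTable("oldTableName")    // drop the old cached entry while the name still points to dfOld
dfNew.registerTempTable("oldTableName")     // re-bind the name to the new DataFrame
hiveContext.cacheTable("oldTableName")      // cache the new contents; no stale copy of dfOld is left behind
The ordering is deliberate: uncacheTable resolves the table name through the catalog, so it should be called while the name still points at the DataFrame whose cached copy you want to drop.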
Proof: in an experiment, the "Storage" tab of the Spark UI clearly showed duplicate entries for the same table.
For this snippet:
dfOld.registerTempTable("myColorsTable")
hiveContext.cacheTable("myColorsTable")
// ....
// do something
// ....
dfNew.registerTempTable("myColorsTable")
hiveContext.cacheTable("myColorsTable")
In ./bin/spark-shell:
scala> df.collect
res54: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000], [green,#FSKA]) <=== 3 rows
scala> df2.collect
res55: Array[org.apache.spark.sql.Row] = Array([blue,#0033FF], [red,#FF0000]) <=== 2 rows
scala> df.registerTempTable("myColorsTable")
scala> sqlContext.isCached("myColorsTable")
res58: Boolean = false
scala> sqlContext.cacheTable("myColorsTable") <=== cache table in df(3 rows)
scala> sqlContext.isCached("myColorsTable")
res60: Boolean = true
scala> sqlContext.sql("select * from myColorsTable").foreach(println) <=== sql is running on df(3 rows)
[blue,#0033FF]
[red,#FF0000]
[green,#FSKA]
scala> df2.registerTempTable("myColorsTable") <=== register another table with the same table name
scala> sqlContext.isCached("myColorsTable")
res63: Boolean = false
scala> sqlContext.sql("select * from myColorsTable").foreach(println) <=== sql is running on df2(2 rows)
[blue,#0033FF]
[red,#FF0000]
scala> sqlContext.cacheTable("myColorsTable")
15/12/19 09:53:55 WARN CacheManager: Asked to cache already cached data. <=====
if (lookupCachedData(planToCache).nonEmpty) {
logWarning("Asked to cache already cached data.")
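To release everything at once instead, the blunter option mentioned above clears the whole cache. A short sketch continuing with the same sqlContext (assumption: every cached table in this context is dropped, not just myColorsTable):
sqlContext.clearCache()                  // ask the CacheManager to drop all cached data
sqlContext.isCached("myColorsTable")     // false; the duplicate entries also disappear from the Storage tab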