How to use a mutable map in Scala on Apache Spark? Key not found error
I am using Spark 1.3.0.
My map appears to contain the keys, but when I look one up I get either a "key not found" error or None.
import scala.collection.mutable.HashMap
val labeldata = sc.textFile("/home/data/trainLabels2.csv")
val labels: Array[Array[String]] = labeldata.map(line => line.split(",")).collect()
var fn2label: HashMap[String,Int] = new HashMap()
labels.foreach{ x => fn2label += (x(0) -> x(1).toInt)}
My map looks like this:
scala> fn2label
res45: scala.collection.mutable.HashMap[String,Int] = Map("k2VDmKNaUlXtnMhsuCic" -> 1, "AGzOvc4dUfw1B8nDmY2X" -> 1, "BqRPMt4QY1sHzvF6JK7j" -> 3,.....
It even has the keys:
scala> fn2label.keys
res46: Iterable[String] = Set("k2VDmKNaUlXtnMhsuCic", "AGzOvc4dUfw1B8nDmY2X", "BqRPMt4QY1sHzvF6JK7j",
But I cannot access them:
scala> fn2label.get("k2VDmKNaUlXtnMhsuCic")
res48: Option[Int] = None
scala> fn2label("k2VDmKNaUlXtnMhsuCic")
java.util.NoSuchElementException: key not found: k2VDmKNaUlXtnMhsuCic
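Things I have tried include broadcasting the map, broadcasting both the labels and the map, using Map instead of HashMap, and parallelizing it: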
val mapRdd = sc.parallelize(fn2label.toSeq)
mapRdd.lookup("k2VDmKNaUlXtnMhsuCic")
res50: Seq[Int] = WrappedArray()
What am I missing?
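You simply have extra quotes in your data. The keys themselves contain literal double-quote characters, so a lookup only succeeds if the key you pass includes them too: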
scala> val fn2label = scala.collection.mutable.HashMap("\"k2VDmKNaUlXtnMhsuCic\"" -> 1, "\"AGzOvc4dUfw1B8nDmY2X\"" -> 1, "\"BqRPMt4QY1sHzvF6JK7j\"" -> 3)
fn2label: scala.collection.mutable.HashMap[String,Int] = Map("BqRPMt4QY1sHzvF6JK7j" -> 3, "AGzOvc4dUfw1B8nDmY2X" -> 1, "k2VDmKNaUlXtnMhsuCic" -> 1)
scala> fn2label.get("\"k2VDmKNaUlXtnMhsuCic\"")
res4: Option[Int] = Some(1)
scala> fn2label.keys
res5: Iterable[String] = Set("BqRPMt4QY1sHzvF6JK7j", "AGzOvc4dUfw1B8nDmY2X", "k2VDmKNaUlXtnMhsuCic")
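One way to avoid the problem is to strip the surrounding quotes while building the map. The snippet below is only a minimal sketch under stated assumptions: the sample lines are hypothetical stand-ins for the contents of trainLabels2.csv (assuming the first column is wrapped in double quotes), and `unquote` is a helper name introduced here for illustration. It uses plain Scala collections so it can be pasted into a REPL; with Spark, the same `unquote` call would go inside the `map` over `labeldata` before `collect()`. A useful hint when diagnosing this: the Scala 2 REPL prints string keys inside a Map without quotes, so quotes that show up in the output are part of the data.

```scala
import scala.collection.mutable.HashMap

// Hypothetical sample lines standing in for the CSV file contents,
// assuming the first column is wrapped in double quotes.
val lines = Seq(
  "\"k2VDmKNaUlXtnMhsuCic\",1",
  "\"AGzOvc4dUfw1B8nDmY2X\",1",
  "\"BqRPMt4QY1sHzvF6JK7j\",3"
)

// Illustrative helper: trim whitespace, then remove one leading and one
// trailing double quote if present.
def unquote(s: String): String = s.trim.stripPrefix("\"").stripSuffix("\"")

val fn2label = HashMap.empty[String, Int]
lines.foreach { line =>
  val cols = line.split(",")
  // Store the cleaned key so that lookups with the bare name succeed.
  fn2label += (unquote(cols(0)) -> cols(1).trim.toInt)
}

println(fn2label.get("k2VDmKNaUlXtnMhsuCic")) // Some(1)
```

Splitting on "," is enough for this two-column layout; if fields could themselves contain commas or escaped quotes, a proper CSV parser would be the safer choice.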