Count number of character occurrences from input text file

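How do I turn a flatMap over the lines of a text file into a flatMap over its characters? I have to count the occurrences of each character in the text file. What approach should I take after the following code?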

val words = readme.flatMap(line => line.split(" ")).collect()

To convert each String into its constituent characters, you need an additional flatMap:

val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

scala> val lines = Array("hello world", "yay more lines")
lines: Array[String] = Array(hello world, yay more lines)

scala> lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
res3: Array[Char] = Array(h, e, l, l, o, w, o, r, l, d, y, a, y, m, o, r, e, l, i, n, e, s)

Although this is the plain Scala console, it works the same way on an RDD.
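
To make the console example concrete, here is a minimal sketch that carries the same flatMap through to actual counts using only ordinary Scala collections, with no Spark involved. The names `LocalCharCount` and `countChars` are made up for illustration, and skipping spaces is an assumption about what the question wants.

```scala
object LocalCharCount {
  // Counts every non-space character across the given lines.
  def countChars(lines: Seq[String]): Map[Char, Int] =
    lines
      .flatMap(_.toCharArray)   // one element per character
      .filter(_ != ' ')         // assumption: spaces are not counted
      .groupBy(identity)        // group equal characters together
      .map { case (c, occurrences) => (c, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val counts = countChars(Seq("hello world", "yay more lines"))
    // Print the most frequent characters first, ties in alphabetical order.
    counts.toSeq
      .sortBy { case (c, n) => (-n, c) }
      .foreach { case (c, n) => println(s"$c -> $n") }
  }
}
```

On an RDD the `groupBy(identity)` step would normally be replaced by `map(c => (c, 1))` followed by `reduceByKey`, which avoids collecting every occurrence of a character in one place.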

If you are only interested in the characters, then I guess you may want to count the spaces ' ' as well:

val chars = readme.flatMap(line => line.toCharArray)

// If you don't want to count spaces, filter them out instead:
// val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))

val charsCount = chars
  .map(c => (c, 1))
  .reduceByKey((i1: Int, i2: Int) => i1 + i2)
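
// Alternative: read a classpath resource by first copying it to a temporary file.
// `a` is assumed to be any object already in scope; java.nio's Files.copy stands in for the Guava copy.
import java.io.File
import java.nio.file.{Files, StandardCopyOption}
val txt = a.getClass.getResourceAsStream("/a.txt")
val txtFile = File.createTempFile("a", "txt")
txtFile.deleteOnExit()
Files.copy(txt, txtFile.toPath, StandardCopyOption.REPLACE_EXISTING) // the temp file already exists
val tokenized = sc.textFile(txtFile.toString).flatMap(_.split(' ')) // splitting on ' ' drops the spaces
val char = tokenized.flatMap(_.toCharArray) // one element per character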
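
Putting the steps from the answers together, a minimal end-to-end sketch might look like the following. It assumes Spark is on the classpath and runs in local mode; the object name `SparkCharCount`, the helper `countChars`, and the in-memory sample lines (used in place of a real text file) are hypothetical choices made for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object SparkCharCount {
  // Counts every non-space character in an RDD of lines.
  def countChars(lines: RDD[String]): RDD[(Char, Int)] =
    lines
      .flatMap(_.toCharArray)   // one element per character
      .filter(_ != ' ')         // assumption: spaces are not counted
      .map(c => (c, 1))         // pair each character with a count of 1
      .reduceByKey(_ + _)       // sum the counts per character

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("char-count")
      .master("local[*]")       // local mode, for experimentation only
      .getOrCreate()
    try {
      // Sample data; with a real file this would be spark.sparkContext.textFile(path).
      val lines = spark.sparkContext.parallelize(Seq("hello world", "yay more lines"))
      countChars(lines)
        .sortBy(_._2, ascending = false)   // most frequent characters first
        .collect()
        .foreach { case (c, n) => println(s"$c -> $n") }
    } finally {
      spark.stop()
    }
  }
}
```

Calling `collect()` is fine here because the result holds at most one entry per distinct character; for larger keyed results, writing out with `saveAsTextFile` would be the safer choice.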