Count the number of character occurrences in an input text file
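How can I turn a flatMap over a text file into a flatMap over characters? I have to count the occurrences of each character in a text file. What approach should I take after the following code?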
val words = readme.flatMap(line => line.split(" ")).collect()
To turn each String into its constituent characters, you need an additional flatMap:
val characters = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
scala> val lines = Array("hello world", "yay more lines")
lines: Array[String] = Array(hello world, yay more lines)
scala> lines.flatMap(_.split(" ")).flatMap(_.toCharArray)
res3: Array[Char] = Array(h, e, l, l, o, w, o, r, l, d, y, a, y, m, o, r, e, l, i, n, e, s)
Although this is the Scala console, it works the same way on an RDD.
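To make the effect of the two flatMap calls concrete, here is a small sketch on plain Scala collections, so no Spark is needed to try it. It reuses the sample `lines` value from the console session above; `FlatMapChars`, `perWord` and `perLine` are names made up for this illustration. It suggests that splitting on spaces first drops the spaces, while flat-mapping each line directly keeps them.

```scala
object FlatMapChars {
  // Same sample data as in the console session above
  val lines: Array[String] = Array("hello world", "yay more lines")

  // Split into words first, then into characters: the spaces are lost in the split
  val perWord: Array[Char] = lines.flatMap(_.split(" ")).flatMap(_.toCharArray)

  // Go straight from lines to characters: the spaces are kept
  val perLine: Array[Char] = lines.flatMap(_.toCharArray)

  def main(args: Array[String]): Unit = {
    println(perWord.mkString)
    println(perLine.mkString)
  }
}
```

Which variant fits depends on whether a space should count as a character in the final tally.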
If you are only interested in the chars, then I guess you probably want to count the spaces ' ' as well:
val chars = readme.flatMap(line => line.toCharArray)
// but if you don't want to count spaces too:
// val chars = readme.flatMap(line => line.toCharArray.filter(_ != ' '))
val charsCount = chars
  .map(c => (c, 1))
  .reduceByKey(_ + _)
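The map-then-reduceByKey step above pairs each character with 1 and sums the ones per key. A rough local analogue on plain Scala collections, offered only as a sketch for checking the logic without a cluster, groups identical characters and takes the size of each group. `CharCount`, `countChars`, the `keepSpaces` flag and the sample text are hypothetical and not part of the original answer.

```scala
object CharCount {
  // Count how often each character occurs; pass keepSpaces = false to skip ' '
  def countChars(lines: Seq[String], keepSpaces: Boolean = true): Map[Char, Int] = {
    val chars = lines.flatMap(_.toCharArray).filter(c => keepSpaces || c != ' ')
    // Local stand-in for map(c => (c, 1)).reduceByKey(_ + _)
    chars.groupBy(identity).map { case (c, occurrences) => c -> occurrences.size }
  }

  def main(args: Array[String]): Unit = {
    val counts = countChars(Seq("hello world", "yay more lines"))
    // Print the most frequent characters first, ties broken by character
    counts.toSeq
      .sortBy { case (c, n) => (-n, c) }
      .foreach { case (c, n) => println(s"'$c' -> $n") }
  }
}
```

On an RDD the grouping happens per partition and is then merged, which is why reduceByKey is usually preferred there over collecting everything and grouping locally.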
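import java.io.File
import java.nio.file.{Files, StandardCopyOption}

// `a` is any object whose class loader can see /a.txt on the classpath
val txt = a.getClass.getResourceAsStream("/a.txt")
val txtFile = File.createTempFile("a", "txt")
txtFile.deleteOnExit()
// java.nio copy used here in place of Guava's Files.newOutputStreamSupplier, which later Guava releases dropped
Files.copy(txt, txtFile.toPath, StandardCopyOption.REPLACE_EXISTING)
val tokenized = sc.textFile(txtFile.toString).flatMap(_.split(' '))
val char = tokenized.flatMap(_.toCharArray)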