Write/Read/Delete binary data in Spark Databricks (scala)
I'm new to Spark on Databricks (Scala), and I'd like to know how to write the contents of a variable of type Array[Byte] to a temporary file data.bin, either in a mounted storage mtn/somewhere/tmp/ (Azure Data Lake) or in file:/tmp/. Then I'd like to know how to read it back as an InputStream and delete it once I'm done.
All the methods I've read about so far either don't work at all or don't work with binary data.
Thanks.
It turns out this code works fine:
import java.io._
import org.apache.commons.io.FileUtils

// Create or collect the data
val bytes: Array[Byte] = <some_data>

try {
  // Write data to a temp file.
  // Note: here I use a GRIB2 file because I manipulate forecast data,
  // but you can use a .bin or .png/.jpg extension (for image data),
  // or no extension at all. It doesn't matter.
  val path: String = "mf-data.grib"
  val file: File = new File(path)
  FileUtils.writeByteArrayToFile(file, bytes)

  // Read the temp file
  val input = new FileInputStream(path)

  ////////// Do something with it //////////

  // Close the stream, then remove the temp file
  input.close()
  if (!file.delete()) {
    println("Cannot delete temporary file!")
  }
} catch {
  case _: Throwable => println("An I/O error occurred")
}
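If you'd rather not depend on Apache Commons IO, the same write/read/delete cycle can be sketched with plain java.nio.file from the JDK. This is a minimal sketch, not the original answer's code: the /tmp/data.bin path and the sample bytes are placeholders, and InputStream.readAllBytes requires Java 9+ (available on current Databricks runtimes).

```scala
import java.nio.file.{Files, Paths}

// Placeholder payload standing in for <some_data>
val bytes: Array[Byte] = Array[Byte](0x47, 0x52, 0x49, 0x42)

// Write the bytes to a temp file (a file:/tmp/ path; a DBFS
// mount path under /dbfs/... should work the same way)
val path = Paths.get("/tmp/data.bin")
Files.write(path, bytes)

// Read it back as an InputStream
val input = Files.newInputStream(path)
val readBack = input.readAllBytes()
input.close()

// Delete the temp file once done
Files.delete(path)
```

Files.delete throws an exception on failure instead of returning false, so a failed cleanup is not silently ignored as it can be with File.delete().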