How to connect Spark with Cassandra

I am using Ubuntu and trying to connect Spark with Cassandra. I used the following steps:

git clone https://github.com/datastax/spark-cassandra-connector.git
cd spark-cassandra-connector
./sbt/sbt assembly
./spark-shell --jars ~/spark/jars/spark-cassandra-connector-assembly-1.4.0-SNAPSHOT.jar

Then I tried this:

scala> sc.stop
scala> import com.datastax.spark.connector._
scala> import org.apache.spark.SparkContext
scala> import org.apache.spark.SparkContext._
scala> import org.apache.spark.SparkConf
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
scala> val sc = new SparkContext(conf)
scala> val test_spark_rdd = sc.cassandraTable("keyspace", "table")

I am using Spark 2.2.1, and my Cassandra is apache-cassandra-2.2.12.

When I enter this command,

scala> val test_spark_rdd = sc.cassandraTable("keyspace", "table")

it gives me this error:

error: missing or invalid dependency detected while loading class file 'CassandraConnector.class'. Could not access type Logging in package org.apache.spark, because it (or its dependencies) are missing. Check your build definition for missing or conflicting dependencies. (Re-run with -Ylog-classpath to see the problematic classpath.) A full rebuild may help if 'CassandraConnector.class' was compiled against an incompatible version of org.apache.spark.

I have found different tutorials, but I could not solve my problem. Could anyone give me some advice? Thanks.

Don't download jar files and try to use them. Instead, just point the Spark shell at the Maven dependency. The assembly jar you built is a 1.4.0-SNAPSHOT of the connector, compiled against Spark 1.x; org.apache.spark.Logging was removed in Spark 2.0, which is exactly what the error is complaining about.

./bin/spark-shell --packages "com.datastax.spark:spark-cassandra-connector_2.11:2.0.7"

Now the Spark shell will automatically download the correct jar files from Maven Central (the _2.11 suffix matches the Scala version Spark 2.2 is built with).
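
For example, here is a minimal sketch of a complete session, assuming Cassandra is listening on localhost and reusing the keyspace and table names from the question (replace them with your own). With connector 2.x on Spark 2.x you do not need to stop and recreate the SparkContext; the connection host can be passed as a --conf flag at launch:

./bin/spark-shell \
  --packages "com.datastax.spark:spark-cassandra-connector_2.11:2.0.7" \
  --conf spark.cassandra.connection.host=localhost

scala> import com.datastax.spark.connector._   // adds cassandraTable to the SparkContext
scala> val test_spark_rdd = sc.cassandraTable("keyspace", "table")
scala> test_spark_rdd.first                    // fetch one row to verify the connection works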