Spark SQL: unable to query multiple possible values in an array
I have a data schema for LinkedIn accounts, as shown below. I need to query the skills, which are stored in an array, where the array may contain JAVA, java, Java, JAVA developer, or Java developer.
Dataset<Row> sqlDF = spark.sql("SELECT * FROM people"
+ " WHERE ARRAY_CONTAINS(skills,'Java') "
+ " OR ARRAY_CONTAINS(skills,'JAVA')"
+ " OR ARRAY_CONTAINS(skills,'Java developer') "
+ "AND ARRAY_CONTAINS(experience['description'],'Java developer')" );
The query above is what I tried. Please suggest a better way. Also, how can I make the query case-insensitive?
df.printSchema()
root
|-- skills: array (nullable = true)
| |-- element: string (containsNull = true)
df.show()
+--------------------+
| skills|
+--------------------+
| [Java, java]|
|[Java Developer, ...|
| [dev]|
+--------------------+
Now let's register it as a temporary table:
>>> df.registerTempTable("t")
Now we explode the array, convert each element to lowercase, and filter with the LIKE operator:
>>> res = sqlContext.sql("select skills, lower(skill) as skill from (select skills, explode(skills) skill from t) a where lower(skill) like '%java%'")
>>> res.show()
+--------------------+--------------+
| skills| skill|
+--------------------+--------------+
| [Java, java]| java|
| [Java, java]| java|
|[Java Developer, ...|java developer|
|[Java Developer, ...| java dev|
+--------------------+--------------+
Now you can apply distinct on the skills field.
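As an alternative to explode followed by distinct, the same case-insensitive match can probably be written with the `exists` higher-order function, which Spark SQL provides from version 2.4 onward. The sketch below only builds the query string; `skill_filter_sql` is a hypothetical helper name, not part of Spark, and it assumes the table `t` and column `skills` from the example above.

```python
import re

def skill_filter_sql(table, column, terms):
    """Build a Spark SQL query that keeps rows whose array column has at
    least one element containing any of the given terms, ignoring case.

    Hypothetical helper for illustration; relies on exists(array, lambda),
    which is assumed to be available (Spark SQL 2.4+).
    """
    # Accept only plain identifiers so nothing unexpected reaches the SQL text.
    for name in (table, column):
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
            raise ValueError("unsupported identifier: %r" % name)
    if not terms:
        raise ValueError("at least one search term is required")
    patterns = []
    for term in terms:
        # Restrict terms to letters, digits and spaces to sidestep quoting issues.
        if not re.fullmatch(r"[A-Za-z0-9 ]+", term):
            raise ValueError("unsupported search term: %r" % term)
        patterns.append("lower(s) LIKE '%{}%'".format(term.lower()))
    predicate = " OR ".join(patterns)
    return "SELECT * FROM {} WHERE exists({}, s -> {})".format(
        table, column, predicate)

query = skill_filter_sql("t", "skills", ["Java"])
print(query)
```

The resulting string can be passed to `sqlContext.sql(query)` in the same way as the query above. Because the array is not exploded, each matching row should come back once, so no distinct step is needed.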