How do I select an ambiguous column reference?
Here is some sample code illustrating what I'm trying to do. I have a DataFrame containing both a companyid column and a companyId column. I want to select companyId, but the reference is ambiguous. How do I unambiguously select the correct column?
>>> data = [Row(companyId=1, companyid=2, company="Hello world industries")]
>>> df = sc.parallelize(data).toDF()
>>> df.createOrReplaceTempView('my_df')
>>> spark.sql("SELECT companyid FROM my_df")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark22/python/pyspark/sql/session.py", line 603, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/spark22/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/opt/spark22/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"Reference 'companyid' is ambiguous, could be: companyid#1L, companyid#2L.; line 1 pos 7"
The final solution turned out to be very simple. Before running the SELECT statement, I ran the following:
spark.sql('set spark.sql.caseSensitive=true')