How to apply a date format in PySpark SQL
The PySpark SQL query below produces our current data.
Script:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('yarn').appName('myAppName').getOrCreate()
df = spark.read.parquet('gs://data/test')
df.createOrReplaceTempView("people")
df2 = spark.sql("""select id, concat(year(dates), '_', month(dates)) as date,
count(1) count
from people
group by id, month(dates), year(dates)""")
The expected output is
like 2019_jan, 2019_feb, 2019_oct, ... 2019_Dec.
Please help me understand the date format syntax in PySpark SQL.
You can try something like this:
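# No FROM clause is needed here: current_date() does not read from a table,
# and no view named "data" is registered in the question's script.
spark.sql("select date_format(current_date(), 'MMM') as month")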
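To see how the number of `M` letters changes the result, here is a minimal sketch that formats one fixed date with several patterns. The names `MONTH_PATTERNS`, `month_pattern_sql` and `show_month_patterns` are hypothetical helpers, not part of PySpark, and the sample date is made up for illustration.

```python
# Hypothetical helper names; only date_format() itself comes from Spark SQL.
MONTH_PATTERNS = {
    "M": "month number without padding",
    "MM": "zero-padded month number",
    "MMM": "abbreviated month name",
    "MMMM": "full month name",
}


def month_pattern_sql(date_literal="2019-10-03"):
    """Build a one-row query that applies each pattern to the same date."""
    cols = ", ".join(
        f"date_format(DATE '{date_literal}', '{p}') AS {p}" for p in MONTH_PATTERNS
    )
    return f"SELECT {cols}"


def show_month_patterns():
    """Run the query on a local session (assumes pyspark is installed)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("monthPatterns").getOrCreate()
    spark.sql(month_pattern_sql()).show()
    spark.stop()
```

Calling `show_month_patterns()` prints one column per pattern, so you can pick the one that matches the label you want. `MMM` is the pattern the answer relies on.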
Your full query should look like this:
from pyspark.sql.functions import *
df2 = spark.sql("""select id, concat(year(dates), '_', date_format(dates, 'MMM')) as date,
count(1) count from people group by id, date_format(dates, 'MMM'), year(dates)""")
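The question asks for lowercase labels such as 2019_jan, while `MMM` normally yields a capitalized abbreviation. The sketch below wraps the month in `lower()` and tries the query on a tiny in-memory table. The helper names `monthly_counts_sql` and `run_demo`, the sample rows, and the local master are assumptions made for illustration; the real data lives in the parquet file from the question.

```python
def monthly_counts_sql(view_name="people"):
    """Build the grouped query; view_name is a hypothetical parameter."""
    return f"""
        SELECT id,
               concat(year(dates), '_', lower(date_format(dates, 'MMM'))) AS date,
               count(1) AS count
        FROM {view_name}
        GROUP BY id, year(dates), date_format(dates, 'MMM')
    """


def run_demo():
    """Run the query on made-up rows (assumes pyspark is installed)."""
    import datetime
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("dateFormatDemo").getOrCreate()
    # Sample rows invented for this sketch: (id, dates)
    rows = [
        (1, datetime.date(2019, 1, 5)),
        (1, datetime.date(2019, 1, 20)),
        (1, datetime.date(2019, 10, 3)),
        (2, datetime.date(2019, 12, 31)),
    ]
    spark.createDataFrame(rows, ["id", "dates"]).createOrReplaceTempView("people")
    spark.sql(monthly_counts_sql("people")).show()
    spark.stop()
```

Drop the `lower()` call if labels like 2019_Jan are acceptable. The grouping keys must stay in sync with the expressions used in the select list, which is why the pattern string appears in both places.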