Trying to fetch the first record from a group in SQL

I am trying to write a SQL query that returns the earliest timestamp for each candidate, project, and contract.

spark.sql(
      """
        |SELECT
        | DISTINCT
        | timestamp,
        | candidate_id,
        | project_id,
        | contract_id
        |FROM candidatesHistory
        |GROUP BY timestamp, candidate_id, project_id, contract_id
        |ORDER BY timestamp DESC
        |LIMIT 1
        |""".stripMargin)

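This code does not do that; it returns only a single record. How can I get the earliest timestamp for each candidate, project, and contract?

Any help is appreciated.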

If the table has only these four columns, you can use aggregation:

select candidate_id, project_id, contract_id, min(timestamp) first_timestamp
from candidateshistory
group by candidate_id, project_id, contract_id
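
To see what the aggregation returns, here is a minimal sketch that runs the same query against a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the sample data is invented purely for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidatesHistory ("
    "timestamp TEXT, candidate_id INTEGER, project_id INTEGER, contract_id INTEGER)"
)
# Hypothetical sample rows: two groups, two timestamps each.
conn.executemany(
    "INSERT INTO candidatesHistory VALUES (?, ?, ?, ?)",
    [
        ("2020-01-03", 1, 10, 100),
        ("2020-01-01", 1, 10, 100),
        ("2020-02-05", 2, 20, 200),
        ("2020-02-01", 2, 20, 200),
    ],
)

# One output row per (candidate_id, project_id, contract_id) group.
earliest = conn.execute(
    """
    SELECT candidate_id, project_id, contract_id, MIN(timestamp) AS first_timestamp
    FROM candidatesHistory
    GROUP BY candidate_id, project_id, contract_id
    ORDER BY candidate_id, project_id, contract_id
    """
).fetchall()

for row in earliest:
    print(row)
```

Each group collapses to a single row carrying its smallest timestamp, which is why timestamp must not appear in the GROUP BY list.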

If there are more columns and you want to keep all of them, you can filter the table with row_number():

select ch.*
from (
    select ch.*,
        row_number() over(partition by candidate_id, project_id, contract_id order by timestamp) rn
    from candidateshistory ch
) ch
where rn = 1

For each (candidate_id, project_id, contract_id) tuple, this gives you the row with the earliest timestamp.
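
As a rough illustration of the row_number() approach, the sketch below adds a hypothetical extra column named status and shows that it comes through with the earliest row of each group. Again, sqlite3 is only a stand-in for Spark SQL (window functions need SQLite 3.25 or newer), and all data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidatesHistory ("
    "timestamp TEXT, candidate_id INTEGER, project_id INTEGER, "
    "contract_id INTEGER, status TEXT)"  # status is a hypothetical extra column
)
conn.executemany(
    "INSERT INTO candidatesHistory VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01-03", 1, 10, 100, "hired"),
        ("2020-01-01", 1, 10, 100, "applied"),
        ("2020-02-05", 2, 20, 200, "rejected"),
        ("2020-02-01", 2, 20, 200, "interviewed"),
    ],
)

# Number the rows of each group by ascending timestamp, then keep number 1.
first_rows = conn.execute(
    """
    SELECT timestamp, candidate_id, project_id, contract_id, status
    FROM (
        SELECT ch.*,
               ROW_NUMBER() OVER (
                   PARTITION BY candidate_id, project_id, contract_id
                   ORDER BY timestamp
               ) AS rn
        FROM candidatesHistory ch
    ) ranked
    WHERE rn = 1
    ORDER BY candidate_id
    """
).fetchall()

for row in first_rows:
    print(row)
```

Unlike MIN(), which only returns the grouped columns and the aggregate, this keeps every column of the winning row.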

This should work, but I don't know whether it is the best approach:

SELECT candidate_id
, project_id
, contract_id
, timestamp
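FROM (
    -- RANK() keeps ties: rows sharing the earliest timestamp all get rank 1
    SELECT RANK() OVER (PARTITION BY candidate_id, project_id, contract_id ORDER BY timestamp) AS RNK
    , candidate_id
    , project_id
    , contract_id
    , timestamp -- selected here so the outer query can reference it
    FROM candidatesHistory
    ) as CH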
WHERE CH.RNK = 1;
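
One thing worth checking when choosing between RANK() and ROW_NUMBER() is how ties are handled. The sketch below, run on invented data with sqlite3 standing in for Spark SQL, has two rows that share the earliest timestamp in one group. The helper first_rows is hypothetical and exists only for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE candidatesHistory ("
    "timestamp TEXT, candidate_id INTEGER, project_id INTEGER, contract_id INTEGER)"
)
# Two rows tie on the earliest timestamp within the same group.
conn.executemany(
    "INSERT INTO candidatesHistory VALUES (?, ?, ?, ?)",
    [
        ("2020-01-01", 1, 10, 100),
        ("2020-01-01", 1, 10, 100),
        ("2020-01-02", 1, 10, 100),
    ],
)


def first_rows(window_fn):
    """Return the rows numbered 1 by the given window function.

    window_fn is only ever one of the fixed strings used below,
    never user input, so interpolating it into the query is safe here.
    """
    query = f"""
        SELECT candidate_id, project_id, contract_id, timestamp
        FROM (
            SELECT {window_fn}() OVER (
                       PARTITION BY candidate_id, project_id, contract_id
                       ORDER BY timestamp
                   ) AS rnk,
                   candidate_id, project_id, contract_id, timestamp
            FROM candidatesHistory
        ) ch
        WHERE rnk = 1
    """
    return conn.execute(query).fetchall()


rank_rows = first_rows("RANK")
row_number_rows = first_rows("ROW_NUMBER")
print(len(rank_rows), len(row_number_rows))
```

With RANK() both tied rows come back, while ROW_NUMBER() returns exactly one row per group (which of the tied rows is not defined unless the ORDER BY adds a tiebreaker).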