SQL Server - Only get a single row for each distinct value of a field

I just got home from a job interview where they gave me a coding test. One of the questions that really stumped me was the following:

You are a teacher at a high school and have been put in charge of picking the best possible debate team for the upcoming National Debate Championships. Given the following table structure:

CREATE TABLE CompetitionResults (
    StudentName NVARCHAR(255) NOT NULL,     -- The student's name
    SchoolYear INT NOT NULL,                -- The school year of the student at the time they entered the competition
    CompetitionDate DATE NOT NULL,          -- The date of the competition
    CompetitionResult INT NOT NULL          -- The student's final score in the competition (0 - 100)
)

Write a query that will return the names of the best candidates for the upcoming competition, based on their previous competition results.

Constraints:

  • Return a single column, StudentName.
  • Only one student should be picked from each school year (7 - 12).
  • Each returned student must have competed in exactly 3 other competitions this year.

It was the last constraint that gave me the most trouble. Here is what I ended up submitting after running out of time:

SELECT
    StudentName AS sn,
    (SELECT COUNT(*) AS NumComps, CompetitionDate FROM CompetitionResults
        WHERE YEAR(CompetitionDate) = 2020 AND NumComps = 3),
    SchoolYear,
    CompetitionDate,
    CompetitionResult
FROM CompetitionResults
WHERE CompetitionDate IN (SELECT MIN(CompetitionDate)
    FROM CompetitionResults GROUP BY CompetitionDate) AND
    CompetitionResult IN (SELECT MAX(CompetitionResult) FROM
    CompetitionResults WHERE StudentName = sn);

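For the sake of my professional development I'd like to solve this with as little help as possible, but as you can probably tell, I'm really struggling. This code doesn't even compile, never mind the performance impact of all those subqueries! However, I find subqueries easier to write than joins, which is why I used them here.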

Any guidance/tips would be greatly appreciated. MTIA :-)

I think this can be solved with window functions. Here is an example. It may need some tweaking, but you should get the idea:

DECLARE @t TABLE(
  StudentName NVARCHAR(255)
 ,SchoolYear INT
 ,CompetitionDate DATE
 ,CompetitionResult INT
)

INSERT INTO @t VALUES
('Peter', 7, '2019-01-01', 100)
,('Peter', 8, '2020-01-01', 100)
,('Peter', 8, '2020-03-01', 100)
,('Paul', 10, '2020-01-01', 100)
,('Paul', 10, '2020-03-01', 100)
,('Paul', 10, '2020-04-01', 100)
,('Mary', 11, '2019-01-01', 100)
,('Mary', 11, '2019-02-01', 100)
,('Mary', 11, '2019-03-01', 100)
,('Jacob', 12, '2020-01-01', 100)
,('Jacob', 12, '2020-02-01', 100)
,('Jacob', 12, '2020-03-01', 100)
,('Jacob', 12, '2020-04-01', 90)
,('Jennifer', 9, '2020-03-01', 100)
,('Jennifer', 9, '2020-04-01', 100)
,('Jennifer', 9, '2020-05-01', 100)
,('Lucas', 12, '2020-03-01', 100)
,('Lucas', 12, '2020-04-01', 100)
,('Lucas', 12, '2020-05-01', 100)

;WITH cte AS(
SELECT *
      ,COUNT(CASE WHEN YEAR(CompetitionDate) = YEAR(GETDATE()) THEN 1 ELSE NULL END) OVER (PARTITION BY StudentName, YEAR(CompetitionDate)) AS CountCompYear
      ,ROW_NUMBER() OVER (PARTITION BY StudentName ORDER BY CompetitionDate DESC) AS LastCompetition
      
  FROM @t
),
cteFilter AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SchoolYear ORDER BY CompetitionResult DESC, StudentName ASC) AS DistStudent
  FROM cte
  WHERE CountCompYear = 3
    AND LastCompetition = 1
)
SELECT *
  FROM cteFilter
  WHERE DistStudent = 1
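
One thing to watch when trying the query above: the sample rows are dated 2019 and 2020, so the comparison against YEAR(GETDATE()) only matches anything when the query is run in 2020. A minimal sketch that pins the year in a variable instead; @TargetYear and the trimmed sample data are assumptions for illustration, not part of the original answer:

```sql
-- Hypothetical variable: the year treated as "this year" for the sample data.
DECLARE @TargetYear INT = 2020;

DECLARE @t TABLE(
  StudentName NVARCHAR(255)
 ,SchoolYear INT
 ,CompetitionDate DATE
 ,CompetitionResult INT
);

-- Trimmed sample: Jacob has four entries in 2020, Lucas three, Mary none.
INSERT INTO @t VALUES
 ('Jacob', 12, '2020-01-01', 100)
,('Jacob', 12, '2020-02-01', 100)
,('Jacob', 12, '2020-03-01', 100)
,('Jacob', 12, '2020-04-01', 90)
,('Lucas', 12, '2020-03-01', 100)
,('Lucas', 12, '2020-04-01', 100)
,('Lucas', 12, '2020-05-01', 100)
,('Mary', 11, '2019-01-01', 100)
,('Mary', 11, '2019-02-01', 100)
,('Mary', 11, '2019-03-01', 100);

WITH cte AS(
SELECT *
      -- Per student and calendar year: how many rows fall in the target year.
      ,COUNT(CASE WHEN YEAR(CompetitionDate) = @TargetYear THEN 1 END)
         OVER (PARTITION BY StudentName, YEAR(CompetitionDate)) AS CountCompYear
      -- 1 marks each student's most recent competition.
      ,ROW_NUMBER() OVER (PARTITION BY StudentName ORDER BY CompetitionDate DESC) AS LastCompetition
  FROM @t
),
cteFilter AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY SchoolYear ORDER BY CompetitionResult DESC, StudentName ASC) AS DistStudent
  FROM cte
  WHERE CountCompYear = 3
    AND LastCompetition = 1
)
SELECT StudentName
  FROM cteFilter
  WHERE DistStudent = 1;
```

With these rows only Lucas comes back: Jacob has four competitions in 2020 and Mary has none, so both fail the CountCompYear = 3 filter.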

To me, this is basically aggregation . . . with a little bit of window functions:

select StudentName, SchoolYear, avg_competitionresult
from (select StudentName, SchoolYear, avg(CompetitionResult) as avg_competitionresult,
             -- rank students within each school year by average result
             row_number() over (partition by SchoolYear order by avg(CompetitionResult) desc) as seqnum
      from CompetitionResults cr
      where year(CompetitionDate) = year(getdate())
      group by StudentName, SchoolYear
      having count(*) = 3   -- exactly three competitions this year
     ) s
where seqnum = 1;

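The subquery aggregates each student's competitions and applies the appropriate filter conditions, both on the individual competitions and on the total count. The outer query then picks one student per school year.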
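
To see the two stages separately, the inner aggregation can be run on its own. This is only a sketch: the table variable, the sample rows, and the fixed year 2020 (in place of year(getdate())) are assumptions made so the result is reproducible. It also multiplies by 1.0 before averaging, because avg() over an int column returns an int in SQL Server.

```sql
-- Hypothetical stand-in for the CompetitionResults table.
declare @CompetitionResults table (
    StudentName nvarchar(255) not null,
    SchoolYear int not null,
    CompetitionDate date not null,
    CompetitionResult int not null
);

insert into @CompetitionResults values
    ('Jacob', 12, '2020-01-01', 100),
    ('Jacob', 12, '2020-02-01', 100),
    ('Jacob', 12, '2020-03-01', 100),
    ('Jacob', 12, '2020-04-01', 90),   -- fourth entry, so count(*) = 3 fails
    ('Lucas', 12, '2020-03-01', 80),
    ('Lucas', 12, '2020-04-01', 90),
    ('Lucas', 12, '2020-05-01', 100),
    ('Noah',  12, '2020-03-01', 70),
    ('Noah',  12, '2020-04-01', 80),
    ('Noah',  12, '2020-05-01', 90);

-- Stage 1 only: one row per student with exactly three competitions in 2020.
select StudentName, SchoolYear,
       count(*) as NumComps,
       avg(1.0 * CompetitionResult) as AvgResult
from @CompetitionResults
where year(CompetitionDate) = 2020   -- row-level filter
group by StudentName, SchoolYear
having count(*) = 3;                 -- group-level filter
```

With these rows Jacob drops out because he has four entries in 2020, leaving Lucas (average 90) and Noah (average 80). Ranking those rows by average within school year 12 and keeping seqnum = 1 would then select Lucas.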

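I don't see what having competed in exactly three competitions has to do with being the best. I suspect the part about choosing the best student by score is a "hidden requirement", used to distinguish merely acceptable solutions from the best ones.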

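I suppose there could be extra logic to check that each school year has at least one candidate, but the question implies that at least one such student exists.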
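
If that extra check were wanted, one possible sketch is to left join the list of required school years (7 to 12) against the per-year winners, so that a year without a qualifying student shows up as NULL instead of silently disappearing. The table variable, the sample rows, the fixed year 2020, and the StudentName tie-breaker are all assumptions for illustration; the result has two columns, so it is a diagnostic rather than the single-column answer the test asks for.

```sql
-- Hypothetical stand-in for the CompetitionResults table.
declare @CompetitionResults table (
    StudentName nvarchar(255) not null,
    SchoolYear int not null,
    CompetitionDate date not null,
    CompetitionResult int not null
);

insert into @CompetitionResults values
    ('Jennifer', 9, '2020-03-01', 100),
    ('Jennifer', 9, '2020-04-01', 100),
    ('Jennifer', 9, '2020-05-01', 100),
    ('Lucas', 12, '2020-03-01', 100),
    ('Lucas', 12, '2020-04-01', 100),
    ('Lucas', 12, '2020-05-01', 100);

with candidates as (
    -- Students with exactly three competitions in 2020, ranked per school year.
    select StudentName, SchoolYear,
           row_number() over (partition by SchoolYear
                              order by avg(1.0 * CompetitionResult) desc, StudentName) as seqnum
    from @CompetitionResults
    where year(CompetitionDate) = 2020
    group by StudentName, SchoolYear
    having count(*) = 3
)
select y.SchoolYear,
       c.StudentName          -- NULL when no student qualifies for that year
from (values (7), (8), (9), (10), (11), (12)) as y(SchoolYear)
left join candidates c
       on c.SchoolYear = y.SchoolYear
      and c.seqnum = 1
order by y.SchoolYear;
```

With these rows, school years 9 and 12 return Jennifer and Lucas, and years 7, 8, 10 and 11 return NULL, which makes the missing years visible.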