Calculated Column in SQL Server
My data in the table is:
id  Author_ID  Research_Area        Category_ID  Paper_Count  Paper_Year  Rank
------------------------------------------------------------------------------
1   677        feature extraction   8            1            2005        1
2   677        image annotation     11           1            2005        2
3   677        probabilistic model  12           1            2005        3
4   677        semantic             19           1            2007        1
5   677        feature extraction   8            1            2009        1
6   677        image annotation     11           1            2011        1
7   677        semantic             19           1            2012        1
8   677        video sequence       5            2            2013        1
9   1359       adversary model      1            2            2005        1
10  1359       ensemble method      14           2            2005        2
11  1359       image represent      11           2            2005        3
12  1359       adversary model      1            7            2006        1
13  1359       concurrency control  17           5            2006        2
14  1359       information system   12           2            2006        3
15  ...
16  ...
And I want the query output to be:
id  Author_ID  Category_ID  Paper_Count  Category_Prob  Paper_Year  Rank
------------------------------------------------------------------------
1   677        8            1            0.333          2005        1
2   677        11           1            0.333          2005        2
3   677        12           1            0.333          2005        3
4   677        19           1            1.0            2007        1
5   677        8            1            1.0            2009        1
6   677        11           1            1.0            2011        1
7   677        19           1            1.0            2012        1
8   677        5            2            1.0            2013        1
9   1359       1            2            0.333          2005        1
10  1359       14           2            0.333          2005        2
11  1359       11           2            0.333          2005        3
12  1359       1            7            0.5            2006        1
13  1359       17           5            0.357          2006        2
14  1359       12           2            0.142          2006        3
15  ...
16  ...
where Category_Prob is a calculated column, computed in two steps:
Step 1: for each Paper_Year, take the SUM of Paper_Count for the author. For example, for Paper_Year = 2005 and Author_ID = 677, SUM(Paper_Count) = 3.
Step 2: for each Category_ID, divide Paper_Count by the SUM(Paper_Count) of its Paper_Year, which in this example is 1/3, i.e. 0.333, and so on.
I have also tried this query:
SELECT
Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = Paper_Count / SUM(Paper_Count),
Paper_Year, Rank
FROM
Author_Areas
GROUP BY
Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY
Author_ID, Paper_Year
But it returns only 1 in the Category_Prob column for every row in the table.
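I suspect (please confirm) that all the fields involved are of an integer data type. When you calculate with int operands, the return type is also int. You should convert the fields to decimal before the calculation.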
SELECT Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = convert(decimal(10,3), Paper_Count) / convert(decimal(10, 3), SUM(Paper_Count)),
Paper_Year, Rank
FROM Author_Areas
GROUP BY Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY Author_ID, Paper_Year
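To see the integer-division behaviour on its own, here is a minimal sketch using literal values instead of the Author_Areas columns. The column aliases are made up for illustration.

```sql
-- Both operands are int, so the result is truncated to an int.
SELECT 1 / 3 AS int_division;                               -- 0

-- Converting one operand to decimal makes the whole division decimal
-- (the result is roughly 0.333, with more digits of scale).
SELECT CONVERT(decimal(10, 3), 1) / 3 AS decimal_division;

-- Multiplying by 1.0 triggers the same promotion implicitly.
SELECT 1 * 1.0 / 3 AS implicit_decimal_division;
```

Note that fixing the data type only changes how the quotient is displayed. If the divisor is the wrong sum, the result will still be 1 (shown as a decimal).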
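The problem with your query is that you are not grouping by Paper_Year alone, but by Author_ID, Abstract_Category, Paper_Year, Paper_Count and Rank. With that grouping each group holds a single row of your sample data, so for each group SUM(Paper_Count) equals Paper_Count.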
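The two steps from the question can be written out literally as a sketch: compute the totals per author and year first, then join them back to the detail rows. The inline Author_Areas rows are copied from the question so the snippet runs by itself, and Year_Totals and Year_Total are names invented here.

```sql
WITH Author_Areas AS (
    -- Sample rows from the question (author 1359 only).
    SELECT *
    FROM (VALUES
        (9,  1359, 1,  2, 2005, 1),
        (10, 1359, 14, 2, 2005, 2),
        (11, 1359, 11, 2, 2005, 3),
        (12, 1359, 1,  7, 2006, 1),
        (13, 1359, 17, 5, 2006, 2),
        (14, 1359, 12, 2, 2006, 3)
    ) AS v (id, Author_ID, Abstract_Category, Paper_Count, Paper_Year, [Rank])
),
Year_Totals AS (
    -- Step 1: one total per author and year (6 for 2005, 14 for 2006).
    SELECT Author_ID, Paper_Year, SUM(Paper_Count) AS Year_Total
    FROM Author_Areas
    GROUP BY Author_ID, Paper_Year
)
-- Step 2: divide each row's Paper_Count by the total of its author and year.
SELECT a.id, a.Author_ID, a.Abstract_Category, a.Paper_Count,
       a.Paper_Count * 1.0 / t.Year_Total AS Category_Prob,
       a.Paper_Year, a.[Rank]
FROM Author_Areas AS a
JOIN Year_Totals AS t
  ON t.Author_ID = a.Author_ID
 AND t.Paper_Year = a.Paper_Year
ORDER BY a.Author_ID, a.Paper_Year, a.[Rank];
```

This keeps the detail rows intact because the GROUP BY only happens inside Year_Totals.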
You can use SUM OVER for this:
SELECT id, Author_ID, Abstract_Category [Category_ID],
Paper_Count,
Paper_Count * 1.0 / SUM(Paper_Count)
OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year
Note: multiplying by 1.0 is necessary to avoid integer division.
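Note 2: If your actual requirement is to group per author and per year, the PARTITION BY clause has to include the Author_ID field as well as Paper_Year, as in the query above.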
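For anyone who wants to try this without the real table, here is a self-contained sketch that loads a few of the question's rows into a table variable and applies the window function. The table variable @Author_Areas and the CAST to decimal(10, 3) are additions made here, not part of the original answer.

```sql
-- Table variable standing in for Author_Areas (assumed int columns).
DECLARE @Author_Areas TABLE (
    id int, Author_ID int, Abstract_Category int,
    Paper_Count int, Paper_Year int, [Rank] int
);

-- A subset of the sample rows from the question.
INSERT INTO @Author_Areas (id, Author_ID, Abstract_Category, Paper_Count, Paper_Year, [Rank])
VALUES
    (1,  677,  8,  1, 2005, 1),
    (2,  677,  11, 1, 2005, 2),
    (3,  677,  12, 1, 2005, 3),
    (4,  677,  19, 1, 2007, 1),
    (12, 1359, 1,  7, 2006, 1),
    (13, 1359, 17, 5, 2006, 2),
    (14, 1359, 12, 2, 2006, 3);

-- The window SUM gives every row the total of its author and year,
-- and the CAST trims the quotient to three decimal places.
SELECT id, Author_ID, Abstract_Category AS Category_ID, Paper_Count,
       CAST(Paper_Count * 1.0
            / SUM(Paper_Count) OVER (PARTITION BY Author_ID, Paper_Year)
            AS decimal(10, 3)) AS Category_Prob,
       Paper_Year, [Rank]
FROM @Author_Areas
ORDER BY Author_ID, Paper_Year, [Rank];
```

The CAST rounds rather than truncates, so the last digit can differ from the expected output in the question (2/14 is listed there as 0.142).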