Trying to make a slope calculation work with SQL syntax
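I'm a relatively new SQL programmer, and I'm trying to get the code below to work in SQL. It calculates the slope of a given data set, following exactly the same logic as the Excel SLOPE function. The problem is that the count is not allowed, because the aggregates are nested. But if I create a subquery for the count and sum, I would have to group by x and y, otherwise x and y would not be available in my outer query for the calculation.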
CREATE TABLE TEST (X FLOAT, Y FLOAT);
INSERT INTO TEST (X, Y) VALUES (1,4.10242258729964);
INSERT INTO TEST (X, Y) VALUES (2,4.57708865242591);
INSERT INTO TEST (X, Y) VALUES (3,5.16785670619896);
INSERT INTO TEST (X, Y) VALUES (4,6.88149559336059);
select sum((x-sum(x)/count(x))^2)/sum(((x-sum(x)/count(x))*(y-sum(y)/count(y))))
from TEST
You can compute the slope from sum(x * x) and sum(x * y), together with avg(x), avg(y), and n:
SELECT avg(x) AS mx,sum(x*x) AS sx2,sum(x*y) as sxy,avg(y) as my, count(x) AS n
FROM test
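Then you can use:
-- the derived table is given an alias (s) because many databases, e.g. SQL Server, require one
SELECT (sxy - n*mx*my) / (sx2 - n*mx*mx) AS slope
FROM
( SELECT avg(x) AS mx, sum(x*x) AS sx2, sum(x*y) AS sxy, avg(y) AS my, count(x) AS n
FROM test
) s;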
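For anyone wondering why this expression matches the usual least-squares slope, here is a short derivation sketch. The bar notation for the means (corresponding to mx and my) is mine, not the answer's, and it relies only on the fact that the sum of the x values equals n times their mean.

```latex
\begin{align*}
\sum_i (x_i-\bar{x})(y_i-\bar{y})
  &= \sum_i x_i y_i - \bar{y}\sum_i x_i - \bar{x}\sum_i y_i + n\bar{x}\bar{y}
   = \sum_i x_i y_i - n\bar{x}\bar{y}, \\
\sum_i (x_i-\bar{x})^2
  &= \sum_i x_i^2 - 2\bar{x}\sum_i x_i + n\bar{x}^2
   = \sum_i x_i^2 - n\bar{x}^2, \\
\text{slope}
  &= \frac{\sum_i x_i y_i - n\bar{x}\bar{y}}{\sum_i x_i^2 - n\bar{x}^2}
   = \frac{\texttt{sxy} - n\cdot\texttt{mx}\cdot\texttt{my}}{\texttt{sx2} - n\cdot\texttt{mx}^2}.
\end{align*}
```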
I usually do something along these lines (I've kept the syntax as simple as possible to avoid anything potentially problematic, such as CTEs):
CREATE TABLE #test (x FLOAT, y FLOAT);
INSERT INTO #test SELECT 1, 4.10242258729964;
INSERT INTO #test SELECT 2, 4.57708865242591;
INSERT INTO #test SELECT 3, 5.16785670619896;
INSERT INTO #test SELECT 4, 6.88149559336059;
SELECT
(N * SUM_XY - SUM_X * SUM_Y) / (N * SUM_X2 - SUM_X * SUM_X) AS slope
FROM
(
SELECT
COUNT(*) AS N,
SUM(x) AS SUM_X,
SUM(x * x) AS SUM_X2,
SUM(y) AS SUM_Y,
SUM(y * y) AS SUM_Y2,
SUM(x * y) AS SUM_XY
FROM
#test) a;
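One thing the sums-based query does not handle is a zero denominator, which happens for example when the table holds only one row. The variant below is a sketch of one way to guard against that. It assumes the TEST table from the question rather than the temp table, drops the unused SUM_Y2 column, and relies on the standard NULLIF function so that the query returns NULL instead of attempting a division by zero.

```sql
-- Same sums-based formula, run against the TEST table from the question.
-- NULLIF turns a denominator of exactly zero into NULL, so the result
-- is NULL rather than a divide-by-zero failure (e.g. with a single row).
SELECT
    (N * SUM_XY - SUM_X * SUM_Y)
        / NULLIF(N * SUM_X2 - SUM_X * SUM_X, 0) AS slope
FROM (
    SELECT
        COUNT(*)   AS N,
        SUM(x)     AS SUM_X,
        SUM(x * x) AS SUM_X2,
        SUM(y)     AS SUM_Y,
        SUM(x * y) AS SUM_XY
    FROM TEST
) a;
```

For the four sample rows the intermediate values are N = 4, SUM_X = 10 and SUM_X2 = 30, so the denominator is 20. Working the rest out by hand, the slope comes to roughly 0.8928.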
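I just ran this and then noticed you already have another answer, from "SQL Hacks". I ran both versions and got exactly the same result, but the other version is shorter :D
Here is a working version of the SQL you created: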
SELECT sum((x-avgx)*(x-avgx)) / sum((x-avgx)*(y-avgy))
FROM TEST, (SELECT sum(X)/count(X) as avgx, sum(Y)/count(Y) as avgy FROM TEST) average;
I looked up the Excel SLOPE function, and its definition is slightly different (the numerator and denominator are swapped):
SELECT sum((x-avgx)*(y-avgy)) / sum((x-avgx)*(x-avgx))
FROM TEST,
(
SELECT
sum(X)/count(X) as avgx,
sum(Y)/count(Y) as avgy
FROM TEST
) average;
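To see how the two queries differ on the four sample rows, the arithmetic can be done by hand. This is my own calculation, rounded to four decimals, not output copied from a database.

```latex
\begin{align*}
\bar{x} &= 2.5, \qquad \sum_i (x_i-\bar{x})^2 = 2.25 + 0.25 + 0.25 + 2.25 = 5, \\
\sum_i (x_i-\bar{x})(y_i-\bar{y}) &\approx 4.4640, \\
\text{second query (Excel SLOPE definition)} &\approx 4.4640 / 5 \approx 0.8928, \\
\text{first query} &\approx 5 / 4.4640 \approx 1.1201.
\end{align*}
```

So the first query returns the reciprocal of the slope, while the second returns the slope itself.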
Hope this is what you need :)
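On the original worry about having to group by x and y: if your database supports window functions, another option is to attach the averages to every row with an empty OVER () clause, so no GROUP BY and no self-join are needed. This is only a sketch under that assumption. The alias names (centered, avgx, avgy) are illustrative, and the NULLIF guard is my addition.

```sql
-- Inner query: keep each row's x and y and add the overall averages
-- as extra columns via window aggregates (no GROUP BY required).
-- Outer query: Excel-style slope = sum((x-avgx)*(y-avgy)) / sum((x-avgx)^2).
SELECT
    SUM((x - avgx) * (y - avgy))
        / NULLIF(SUM((x - avgx) * (x - avgx)), 0) AS slope
FROM (
    SELECT
        x,
        y,
        AVG(x) OVER () AS avgx,
        AVG(y) OVER () AS avgy
    FROM TEST
) centered;
```

Because the averages are computed once over the whole table and repeated on each row, the outer aggregate is no longer nested inside another aggregate, which is what the original query was tripping over.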