JOIN with GROUP BY causing SUM() logic issues
Query -
sel TableName, DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB
from dbc.tablesize
group by 1,2
order by Size_in_GB desc
Result -
+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP | A | 28,350.01 |
| CPC | B | 19,999.37 |
| SDF | C | 13,263.67 |
| DB1400 | D | 13,200.26 |
+-----------+--------+------------+
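From the simple query above I can see that table WRP in database A is close to 28,350 GB.
Now I am trying to join another table, dbc.indices, so that I can filter on the column IndexType, but now Size_in_GB has changed for all tables.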
sel a.TableName,a.DatabaseName, sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB from dbc.tablesize a
join dbc.indices b on a.TableName = b.TableName and a.DatabaseName=b.DatabaseName
--where b.indexType='P'
group by 1,2
order by Size_in_GB desc
The result looks like this -
+-----------+--------+------------+
| TableName | DBName | Size_in_GB |
+-----------+--------+------------+
| WRP | A | 56,700.02 |
| CPC | B | 39,998.74 |
| DB1400 | D | 39,600.78 |
+-----------+--------+------------+
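Now the same tables are double the size, i.e. WRP is 56,700 GB (similarly for the other tables).
I am not sure what is wrong with the logic I used for the join.
P.S. - My aim is to find all the tables that are greater than 100 GB in size and have IndexType 'P'.
Edit - sharing the relevant columns from the DBC.INDICES table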
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| DatabaseName | TableName | IndexNumber | IndexType | UniqueFlag | IndexName | ColumnName | ColumnPosition |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
| Some DB | Some Table | 1 | P | N | IndexNamehere | ColumnA | 1 |
+--------------+------------+-------------+-----------+------------+---------------+------------+----------------+
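Your join key is probably duplicated in the dbc.indices table. For a single TableName, dbc.indices has more than one entry, so when you join it to dbc.tablesize the size rows are repeated once per matching entry. SUM is then applied to the repeated rows, which is why the totals are wrong.
Try it this way: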
SELECT a.TableName,
a.DatabaseName,
Sum(CurrentPerm / ( 1024 * 1024 * 1024 )) AS Size_in_GB
FROM dbc.tablesize a
JOIN (SELECT DISTINCT b.TableName,
b.DatabaseName
FROM dbc.indices b
--where b.indexType='P'
) b
ON a.TableName = b.TableName
AND a.DatabaseName = b.DatabaseName
GROUP BY a.TableName,
a.DatabaseName
ORDER BY Size_in_GB DESC
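The row duplication described above can be reproduced on toy data. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 rather than Teradata, and the tables `tablesize` and `indices` are small hypothetical stand-ins for the dbc views, with made-up sizes. It compares the plain join with the DISTINCT derived table.

```python
import sqlite3

# Toy stand-ins for dbc.tablesize and dbc.indices (SQLite, not Teradata).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablesize (DatabaseName TEXT, TableName TEXT, CurrentPerm INTEGER);
    CREATE TABLE indices   (DatabaseName TEXT, TableName TEXT, IndexType TEXT, ColumnName TEXT);
    -- made-up sizes: WRP totals 30, CPC totals 5
    INSERT INTO tablesize VALUES ('A', 'WRP', 10), ('A', 'WRP', 20), ('B', 'CPC', 5);
    -- WRP has two index rows, CPC has one
    INSERT INTO indices VALUES ('A', 'WRP', 'P', 'c1'), ('A', 'WRP', 'S', 'c2'),
                               ('B', 'CPC', 'P', 'c1');
""")

# Plain join: every size row is repeated once per matching index row.
naive = conn.execute("""
    SELECT a.TableName, SUM(a.CurrentPerm)
    FROM tablesize a
    JOIN indices b
      ON a.TableName = b.TableName AND a.DatabaseName = b.DatabaseName
    GROUP BY a.TableName, a.DatabaseName
    ORDER BY a.TableName
""").fetchall()

# Join against distinct (TableName, DatabaseName) pairs: no repetition.
deduped = conn.execute("""
    SELECT a.TableName, SUM(a.CurrentPerm)
    FROM tablesize a
    JOIN (SELECT DISTINCT TableName, DatabaseName FROM indices) b
      ON a.TableName = b.TableName AND a.DatabaseName = b.DatabaseName
    GROUP BY a.TableName, a.DatabaseName
    ORDER BY a.TableName
""").fetchall()

print(naive)    # [('CPC', 5), ('WRP', 60)]  WRP doubled by its two index rows
print(deduped)  # [('CPC', 5), ('WRP', 30)]
```

In this toy data the inflation factor equals the number of matching index rows per table, so a table with three such rows would come out tripled.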
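What is the confusion?
You apparently have tables with multiple indexes. Each index causes the table to appear more than once in the aggregation.
What you want: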
My aim is to find all the tables which are greater than 100GB in Size
and have indexType as 'P'
I would suggest moving the index comparison to the where clause:
select t.TableName, t.DatabaseName,
sum(t.CurrentPerm/(1024*1024*1024)) as Size_in_GB
from dbc.tablesize t
where exists (select 1
from dbc.indices i
where t.TableName = i.TableName and t.DatabaseName = i.DatabaseName and
i.indexType = 'P'
)
group by 1,2
order by Size_in_GB desc
If you also want to apply the size filter, you can add having Size_in_GB > 100 before the order by.
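To see why the EXISTS form avoids the inflated totals, here is a small sketch on toy data. It rests on assumptions: it runs on Python's built-in sqlite3 instead of Teradata, the `tablesize` and `indices` tables are hypothetical stand-ins for the dbc views, and the threshold of 100 applies to made-up size units rather than GB.

```python
import sqlite3

# Toy stand-ins for dbc.tablesize and dbc.indices (SQLite, not Teradata).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablesize (DatabaseName TEXT, TableName TEXT, CurrentPerm INTEGER);
    CREATE TABLE indices   (DatabaseName TEXT, TableName TEXT, IndexType TEXT, ColumnName TEXT);
    INSERT INTO tablesize VALUES ('A', 'WRP', 100), ('A', 'WRP', 50),
                                 ('B', 'CPC', 40), ('C', 'SDF', 200);
    -- WRP: two 'P' rows (two-column index); CPC: one 'P' row; SDF: no 'P' row
    INSERT INTO indices VALUES ('A', 'WRP', 'P', 'c1'), ('A', 'WRP', 'P', 'c2'),
                               ('B', 'CPC', 'P', 'c1'), ('C', 'SDF', 'S', 'c1');
""")

# EXISTS only tests whether a matching index row is present,
# so it never multiplies the rows that feed SUM().
rows = conn.execute("""
    SELECT t.TableName, t.DatabaseName, SUM(t.CurrentPerm) AS size_units
    FROM tablesize t
    WHERE EXISTS (SELECT 1
                  FROM indices i
                  WHERE i.TableName = t.TableName
                    AND i.DatabaseName = t.DatabaseName
                    AND i.IndexType = 'P')
    GROUP BY t.TableName, t.DatabaseName
    HAVING SUM(t.CurrentPerm) > 100
    ORDER BY size_units DESC
""").fetchall()

print(rows)  # [('WRP', 'A', 150)]
```

WRP keeps its true total of 150 even though it has two 'P' rows, CPC is dropped by the size filter, and SDF is dropped because it has no 'P' row.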
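dbc.IndicesV (never use the old deprecated non-V views) has one row per index per column.
You can simply add a condition to restrict it to a single row: where IndexType = 'P' and ColumnPosition = 1
It is also more efficient to aggregate early, i.e. to aggregate before joining: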
select dt.*
from
(
select TableName, DatabaseName,
sum(CurrentPerm/(1024*1024*1024)) as Size_in_GB
from dbc.TableSizeV
group by 1,2
having Size_in_GB > 100
) as dt
join dbc.IndicesV b
on dt.TableName = b.TableName
and dt.DatabaseName = b.DatabaseName
where b.IndexType = 'P'
and b.ColumnPosition = 1
order by Size_in_GB desc;
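But why do you filter on IndexType = 'P'? Don't you care about other objects larger than 100 GB (NoPI/columnar tables, join indexes)? By the way, this does not return all tables with a PI; only IndexNumber = 1 does.
Depending on your needs, you may be better off joining to dbc.TablesV.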
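The aggregate-before-join approach above can be checked on toy data as well. Again this is a sketch under assumptions, not Teradata itself: it uses Python's built-in sqlite3, the `tablesize` and `indices` tables are hypothetical stand-ins for dbc.TableSizeV and dbc.IndicesV, and the sizes are made-up units.

```python
import sqlite3

# Toy stand-ins for dbc.TableSizeV and dbc.IndicesV (SQLite, not Teradata).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablesize (DatabaseName TEXT, TableName TEXT, CurrentPerm INTEGER);
    CREATE TABLE indices   (DatabaseName TEXT, TableName TEXT, IndexNumber INTEGER,
                            IndexType TEXT, ColumnPosition INTEGER);
    INSERT INTO tablesize VALUES ('A', 'WRP', 100), ('A', 'WRP', 50), ('B', 'CPC', 40);
    -- WRP has a two-column index, i.e. two rows for the same index
    INSERT INTO indices VALUES ('A', 'WRP', 1, 'P', 1), ('A', 'WRP', 1, 'P', 2),
                               ('B', 'CPC', 1, 'P', 1);
""")

# Sizes are summed and filtered first; the join then matches at most one
# index row per table because of ColumnPosition = 1.
rows = conn.execute("""
    SELECT dt.TableName, dt.DatabaseName, dt.size_units
    FROM (SELECT TableName, DatabaseName, SUM(CurrentPerm) AS size_units
          FROM tablesize
          GROUP BY TableName, DatabaseName
          HAVING SUM(CurrentPerm) > 100) AS dt
    JOIN indices b
      ON dt.TableName = b.TableName
     AND dt.DatabaseName = b.DatabaseName
    WHERE b.IndexType = 'P'
      AND b.ColumnPosition = 1
    ORDER BY dt.size_units DESC
""").fetchall()

print(rows)  # [('WRP', 'A', 150)]
```

Because the SUM is computed inside the derived table, a join that matched several index rows would repeat the output row but could no longer inflate the total.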
P.S - My aim is to find all the tables which are greater than 100GB
in Size and have indexType as 'P'
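If you only want to find tables for which a certain index exists, you should not join at all. Use EXISTS instead. That puts your condition where it belongs, in the WHERE or HAVING clause, and the condition can no longer duplicate records (in your case: when a table has multiple matching indexes).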
select tablename, databasename, sum(currentperm/(1024*1024*1024)) as size_in_gb
from dbc.tablesize ts
group by tablename, databasename
having sum(currentperm/(1024*1024*1024)) > 100
and exists
(
select *
from dbc.indices i
where i.tablename = ts.tablename and i.databasename = ts.databasename
and i.indexType = 'P'
)
order by Size_in_GB desc;