Join multiple tables with same structure but different data
I'm trying to join 8 tables, and because each table has more than 500,000 rows, the query is very slow. What is the best way to join these tables?
All the tables have the same structure:
data_temprature:
+--------+----------+-------+------------------+
| ID_geo | NAME     | Value | Date             |
+--------+----------+-------+------------------+
| 10005  | Madrid   | 32    | 2017-06-12 08:00 |
| 10005  | Madrid   | 25    | 2017-06-12 09:00 |
| 12701  | Paris    | 23    | 2017-06-12 08:00 |
| 13006  | Tokyo    | 25    | 2017-06-12 11:00 |
| 11132  | Sevilla  | 27    | 2017-06-12 16:00 |
| 21333  | London   | 22    | 2017-06-12 17:00 |
+--------+----------+-------+------------------+
data_WeatherSimbol:
+--------+----------+-------+------------------+
| ID_geo | NAME     | Value | Date             |
+--------+----------+-------+------------------+
| 10005  | Madrid   | A+    | 2017-06-12 08:00 |
| 10005  | Madrid   | A     | 2017-06-12 09:00 |
| 12701  | Paris    | A-    | 2017-06-12 08:00 |
| 13006  | Tokyo    | C-    | 2017-06-12 11:00 |
| 11132  | Sevilla  | I+    | 2017-06-12 16:00 |
| 21333  | London   | D-    | 2017-06-12 17:00 |
+--------+----------+-------+------------------+
I want to join them to get this result:
+--------+----------+-------------+----------+------------------+
| ID_geo | NAME     | Temperature | Simboles | Date             |
+--------+----------+-------------+----------+------------------+
| 10005  | Madrid   | 32          | A+       | 2017-06-12 08:00 |
| 10005  | Madrid   | 25          | A        | 2017-06-12 09:00 |
| 12701  | Paris    | 23          | A-       | 2017-06-12 08:00 |
| 13006  | Tokyo    | 25          | C-       | 2017-06-12 11:00 |
| 11132  | Sevilla  | 27          | I+       | 2017-06-12 16:00 |
| 21333  | London   | 22          | D-       | 2017-06-12 17:00 |
+--------+----------+-------------+----------+------------------+
Thanks
Update with the real data:
Execution plan: https://files.fm/u/b4besk27
This is the query:
SELECT
    cielo.data_value AS cielo,
    lluv.data_value AS lluvia,
    temp.data_value AS temp,
    vientos.data_value AS viento,
    tmin.data_value AS tempmin,
    tmax.data_value AS tempmax,
    cielo.data_date AS DiaPrev
FROM
    data_cielo AS cielo
    INNER JOIN data_lluvia AS lluv ON cielo.data_geo = lluv.data_geo
    INNER JOIN data_presion AS pres ON cielo.data_geo = pres.data_geo
    INNER JOIN data_temp AS temp ON cielo.data_geo = temp.data_geo
    LEFT JOIN data_tempmax AS tmax ON cielo.data_geo = tmax.data_geo
    LEFT JOIN data_tempmin AS tmin ON cielo.data_geo = tmin.data_geo
    INNER JOIN data_viento AS vientos ON cielo.data_geo = vientos.data_geo
WHERE
    cielo.data_date = lluv.data_date
    AND pres.data_date = cielo.data_date
    AND vientos.data_date = pres.data_date
    AND temp.data_date = vientos.data_date
    AND cielo.data_geo = 46
ORDER BY cielo.data_date;
And this is the result:
E+ 0.0461028 29.6937088 S2 19.408 36.39 2017-06-13 12:00:00.000
E+ 0.0461028 29.6937088 S2 21.422 36.39 2017-06-13 12:00:00.000
E+ 0.0461028 29.6937088 S2 19.408 37.853 2017-06-13 12:00:00.000
E+ 0.0461028 29.6937088 S2 21.422 37.853 2017-06-13 12:00:00.000
E+ 0.0461028 30.7593854 S2 19.408 36.39 2017-06-13 13:00:00.000
E+ 0.0461028 30.7593854 S2 21.422 36.39 2017-06-13 13:00:00.000
E+ 0.0461028 30.7593854 S2 19.408 37.853 2017-06-13 13:00:00.000
E+ 0.0461028 30.7593854 S2 21.422 37.853 2017-06-13 13:00:00.000
A+ 0.0461028 31.6310774 SSW2 19.408 36.39 2017-06-13 14:00:00.000
A+ 0.0461028 31.6310774 SSW2 21.422 36.39 2017-06-13 14:00:00.000
A+ 0.0461028 31.6310774 SSW2 19.408 37.853 2017-06-13 14:00:00.000
A+ 0.0461028 31.6310774 SSW2 21.422 37.853 2017-06-13 14:00:00.000
A 0.0461028 32.2647927 S2 19.408 36.39 2017-06-13 15:00:00.000
A 0.0461028 32.2647927 S2 21.422 36.39 2017-06-13 15:00:00.000
A 0.0461028 32.2647927 S2 19.408 37.853 2017-06-13 15:00:00.000
It shouldn't look like this. As I said, I need one row per hour with the values for temperature, pressure, precipitation, sky, and so on.
Try this:
;With data_temprature(ID_geo,NAME,Value,[Date])
AS
(
SELECT 10005 , 'Madrid' , 32 , '2017-06-12 08:00' Union all
SELECT 10005 , 'Madrid' , 25 , '2017-06-12 09:00' Union all
SELECT 12701 , 'Paris' , 23 , '2017-06-12 08:00' Union all
SELECT 13006 , 'Tokyo' , 25 , '2017-06-12 11:00' Union all
SELECT 11132 , 'Sevilla' , 27 , '2017-06-12 16:00' Union all
SELECT 21333 , 'London' , 22 , '2017-06-12 17:00'
)
,data_WeatherSimbol(ID_geo,NAME,Value,[Date])
AS
(
SELECT 10005 , 'Madrid' , 'A+' , '2017-06-12 08:00' Union all
SELECT 10005 , 'Madrid' , 'A' , '2017-06-12 09:00' Union all
SELECT 12701 , 'Paris' , 'A-' , '2017-06-12 08:00' Union all
SELECT 13006 , 'Tokyo' , 'C-' , '2017-06-12 11:00' Union all
SELECT 11132 , 'Sevilla' , 'I+' , '2017-06-12 16:00' Union all
SELECT 21333 , 'London' , 'D-' , '2017-06-12 17:00'
)
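SELECT ID_geo,
       NAME,
       Temperature,
       Symboles,
       [Date]
FROM
(
    SELECT t.ID_geo,
           t.NAME,
           t.Value AS Temperature,
           w.Value AS Symboles,
           t.[Date],
           -- Keep one row per place and hour in case a table holds duplicate readings
           ROW_NUMBER() OVER (PARTITION BY t.ID_geo, t.[Date] ORDER BY t.[Date]) AS Rno
    FROM data_temprature t
    INNER JOIN data_WeatherSimbol w
        -- Match on both the location and the timestamp; joining on ID_geo alone
        -- pairs every Madrid temperature with every Madrid symbol
        ON t.ID_geo = w.ID_geo
       AND t.[Date] = w.[Date]
) Dt
WHERE Dt.Rno = 1
ORDER BY ID_geo, [Date]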
I think you can join on both geo and date:
select t.ID_geo, t.NAME, t.Value as Temperature, ws.Value as Simboles, t.[Date]
from data_temprature t join
     data_WeatherSimbol ws
     on t.ID_geo = ws.ID_geo and t.[Date] = ws.[Date];
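Applied to the real query from the update, the same idea might look like the sketch below. It only illustrates the approach and is not tested against the actual data. Two assumptions: the tempmin/tempmax tables seem to hold one row per day rather than per hour (the result above repeats the same min/max values for every hour), so they are matched on the calendar day; and the presion column in the output is added here because the question asks for pressure, although the original SELECT list does not return it.

```sql
-- Sketch: every join matches on both the location and the timestamp,
-- so each hour of data_cielo pairs with at most one row from each hourly table.
SELECT
    cielo.data_date    AS DiaPrev,
    cielo.data_value   AS cielo,
    lluv.data_value    AS lluvia,
    pres.data_value    AS presion,   -- hypothetical alias, not in the original query
    temp.data_value    AS temp,
    vientos.data_value AS viento,
    tmin.data_value    AS tempmin,
    tmax.data_value    AS tempmax
FROM data_cielo AS cielo
INNER JOIN data_lluvia AS lluv
    ON lluv.data_geo = cielo.data_geo
   AND lluv.data_date = cielo.data_date
INNER JOIN data_presion AS pres
    ON pres.data_geo = cielo.data_geo
   AND pres.data_date = cielo.data_date
INNER JOIN data_temp AS temp
    ON temp.data_geo = cielo.data_geo
   AND temp.data_date = cielo.data_date
INNER JOIN data_viento AS vientos
    ON vientos.data_geo = cielo.data_geo
   AND vientos.data_date = cielo.data_date
-- Assumption: one min/max row per location per day, so compare dates only.
LEFT JOIN data_tempmax AS tmax
    ON tmax.data_geo = cielo.data_geo
   AND CAST(tmax.data_date AS date) = CAST(cielo.data_date AS date)
LEFT JOIN data_tempmin AS tmin
    ON tmin.data_geo = cielo.data_geo
   AND CAST(tmin.data_date AS date) = CAST(cielo.data_date AS date)
WHERE cielo.data_geo = 46
ORDER BY cielo.data_date;
```

If the min/max tables turn out to be hourly as well, the two CAST comparisons can be replaced with a plain data_date equality like the other joins.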
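Neither [ID_geo] nor [Date] seems unique enough on its own to join on, so:
1. Create an index on both columns for every table, like
create index IX_data_temprature on data_temprature ([ID_geo], [Date])
2. Join all the tables on both [ID_geo] and [Date].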
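For the tables in the updated question, that could be sketched roughly as follows. The index names are made up for illustration, and the column names data_geo and data_date are taken from the posted query; adjust them if the real schema differs.

```sql
-- One composite index per table, location first, then timestamp,
-- so a lookup for one data_geo can walk its rows in date order.
CREATE INDEX IX_data_cielo_geo_date   ON data_cielo   (data_geo, data_date);
CREATE INDEX IX_data_lluvia_geo_date  ON data_lluvia  (data_geo, data_date);
CREATE INDEX IX_data_presion_geo_date ON data_presion (data_geo, data_date);
CREATE INDEX IX_data_temp_geo_date    ON data_temp    (data_geo, data_date);
CREATE INDEX IX_data_tempmax_geo_date ON data_tempmax (data_geo, data_date);
CREATE INDEX IX_data_tempmin_geo_date ON data_tempmin (data_geo, data_date);
CREATE INDEX IX_data_viento_geo_date  ON data_viento  (data_geo, data_date);
```

Putting data_geo first suits queries that filter on a single location, as the posted one does with data_geo = 46.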
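Most of the query load is caused by RID lookups.
A RID lookup is used when the index does not cover the query (SQL Server has to look up the values in the table itself because they are not contained in the index) and the index is a nonclustered index on a table without a clustered index (a heap).
If you use a covering index your query could be faster; you probably did not include the value column in your index. For more on included columns, see the Microsoft docs.
Changing the nonclustered index to a clustered index could also help.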
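As a rough sketch of both options for one of the tables (data_temp is used as the example, the index names are hypothetical, and the columns come from the posted query):

```sql
-- Option 1: covering index. The key columns serve the join and filter,
-- and INCLUDE stores data_value in the index leaf so no RID lookup is needed.
CREATE NONCLUSTERED INDEX IX_data_temp_geo_date_cover
    ON data_temp (data_geo, data_date)
    INCLUDE (data_value);

-- Option 2: cluster the table on the join columns instead.
-- A table can have only one clustered index, so this fails if one already exists.
CREATE CLUSTERED INDEX CIX_data_temp_geo_date
    ON data_temp (data_geo, data_date);
```

Either option would need to be repeated for the other tables in the join; comparing the execution plan before and after should show whether the RID lookups are gone.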