Unexpected number of rows. Improper join?
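I'm stumped on this one. I'm a beginner, and maybe I'm in over my head with joins. My approach was to write a separate query for each piece of the result. Now I'm trying to combine them, and it's breaking.
I'm querying with Impala. I created the tables in MySQL before importing them into HDFS with Hive.
The query that works is: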
SELECT carrier, zone, speed, min_price, price
FROM (SELECT carrier, total_wt, zone, speed, price, MIN(price) OVER(PARTITION BY speed) as min_price
      FROM shipping_prices
      WHERE total_wt IN (
              SELECT SUM(prod.shipping_wt)
              FROM order_details ordet
              JOIN products prod
                ON prod.prod_id = ordet.prod_id
              WHERE ordet.order_id = 5841506
              GROUP BY ordet.order_id)
        AND zone IN (
              SELECT cal_zone
              FROM (
                SELECT carrier, dest_zip, origin_zip, zone, MIN(zone) OVER(PARTITION BY carrier) as cal_zone
                FROM shipping_zones
                WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = '560'
              ) t
              WHERE zone = cal_zone)
     ) z
WHERE price = min_price
ORDER BY speed DESC;
This returns:
+---------+------+-------+-----------+-------+
| carrier | zone | speed | min_price | price |
+---------+------+-------+-----------+-------+
| fedex | 4 | slow | 10.86 | 10.86 |
| usps | 4 | med | 11.15 | 11.15 |
| usps | 4 | fast | 40.55 | 40.55 |
+---------+------+-------+-----------+-------+
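The query I'm having trouble with seems to go wrong at the step just before the final result. At that point it should find 9 prices in total, one per carrier per speed level. Instead it returns 27: 9 prices per speed level, i.e. 3 per carrier instead of 1. That makes no sense, because there are only 3 prices per speed level (one per carrier). Here is the query: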
SELECT lev5.zone, lev5.cust_id, lev5.total_wt, lev5.zip_shrt, lev5.order_id, lev5.carrier, pric.speed, pric.price, MIN(price) OVER(PARTITION BY speed) as min_price
FROM (
  SELECT lev4.cust_id, lev4.total_wt, lev4.zip_shrt, lev4.order_id, lev4.carrier, lev4.zone
  FROM (
    SELECT lev3.cust_id, lev3.total_wt, lev3.zip_shrt, lev3.order_id, zon.carrier, zon.origin_zip, zon.dest_zip, zon.zone, MIN(zon.zone) OVER(PARTITION BY zon.carrier) as calc_zone
    FROM (
      SELECT lev2.cust_id, SUM(shipping_wt)/2 AS total_wt, STRLEFT(lev2.zipcode, 3) AS zip_shrt, lev2.order_id
      FROM (
        SELECT lev1.zipcode, lev1.cust_id, lev1.order_id, ordet.prod_id
        FROM (
          SELECT cus.zipcode, cus.cust_id, ord.order_id
          FROM orders ord
          JOIN customers cus ON ord.cust_id = cus.cust_id
          WHERE ord.order_id = 5841506
        ) lev1
        JOIN order_details ordet
          ON lev1.order_id = ordet.order_id
        WHERE ordet.order_id = lev1.order_id
      ) lev2
      JOIN products prod
        ON lev2.prod_id = prod.prod_id
      GROUP BY lev2.cust_id, zip_shrt, lev2.order_id
    ) lev3
    JOIN shipping_zones zon
      ON lev3.zip_shrt = zon.dest_zip
    WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = lev3.zip_shrt
  ) lev4
  WHERE lev4.zone = calc_zone
) lev5
JOIN shipping_prices pric
  ON lev5.zone = pric.zone AND lev5.total_wt = pric.total_wt;
This returns:
+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+
| zone | cust_id | total_wt | zip_shrt | order_id | carrier | speed | price | min_price |
+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+
| 4 | 1050349 | 4 | 560 | 5841506 | ups | med | 22.24 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | med | 23.57 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | med | 11.15 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | med | 11.15 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | med | 22.24 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | med | 23.57 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | med | 23.57 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | med | 11.15 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | med | 22.24 | 11.15 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | slow | 10.86 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | slow | 10.86 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | slow | 11.15 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | slow | 10.86 | 10.86 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | fast | 69.93000000000001 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | fast | 40.55 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | ups | fast | 65.94 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | fast | 69.93000000000001 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | fast | 40.55 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | usps | fast | 65.94 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | fast | 69.93000000000001 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | fast | 40.55 | 40.55 |
| 4 | 1050349 | 4 | 560 | 5841506 | fedex | fast | 65.94 | 40.55 |
+------+---------+----------+----------+----------+---------+-------+-------------------+-----------+
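To confirm, I went into Excel (the source data for shipping_prices) and ran a COUNTIFS. It returned only 9 rows, as it should.
Thank you so much for any help. I've really been trying to solve this myself, but maybe I'm missing some knowledge about joins, or something else I'm too inexperienced to spot. I'm proud to have gotten this far, but this roadblock is frustrating.
Thanks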
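The 27 follows from the join as written. Judging from the output, lev5 holds one row per carrier (3 rows, all zone 4), and shipping_prices holds 9 rows for zone 4 at that weight. Because the join only matches on zone and total_wt, every lev5 row pairs with all 9 price rows: 3 × 9 = 27. That is also why each carrier in the output shows all three carriers' prices for a speed. The sketch below reproduces the count using Python's built-in sqlite3 as a stand-in for Impala. The table name lev5 and its contents are a hypothetical, stripped-down version of the derived table, and the prices are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, minimal versions of the derived table lev5 and of shipping_prices.
conn.execute("CREATE TABLE lev5 (carrier TEXT, zone INT, total_wt INT)")
conn.execute("CREATE TABLE shipping_prices (carrier TEXT, zone INT, total_wt INT, speed TEXT, price REAL)")

carriers = ["fedex", "usps", "ups"]
speeds = ["slow", "med", "fast"]
# lev5: one row per carrier, all in zone 4 for a total_wt of 4.
conn.executemany("INSERT INTO lev5 VALUES (?, 4, 4)", [(c,) for c in carriers])
# shipping_prices: one price per carrier per speed for zone 4 / weight 4 (9 rows).
# The price does not matter for counting, so 0.0 is a placeholder.
conn.executemany("INSERT INTO shipping_prices VALUES (?, 4, 4, ?, 0.0)",
                 [(c, s) for c in carriers for s in speeds])

# Join on zone and weight only, as in the failing query.
JOIN_SQL = """
    FROM lev5
    JOIN shipping_prices pric
      ON lev5.zone = pric.zone AND lev5.total_wt = pric.total_wt"""

total = conn.execute("SELECT COUNT(*)" + JOIN_SQL).fetchone()[0]
per_speed = dict(conn.execute(
    "SELECT pric.speed, COUNT(*)" + JOIN_SQL + " GROUP BY pric.speed").fetchall())

# total is 27 (3 lev5 rows x 9 price rows); each speed gets 9 rows.
print(total, per_speed)
```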
This is what I ended up with, and it gives me the expected result:
SELECT lev6.order_id, lev6.carrier, lev6.speed, lev6.min_price AS cheapest_price
FROM (
  SELECT lev5.zone, lev5.cust_id, lev5.total_wt, lev5.zip_shrt, lev5.order_id, lev5.origin_zip, lev5.carrier, pric.speed, pric.price, MIN(pric.price) OVER(PARTITION BY speed) as min_price
  FROM (
    SELECT lev4.cust_id, lev4.total_wt, lev4.zip_shrt, lev4.order_id, lev4.carrier, lev4.zone, lev4.origin_zip
    FROM (
      SELECT lev3.cust_id, lev3.total_wt, lev3.zip_shrt, lev3.order_id, zon.carrier, zon.origin_zip, zon.dest_zip, zon.zone, MIN(zon.zone) OVER(PARTITION BY zon.carrier) as calc_zone
      FROM (
        SELECT lev2.cust_id, SUM(shipping_wt)/2 AS total_wt, STRLEFT(lev2.zipcode, 3) AS zip_shrt, lev2.order_id
        FROM (
          SELECT lev1.zipcode, lev1.cust_id, lev1.order_id, ordet.prod_id
          FROM (
            SELECT cus.zipcode, cus.cust_id, ord.order_id
            FROM orders ord
            JOIN customers cus ON ord.cust_id = cus.cust_id
            WHERE ord.order_id = 6453746
          ) lev1
          JOIN order_details ordet
            ON lev1.order_id = ordet.order_id
          WHERE ordet.order_id = lev1.order_id
        ) lev2
        JOIN products prod
          ON lev2.prod_id = prod.prod_id
        GROUP BY lev2.cust_id, zip_shrt, lev2.order_id
      ) lev3
      JOIN shipping_zones zon
        ON lev3.zip_shrt = zon.dest_zip
      WHERE (origin_zip = 402 OR origin_zip = 950) AND dest_zip = lev3.zip_shrt
    ) lev4
    WHERE lev4.zone = calc_zone
  ) lev5
  JOIN shipping_prices pric
    ON lev5.zone = pric.zone AND lev5.total_wt = pric.total_wt
  WHERE lev5.carrier = pric.carrier
) lev6
WHERE lev6.price = lev6.min_price
ORDER BY lev6.speed DESC;
My problem was that the join between lev5 and shipping_prices had no condition on the carrier. I added:
WHERE lev5.carrier = pric.carrier
This removed the duplicates and correctly returned only 9 shipping rates, 3 per carrier (slow, med, fast). I then added lev6 to return only the cheapest price for each speed.
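A minimal sketch of the fixed logic, again with sqlite3 standing in for Impala (window functions need SQLite 3.25 or later, which recent Python builds bundle). The lev5 table is hypothetical, and the prices are invented except for the per-speed minimums, which match the output in the question. With the carrier in the join key, the join yields 9 rows, and the lev6 filter keeps the cheapest row per speed. For an inner join, the carrier condition can go in the ON clause, as here, or in WHERE, as in the query above; the result is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lev5 (carrier TEXT, zone INT, total_wt INT)")
conn.execute("CREATE TABLE shipping_prices (carrier TEXT, zone INT, total_wt INT, speed TEXT, price REAL)")
conn.executemany("INSERT INTO lev5 VALUES (?, 4, 4)", [("fedex",), ("usps",), ("ups",)])
# Hypothetical prices: the per-speed minimums match the question's output,
# the other carrier/price pairings are invented for illustration.
conn.executemany("INSERT INTO shipping_prices VALUES (?, 4, 4, ?, ?)", [
    ("fedex", "slow", 10.86), ("usps", "slow", 11.15), ("ups", "slow", 11.15),
    ("fedex", "med", 22.24), ("usps", "med", 11.15), ("ups", "med", 23.57),
    ("fedex", "fast", 65.94), ("usps", "fast", 40.55), ("ups", "fast", 69.93),
])

# The carrier is now part of the join key.
JOIN_SQL = """
    FROM lev5
    JOIN shipping_prices pric
      ON lev5.zone = pric.zone
     AND lev5.total_wt = pric.total_wt
     AND lev5.carrier = pric.carrier"""

joined = conn.execute("SELECT COUNT(*)" + JOIN_SQL).fetchone()[0]

# lev6-style filter: keep the rows whose price equals the per-speed minimum.
cheapest = conn.execute("""
    SELECT carrier, speed, min_price AS cheapest_price
    FROM (SELECT lev5.carrier, pric.speed, pric.price,
                 MIN(pric.price) OVER (PARTITION BY pric.speed) AS min_price""" + JOIN_SQL + """
         ) lev6
    WHERE price = min_price
    ORDER BY speed DESC""").fetchall()

print(joined)    # 9
print(cheapest)  # [('fedex', 'slow', 10.86), ('usps', 'med', 11.15), ('usps', 'fast', 40.55)]
```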
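@Thorsten Kettner As for why I divide the sum of the product weights by 2, unfortunately I don't have an answer. If I don't, the query returns double the order's actual weight, and I don't know why.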
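One plausible cause, not confirmed in the thread: the working query sums order_details JOIN products without dividing by 2 and still matches the price table. The combined query additionally joins orders and customers first. If either of those tables holds two rows for this order or customer (for example, from a duplicated import), every order_details row is counted twice and the SUM doubles. The sketch below uses sqlite3 and made-up rows to show the effect, plus a GROUP BY ... HAVING COUNT(*) > 1 check for duplicate keys. The duplicated customer row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INT, cust_id INT);
CREATE TABLE customers (cust_id INT, zipcode TEXT);
CREATE TABLE order_details (order_id INT, prod_id INT);
CREATE TABLE products (prod_id INT, shipping_wt REAL);
INSERT INTO orders VALUES (1, 10);
-- Hypothetical duplicate: the same customer row loaded twice.
INSERT INTO customers VALUES (10, '56001'), (10, '56001');
INSERT INTO order_details VALUES (1, 100), (1, 101);
INSERT INTO products VALUES (100, 1.5), (101, 2.5);
""")

# Weight from order_details + products alone (like the working query's subquery).
direct = conn.execute("""
    SELECT SUM(p.shipping_wt) FROM order_details od
    JOIN products p ON p.prod_id = od.prod_id
    WHERE od.order_id = 1""").fetchone()[0]

# Same sum after also joining orders and customers (like lev1 -> lev3).
via_customers = conn.execute("""
    SELECT SUM(p.shipping_wt) FROM orders o
    JOIN customers c ON o.cust_id = c.cust_id
    JOIN order_details od ON od.order_id = o.order_id
    JOIN products p ON p.prod_id = od.prod_id
    WHERE o.order_id = 1""").fetchone()[0]

# Diagnostic: keys that appear more than once in a table that should be unique.
dupes = conn.execute("""
    SELECT cust_id, COUNT(*) FROM customers
    GROUP BY cust_id HAVING COUNT(*) > 1""").fetchall()

print(direct, via_customers, dupes)  # 4.0 8.0 [(10, 2)]
```

If a check like this turns up duplicates in the real tables, removing them (or joining against a de-duplicated subquery) would likely be safer than dividing by 2, which would silently halve the weight once the data is fixed.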
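This query isn't bulletproof and hasn't been tested against NULLs or every case (for example, two prices tied at the same speed), but it's good enough for what I wanted to do and proves the concept.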
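For the tie case, one common approach is ROW_NUMBER() with a deterministic tie-breaker instead of comparing against MIN(). Impala provides ROW_NUMBER() as an analytic function. The sketch below (sqlite3, SQLite 3.25 or later, hypothetical prices with a deliberate tie) contrasts the two approaches: the MIN() filter keeps both tied carriers, while ROW_NUMBER() keeps exactly one row per speed. Breaking ties by carrier name is an arbitrary choice for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (carrier TEXT, speed TEXT, price REAL)")
# Hypothetical data with a tie: usps and ups both charge 11.15 for 'med'.
conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    ("fedex", "slow", 10.86), ("usps", "slow", 11.15), ("ups", "slow", 11.15),
    ("fedex", "med", 22.24), ("usps", "med", 11.15), ("ups", "med", 11.15),
])

# MIN() + equality filter (the approach in the query above): every tied row survives.
with_ties = conn.execute("""
    SELECT speed, carrier FROM (
        SELECT speed, carrier, price,
               MIN(price) OVER (PARTITION BY speed) AS min_price
        FROM prices) t
    WHERE price = min_price
    ORDER BY speed DESC, carrier""").fetchall()

# ROW_NUMBER() with a deterministic tie-breaker (carrier name) keeps one row per speed.
one_per_speed = conn.execute("""
    SELECT speed, carrier FROM (
        SELECT speed, carrier,
               ROW_NUMBER() OVER (PARTITION BY speed ORDER BY price, carrier) AS rn
        FROM prices) t
    WHERE rn = 1
    ORDER BY speed DESC""").fetchall()

print(with_ties)      # [('slow', 'fedex'), ('med', 'ups'), ('med', 'usps')]
print(one_per_speed)  # [('slow', 'fedex'), ('med', 'ups')]
```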