Instead of fetching multiple tables using PySpark, how can we execute a join query over JDBC?
customer - c_id, c_name, c_address
product - p_id, p_name, price
supplier - s_id, s_name, s_address
orders - o_id, c_id, p_id, quantity, time
SELECT o.o_id,
c.c_id,
c.c_name,
p.p_id,
p.p_name,
p.price * o.quantity AS amount
FROM customer c
JOIN orders o ON o.c_id = c.c_id
JOIN product p ON p.p_id = o.p_id;
I want to execute the above query directly, instead of fetching the three tables as separate DataFrames in PySpark and performing the join on the DataFrames.
You can pass a query in place of the table name, as shown below:
df = spark.read.jdbc(
"url", "(query) as table",
properties={"user":"username", "password":"password"})
In your case it would be:
df = spark.read.jdbc("url", """
(
SELECT o.o_id,
c.c_id,
c.c_name,
p.p_id,
p.p_name,
p.price * o.quantity AS amount
FROM customer c
JOIN orders o ON o.c_id = c.c_id
JOIN product p ON p.p_id = o.p_id
) as table""", properties={"user":"username", "password":"password"})
This answer used this type of query in place of a table name. This question is also relevant to your case.
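A few practical caveats with this approach: the subquery string handed to `spark.read.jdbc` is pushed into a `SELECT * FROM <string>` on the database side, so it must be parenthesized, must carry an alias, and must not end with a semicolon. Note also that `table` is a reserved word in several databases (e.g. MySQL 8.0, PostgreSQL), so a neutral alias is safer. As a minimal sketch, a small helper (the name `as_jdbc_table` and the alias `joined` are illustrative, not part of any Spark API) can do the wrapping:

```python
def as_jdbc_table(query: str, alias: str = "joined") -> str:
    """Wrap a SELECT statement so spark.read.jdbc accepts it as a 'table'.

    The JDBC source embeds this string in 'SELECT * FROM <string>', so the
    query must be parenthesized, aliased, and free of a trailing semicolon.
    """
    return f"({query.rstrip().rstrip(';')}) AS {alias}"


join_sql = """
SELECT o.o_id, c.c_id, c.c_name, p.p_id, p.p_name,
       p.price * o.quantity AS amount
FROM customer c
JOIN orders o ON o.c_id = c.c_id
JOIN product p ON p.p_id = o.p_id;
"""

# With a live SparkSession and JDBC URL this would be used as:
# df = spark.read.jdbc(url, as_jdbc_table(join_sql),
#                      properties={"user": "username", "password": "password"})
print(as_jdbc_table("SELECT 1;"))  # (SELECT 1) AS joined
```

On Spark 2.4 and later, the JDBC source also accepts the statement directly via `spark.read.format("jdbc").option("query", ...)`, which removes the need for manual wrapping (note that `query` and `dbtable` cannot be specified together).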