Data Lake Analytics join
I have 2 tables. I want to categorise the URLs that are in table [Activite_Site]. I tried the query below, but it doesn't work. Does anyone have an idea?
Thanks in advance.
Table [Categorie]
URL CAT
http//www.site.com/business B2B
http//www.site.com/office B2B
http//www.site.com/job B2B
http//www.site.com/home B2C
Table [Actvite_Site]
URL
http//www.site.com/business/page2/test.html
http//www.site.com/business/page3/pagetest/tot.html
http//www.site.com/office/all/tot.html
http//www.site.com/home/holiday/paris.html
http//www.site.com/home/private/moncompte.html
I would like OUTPUT :
URL_SITE CATEGORIE
http//www.site.com/business/page2/test.html B2B
http//www.site.com/business/page3/pagetest/tot.html B2B
http//www.site.com/office/all/tot.html B2B
http//www.site.com/home/holiday/paris.html B2C
http//www.site.com/home/private/moncompte.html B2C
http//www.site.com/test/pte.html Null
My query :
SELECT A.URL AS URL_SITE
C.CAT AS CATEGORIE
FROM Actvite_Site as A
LEFT Categorie as C ON C.URL==A.URL.PadLeft(C.URL.Lenght)
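Regarding the error E_CSC_USER_JOINCOLUMNSEXPECTEDONEACHSIDEOFCONDITION: U-SQL does not currently support derived columns in the join condition.
One way to achieve this might be to find the URLs that match a category first, then union the unmatched URLs back in.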
@category = SELECT *
    FROM (
        VALUES
        ( "http//www.site.com/business", "B2B" ),
        ( "http//www.site.com/office", "B2B" ),
        ( "http//www.site.com/job", "B2B" ),
        ( "http//www.site.com/home", "B2C" )
    ) AS x(url, cat);

@siteActivity = SELECT *
    FROM (
        VALUES
        ( "http//www.site.com/business/page2/test.html" ),
        ( "http//www.site.com/business/page3/pagetest/tot.html" ),
        ( "http//www.site.com/office/all/tot.html" ),
        ( "http//www.site.com/home/holiday/paris.html" ),
        ( "http//www.site.com/home/private/moncompte.html" ),
        ( "http//www.site.com/test/pte.html" )
    ) AS x(url);

// Find matched URLs
@working =
    SELECT sa.url,
           c.cat
    FROM @siteActivity AS sa
         CROSS JOIN
             @category AS c
    WHERE sa.url.Substring(0, c.url.Length) == c.url;

// Combine the matched and unmatched URLs
@output =
    SELECT url,
           cat
    FROM @working
    UNION ALL
    SELECT url,
           (string) null AS cat
    FROM @siteActivity AS sa
         ANTISEMIJOIN
             @working AS w
         ON sa.url == w.url;

OUTPUT @output TO "/output/output.csv"
USING Outputters.Csv(quoting:false);
I would be interested to know whether there is a more efficient way.
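To check the logic of the two steps independently of U-SQL, here is a rough model of them in plain Python. It is only a sketch: it is not U-SQL, the function name `categorise` is made up for illustration, and it uses the same sample rows as the script. The list comprehension plays the role of the CROSS JOIN with the prefix filter, and the set lookup plays the role of the ANTISEMIJOIN.

```python
# Plain-Python model of the U-SQL script's two steps (illustration only, not U-SQL).
# The function name "categorise" is hypothetical.

category = [
    ("http//www.site.com/business", "B2B"),
    ("http//www.site.com/office", "B2B"),
    ("http//www.site.com/job", "B2B"),
    ("http//www.site.com/home", "B2C"),
]

site_activity = [
    "http//www.site.com/business/page2/test.html",
    "http//www.site.com/business/page3/pagetest/tot.html",
    "http//www.site.com/office/all/tot.html",
    "http//www.site.com/home/holiday/paris.html",
    "http//www.site.com/home/private/moncompte.html",
    "http//www.site.com/test/pte.html",
]


def categorise(urls, categories):
    # Step 1 (mirrors @working): pair every URL with every category,
    # then keep the pairs where the URL starts with the category URL.
    working = [
        (url, cat)
        for url in urls
        for prefix, cat in categories
        if url[:len(prefix)] == prefix
    ]
    # Step 2 (mirrors the ANTISEMIJOIN + UNION ALL): append every URL
    # that found no category, with a null category.
    matched = {url for url, _ in working}
    unmatched = [(url, None) for url in urls if url not in matched]
    return working + unmatched


result = categorise(site_activity, category)
for url, cat in result:
    print(url, cat)
```

One difference worth noting: Python slicing quietly tolerates a URL that is shorter than the category prefix, whereas the C# `Substring(0, c.url.Length)` call in the U-SQL script throws if the activity URL is shorter than the category URL. That does not happen with the sample data, but it may be worth guarding against on real data.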