Data Lake Analytics - How to select/insert into a table with fewer columns?
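I have a table that I want to output the results of a query into.
The target table has fewer columns than the source table.
Running a simple INSERT ... SELECT statement results in the following error: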
E_CSC_USER_INSERTTOOMANYCOLUMNSSPECIFIESPARTITION: The source for a single partition INSERT statement contains more items than the target's actual columns.
Description:
The number of source columns may not exceed the number of actual target table columns. Virtual columns should not be provided in the source rowset.
Resolution:
Match the schema of the source to the actual (non-virtual) columns
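I tried using a view that restricts the columns to the correct set as an intermediary, but that still produces the same error.
How do I move data from one table to another in Data Lake Analytics when the two tables do not have the same columns?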
Edit:
Example DDL
Table to select from:
CREATE TABLE dbo.log
(
DateStamp DateTime,
code string,
ipAddresses string,
method string,
column4 string,
column5 string,
column6 string,
url string,
userAgent string,
queryString string,
cookie string,
column11 string,
column12 string,
column13 string,
column14 string,
column15 string,
column16 string,
Query_a1 string,
Query_c1 string,
Query_c2 string,
Query_a2 string,
Query_z string,
Query_l string,
[Cookie_ID] string,
INDEX clx_log
CLUSTERED(Query_a,Query_l ASC)
)
PARTITIONED BY (DateStamp)
DISTRIBUTED BY ROUND ROBIN;
Table to insert into:
CREATE TABLE dbo.[item_views]
(
DateStamp DateTime,
a string,
c1 string,
c2 string,
l string,
Cookie_ID string,
source string,
INDEX clx_item_views
CLUSTERED(c1, l ASC)
)
PARTITIONED BY (DateStamp)
DISTRIBUTED BY HASH (l);
Insert statement that causes the exception:
INSERT dbo.[item_views]
(
DateStamp,
a ,
c1,
c2,
l,
Cookie_ID,
source
)
PARTITION (@partition1)
SELECT
DateStamp,
Query_a1,
Query_c1,
Query_c2,
Query_l,
[Cookie_Id],
"abcd"
FROM dbo.log
WHERE DateStamp.Date == "2017-01-01";
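Your example is a bit confusing because it does not compile, but the error basically means this: when you insert into a partitioned table, you do not include the partition column in the column list or the SELECT list, because you specify its value with the PARTITION clause, for example: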
INSERT dbo.[item_views]
(
a,
c1,
c2,
l,
Cookie_ID,
source
)
PARTITION
(
@partition1
)
SELECT Query_a1,
Query_c1,
Query_c2,
Query_l,
Cookie_ID,
"abcd" AS source
FROM dbo.log
WHERE DateStamp.Date == "2017-01-01";
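The statement above leaves `@partition1` undeclared, so here is a rough sketch of how a complete script might look. It rests on assumptions that are not in the original post: that the partition value is a `DateTime` for 2017-01-01, that the partition may not exist yet and has to be added first, and that the filter compares dates rather than a `DateTime` against a string. Treat it as an illustration, not a verified script.

```
// Assumption: the partition label is the DateTime for 2017-01-01 (not given in the post).
DECLARE @partition1 DateTime = new DateTime(2017, 1, 1);

// Assumption: the target partition may not exist yet, so add it before inserting.
ALTER TABLE dbo.[item_views] ADD IF NOT EXISTS PARTITION (@partition1);

// DateStamp is left out of both the column list and the SELECT list;
// its value for every inserted row comes from the PARTITION clause.
INSERT INTO dbo.[item_views]
(
    a,
    c1,
    c2,
    l,
    Cookie_ID,
    source
)
PARTITION (@partition1)
SELECT Query_a1 AS a,
       Query_c1 AS c1,
       Query_c2 AS c2,
       Query_l AS l,
       Cookie_ID,
       "abcd" AS source
FROM dbo.log
WHERE DateStamp.Date == @partition1.Date; // Assumption: compare dates, not a DateTime to a string literal
```

With this shape the source rowset has six columns, matching the six non-partition columns of `dbo.[item_views]`, which is what the error's resolution text asks for.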