U-SQL Column Type Conversion

I have created a U-SQL query that reads input files from Data Lake Store and transforms the values. The final output is written back to Data Lake Store.

DECLARE @in string = "system/dbotable{*}.tsv";
DECLARE @out string ="system/temp.tsv";

@searchlog =
    EXTRACT 
        Id         int,
        Address    string,
        number     int
    FROM @in
    USING Extractors.Tsv();

@transactions =
    SELECT 
        *,
        ROW_NUMBER() 
            OVER(PARTITION BY Id ORDER BY Id DESC) AS RowNumber
    FROM @searchlog;

@result =
    SELECT 
        Id ,
        Address,
        number 
    FROM @transactions
    WHERE RowNumber == 1;

OUTPUT @result
    TO @out
    USING Outputters.Tsv();

The job fails with the following error:

Execution failed with error '1_SV1_Extract Error : '{"diagnosticCode":195887132,"severity":"Error","component":"RUNTIME","source":"User","errorId":"E_RUNTIME_USER_EXTRACT_COLUMN_CONVERSION_INVALID_ERROR","message":"Invalid character when attempting to convert column data.","description":"HEX: \"2243616E696E6522\" Invalid character when converting input record.\nPosition: line 1, column index: 1, column name: \"Id\".","resolution":"Check the input for errors or use \"silent\" switch to ignore over(under)-sized rows in the input.\nConsider that ignoring \"invalid\" rows may influence job results and that types have to be nullable for conversion errors to be ignored.","helpLink":""

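It looks like the Id column does not always contain an integer. The HEX value in the error, 2243616E696E6522, decodes to the text "Canine" (including the double quotes), and the extractor cannot convert that value to int.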

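I would first extract the Id column as a string, and then, in a second step, try to convert it to an int with a user-defined function, as shown here: https://msdn.microsoft.com/en-us/library/azure/mt621309.aspx (that example is based on DateTime).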
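A minimal sketch of that two-step approach, assuming the @in variable declared in the question. Instead of a separate code-behind UDF, it uses an inline C# expression: a regular-expression guard in front of int.Parse. This is a substitute for the UDF in the linked example, not that example itself, and the rowset names @searchlogRaw and @converted are illustrative.

```usql
// Step 1: read Id as text so the extractor never has to convert it.
@searchlogRaw =
    EXTRACT
        Id         string,
        Address    string,
        number     int
    FROM @in
    USING Extractors.Tsv();

// Step 2: convert Id only when it looks like an integer of at most 9 digits,
// so int.Parse can neither fail on text such as "Canine" nor overflow.
// The ?: operator evaluates int.Parse only when the guard is true.
@converted =
    SELECT
        (Id != null && System.Text.RegularExpressions.Regex.IsMatch(Id, "^-?[0-9]{1,9}$"))
            ? (int?) int.Parse(Id)
            : (int?) null AS Id,
        Address,
        number
    FROM @searchlogRaw;

// Keep only rows whose Id converted successfully.
@searchlog =
    SELECT *
    FROM @converted
    WHERE Id != null;
```

Because the last rowset is still called @searchlog, the question's @transactions, @result and OUTPUT statements can follow unchanged (Id is now int? rather than int). The 9-digit limit is a deliberate simplification: 10-digit values that would still fit in an int are treated as invalid, so widen the check if your IDs can be that large.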

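Another option is to pass silent: true to the extractor, so that values that fail conversion are ignored automatically. As the resolution text in the error points out, conversion errors are only ignored when the column type is nullable (for example int? instead of int), and ignoring invalid rows may influence the job results.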
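A sketch of the silent option, again assuming the question's @in. Following the error's resolution text, Id and number are declared as nullable int?, on the assumption that with silent: true the extractor then passes unconvertible values through as null instead of failing. Declaring number as nullable too, and the rowset name @validRows, are illustrative choices, not requirements from the original answer.

```usql
// Nullable types are needed for silent: true to ignore conversion errors.
// Assumption: values such as "Canine" then arrive as null instead of failing the job.
@searchlog =
    EXTRACT
        Id         int?,
        Address    string,
        number     int?
    FROM @in
    USING Extractors.Tsv(silent: true);

// Drop rows without a usable Id before numbering them with ROW_NUMBER().
@validRows =
    SELECT *
    FROM @searchlog
    WHERE Id != null;
```

With this variant, point the question's @transactions at @validRows instead of @searchlog. The explicit filter keeps null Ids out of the PARTITION BY, so the rows dropped because of bad data are easy to see and, if needed, count separately.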