Get empty output
I am running a simple Pig script. My input is a CSV file, loaded with:
CRE_28004 = LOAD '$input' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:chararray,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,PSE_002:chararray,CTR_001:chararray,STR_006:chararray,T32_001:chararray,T32_004:chararray,T16_001:chararray,DAT_001_X:chararray,DAT_004_X:chararray,EUR_001_VLR:chararray,EUR_001_DCM:chararray,EUR_001_CD_DVS:chararray,EUR_005_VLR:chararray,EUR_005_DCM:chararray,EUR_005_CD_DVS:chararray,EUR_006_VLR:chararray,EUR_006_DCM:chararray,EUR_006_CD_DVS:chararray,EUR_007_VLR:chararray,EUR_007_DCM:chararray,EUR_007_CD_DVS:chararray,EUR_008_VLR:chararray,EUR_008_DCM:chararray,EUR_008_CD_DVS:chararray,T02_001:chararray,T02_002:chararray,T02_003:chararray,T02_004:chararray,T02_005:chararray,T02_006:chararray,T02_007:chararray,T02_008:chararray,T02_009:chararray,T03_001:chararray,T03_002:chararray,RUB_203:chararray,RUB_205:chararray,RUB_206:chararray,RUB_208:chararray,RUB_209:chararray,T04_001:chararray);
-- Apply the required filter
CRE_28004_FILTER = FILTER CRE_28004 BY (T02_008 == '6');
-- Save the result
STORE CRE_28004_FILTER INTO '$OUTPUT_FILE_CRE' USING org.apache.pig.piggybank.storage.CSVExcelStorage(';','NO_MULTILINE', 'UNIX','WRITE_OUTPUT_HEADER');
It runs without any errors, but the output is empty:
Input(s):
Successfully read 444 records (583792 bytes) from: "/hdfs/data/adhoc/PR/02/RDO0/BB0/MGM28001-2019-08-19.csv"
Output(s):
Successfully stored 0 records in: "/hdfs/data/adhoc/PR/02/RDO0/BB0/MGM28004-OUTPUT.csv"
Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1549794175705_3481592
As you can see, my input contains non-empty data.
I then also checked the filtered data:
CRE_28004_FILTER = FILTER CRE_28004 BY (T02_008 == '6');
To check whether the filtered data really exists, I used the DESCRIBE operator:
CRE_28001_FILTER: {MGM_COMPTEUR: chararray,CIA_CD_CRV_CIA: chararray,CIA_DA_EM_CRV: chararray,CIA_CD_CTRL_BLCE: chararray,CIA_IDC_EXTR_RDJ: int,CIA_VLR_IDT_CRV_LOQ: chararray,CIA_VLR_REF_CRV: chararray,CIA_NO_SEQ_CRV: chararray,CIA_VLR_LG_ZON_RTG: chararray,CIA_HEU_CIA: chararray,CIA_TM_STP_CRE: chararray,CIA_CD_SI: chararray,CIA_VLR_1: chararray,CIA_DA_ARR_FIC: chararray,CIA_TY_ENR: chararray,CIA_CD_BTE: chararray,CIA_CD_PER: chararray,CIA_CD_EFS: chararray,CIA_CD_ETA_VAL_CRV: chararray,CIA_CD_EVE_CPR: chararray,CIA_CD_APLI_TDU: chararray,CIA_CD_STE_RTG: chararray,CIA_DA_TT_RTG: chararray,CIA_NO_ENR_RTG: chararray,CIA_DA_VAL_EVE: chararray,D08_007: chararray,D08_006: chararray,D08_005: chararray,D08_004: chararray,D08_003: chararray,D08_002: chararray,D08_001: chararray,STR_005: chararray,D08_008: chararray,D11_001: chararray,D25_001: chararray,D25_002: chararray,D25_003: chararray,STR_004: chararray,STR_003: chararray,STR_002: chararray,STR_001: chararray,PSE_001: chararray,RUB_201: chararray,RUB_202: chararray,RUB_203: chararray,RUB_204: chararray,RUB_205: chararray,RUB_206: chararray,RUB_208: chararray,RUB_209: chararray,RUB_210: chararray,RUB_211: chararray,RUB_212: chararray,RUB_217: chararray,RUB_218: chararray,RUB_219: chararray,RUB_220: chararray,RUB_224: chararray,RUB_225: chararray,RUB_226: chararray,RUB_227: chararray,RUB_228: chararray,RUB_230: chararray,RUB_231: chararray,RUB_232: chararray,RUB_233: chararray,RUB_234: chararray,RUB_235: chararray,RUB_236: chararray,RUB_301: chararray,RUB_302: chararray,RUB_303: chararray,RUB_304: chararray,RUB_305: chararray,RUB_306: chararray,RUB_307: chararray,RUB_308: chararray,RUB_309: chararray,RUB_310: chararray,RUB_311: chararray,RUB_312: chararray,RUB_313: chararray,RUB_314: chararray,RUB_315: chararray,RUB_501: chararray,RUB_502: chararray,RUB_503: chararray,RUB_511: chararray,RUB_512: chararray,RUB_513: chararray,RUB_514: chararray,RUB_515: chararray,RUB_516: chararray,RUB_520: chararray,RUB_521: 
chararray,RUB_522: chararray,RUB_999: chararray,DAT_001_X: chararray,DAT_002_X: chararray,DAT_003_X: chararray,HEU_001: chararray,NB_001_VLR: chararray,NB_001_DCM: chararray,NB_002_VLR: chararray,NB_002_DCM: chararray,NB_003_VLR: chararray,NB_003_DCM: chararray,T06_001: chararray,T32_001: chararray,T32_002: chararray,T32_003: chararray,T50_001: chararray,T50_002: chararray,T50_003: chararray,T50_004: chararray,EUR_001_VLREUR_001_DCM: chararray,EUR_001_CD_DVS: chararray,EUR_002_VLR: chararray,EUR_002_DCM: chararray,EUR_002_CD_DVS: chararray,EUR_003_VLR: chararray,EUR_003_DCM: chararray,EUR_003_CD_DVS: chararray,EUR_004_VLR: chararray,EUR_004_DCM: chararray,EUR_004_CD_DVS: chararray,RIB_001: chararray,RUB_229: chararray,T08_001: chararray}
So the filtered data is not empty!
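Note that DESCRIBE prints only the schema of a relation and does not evaluate any rows, so it cannot show whether the filter kept anything. A minimal sketch for looking at actual filtered rows, assuming the relation names used above (`filtered_sample` is a hypothetical alias):

```pig
-- DESCRIBE shows the schema only; DUMP runs the job and prints rows.
filtered_sample = LIMIT CRE_28004_FILTER 10;  -- hypothetical alias
DUMP filtered_sample;
```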
@Vk_
EDIT: as requested in the comment below, the value 6 really does exist in the data:
cut -d';' -f63 MGM28001.csv | sort | uniq
1
2
3
4
5
6
7
D
RUB_202
And there are over 100 lines that have '6' as the value in the RUB_202 column.
Really strange.
"And there are over 100 lines that has '6' as a value in the RUB_202 column."
Use RUB_202 in the filter:
CRE_28004_FILTER = FILTER CRE_28004 BY (RUB_202 == '6');
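Look at the loaded data before filtering. I always save the file as a text file and then load it with a comma as the delimiter.
Example:
CRE_28004 = LOAD 'input.txt' USING PigStorage(',') AS (MGM_COMPTEUR: chararray);
Res = LIMIT CRE_28004 10;
DUMP Res;
Also try MATCHES when filtering.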
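As a rough sketch of what the MATCHES suggestion could look like, assuming the field to test is T02_008 from the question and that stray spaces around the value are the concern (`CRE_28004_MATCH` and `Match_Sample` are hypothetical aliases):

```pig
-- MATCHES takes a Java regular expression that must match the whole field.
-- ' *6 *' accepts a 6 with optional spaces on either side.
CRE_28004_MATCH = FILTER CRE_28004 BY (T02_008 MATCHES ' *6 *');  -- hypothetical alias
Match_Sample = LIMIT CRE_28004_MATCH 10;                          -- hypothetical alias
DUMP Match_Sample;
```

Unlike `== '6'`, this pattern still matches when the field carries padding, which makes it a quick way to tell an exact-match problem from a wrong-column problem.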
The problem was whitespace in the raw data.
I had to generate a new data relation from the raw data.
After loading the data:
CRE_28004 = LOAD '$input' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:chararray,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,PSE_002:chararray,CTR_001:chararray,STR_006:chararray,T32_001:chararray,T32_004:chararray,T16_001:chararray,DAT_001_X:chararray,DAT_004_X:chararray,EUR_001_VLR:chararray,EUR_001_DCM:chararray,EUR_001_CD_DVS:chararray,EUR_005_VLR:chararray,EUR_005_DCM:chararray,EUR_005_CD_DVS:chararray,EUR_006_VLR:chararray,EUR_006_DCM:chararray,EUR_006_CD_DVS:chararray,EUR_007_VLR:chararray,EUR_007_DCM:chararray,EUR_007_CD_DVS:chararray,EUR_008_VLR:chararray,EUR_008_DCM:chararray,EUR_008_CD_DVS:chararray,T02_001:chararray,T02_002:chararray,T02_003:chararray,T02_004:chararray,T02_005:chararray,T02_006:chararray,T02_007:chararray,T02_008:chararray,T02_009:chararray,T03_001:chararray,T03_002:chararray,RUB_203:chararray,RUB_205:chararray,RUB_206:chararray,RUB_208:chararray,RUB_209:chararray,T04_001:chararray);
I had to generate the data relation using casts:
Data = FOREACH CRE_28004 GENERATE
(chararray) $0 as MGM_COMPTEUR,
(chararray) $1 as CIA_CD_CRV_CIA,
(chararray) $2 as CIA_DA_EM_CRV,
(chararray) $3 as CIA_CD_CTRL_BLCE,
(chararray) $4 as CIA_IDC_EXTR_RDJ,
(chararray) $5 as CIA_VLR_IDT_CRV_LOQ,
....
After that, filtering works without any problem and returns a non-empty result.
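If whitespace in the raw data is the cause, another option that may be worth trying is to make the spaces visible and then trim them in the filter itself. This is only a sketch, not the approach used above: it assumes the field of interest is T02_008, and the aliases `vals`, `uniq_vals` and `CRE_28004_TRIMMED` are hypothetical.

```pig
-- Wrap each distinct value in brackets so leading or trailing spaces become visible.
vals      = FOREACH CRE_28004 GENERATE CONCAT('[', CONCAT(T02_008, ']')) AS bracketed;
uniq_vals = DISTINCT vals;
DUMP uniq_vals;

-- TRIM (a Pig built-in) strips leading and trailing whitespace before the comparison.
CRE_28004_TRIMMED = FILTER CRE_28004 BY (TRIM(T02_008) == '6');
```

The bracketed dump plays the same role as the `cut | sort | uniq` check on the raw file, but it runs on the data as Pig parsed it, so a padded value is easy to tell apart from a clean one.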