Apache Sqoop Where clause not working while using SQOOP IMPORT
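Can anyone explain the output of this command?
The departments table originally has 6 rows (dept_id 2 to 7). I then added 2 new records (department_id 8 and 9) to the MySQL table retail_db.departments. I want to select only the newly added records with the --where argument and append them (--append) to the existing HDFS directory for departments.
However, when I run the following command, it creates a new file, part-m-00006 (the original 6 records had been split across part-m-00000 to part-m-00005), and writes every row from department_id 2 to 9 into it, including the 2 new records. As the output below shows, the directory now contains duplicate records.
I don't understand why the where clause is being ignored: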
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username retail_dba \
--password cloudera \
--query "Select * from departments where \$CONDITIONS" \
--where "department_id > 7" \
--append \
-m 1 \
--target-dir /user/cloudera/sqoop_import/departments
Output:
-----------------------------------------
[cloudera@quickstart ~]$ hdfs dfs -cat /user/cloudera/sqoop_import/departments/part*
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
2,Fitness
3,Footwear
4,Apparel
5,Golf
6,Outdoors
7,Fan Shop
8,Sports
9,Jewellery
-----------------------------------------
Logs generated:
-----------------------------------------
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/10/23 12:23:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.0
16/10/23 12:23:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/10/23 12:23:31 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
16/10/23 12:23:31 INFO tool.CodeGenTool: Beginning code generation
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
16/10/23 12:23:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/23 12:23:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/b704a6e6d921fb544ba25c6343b18a36/QueryResult.jar
16/10/23 12:23:33 INFO mapreduce.ImportJobBase: Beginning query import.
16/10/23 12:23:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/10/23 12:23:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/10/23 12:23:36 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
16/10/23 12:23:38 INFO db.DBInputFormat: Using read commited transaction isolation
16/10/23 12:23:38 INFO mapreduce.JobSubmitter: number of splits:1
16/10/23 12:23:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477192024680_0012
16/10/23 12:23:40 INFO impl.YarnClientImpl: Submitted application application_1477192024680_0012
16/10/23 12:23:40 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1477192024680_0012/
16/10/23 12:23:40 INFO mapreduce.Job: Running job: job_1477192024680_0012
16/10/23 12:23:56 INFO mapreduce.Job: Job job_1477192024680_0012 running in uber mode : false
16/10/23 12:23:56 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 12:24:25 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 12:24:26 INFO mapreduce.Job: Job job_1477192024680_0012 completed successfully
16/10/23 12:24:27 INFO mapreduce.Job: Counters: 30
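You are using both --query and --where. That is why Sqoop does not honor --where.
--query supersedes --where: with a free-form query, the filter has to be written into the query's own WHERE clause, and Sqoop does not add the --where condition to it.
That is why the statement in your log contains only your query, with $CONDITIONS replaced by (1 = 0) (a probe Sqoop runs to read the column metadata), and no department_id > 7 filter anywhere: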
INFO manager.SqlManager: Executing SQL statement: Select * from departments where (1 = 0)
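To make the (1 = 0) in that log line less mysterious, here is a small bash illustration that imitates the placeholder substitution with plain string replacement. It is only a sketch, assuming a simple textual replacement of $CONDITIONS; it is not Sqoop's own code. Like the statement in the log, it never touches the --where value.

```bash
#!/usr/bin/env bash
# Illustration only: imitates, with bash string replacement, how the
# $CONDITIONS placeholder of a free-form --query shows up as (1 = 0) in the log.
# This is not Sqoop's implementation.

# Single quotes keep $CONDITIONS literal, which is what Sqoop must receive.
query='Select * from departments where $CONDITIONS'
placeholder='$CONDITIONS'

# For the metadata probe the placeholder becomes (1 = 0), which is always false,
# so the statement yields column metadata but no rows.
metadata_sql="${query//$placeholder/(1 = 0)}"
echo "$metadata_sql"
# prints: Select * from departments where (1 = 0)
```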
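Use one or the other, not both:
--query "select * from departments where department_id > 7 AND \$CONDITIONS"
--table departments --where "department_id > 7"
(Inside double quotes, write \$CONDITIONS so the shell passes the placeholder to Sqoop instead of expanding it to an empty string.)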
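Put together, the two alternatives might look like the sketch below. It assumes the connection details and target directory from the question; the function names are hypothetical wrappers so either variant can be called by name, and -P (prompt for the password) replaces --password, as the warning in the log suggests.

```bash
#!/usr/bin/env bash
# Sketch of the two corrected imports; connection details come from the question.
# The function names are hypothetical wrappers, not part of Sqoop.

CONNECT="jdbc:mysql://quickstart.cloudera:3306/retail_db"
TARGET_DIR="/user/cloudera/sqoop_import/departments"

# Option 1: table import. --where filters the rows read from --table.
import_new_departments_where() {
  sqoop import \
    --connect "$CONNECT" \
    --username retail_dba \
    -P \
    --table departments \
    --where "department_id > 7" \
    --append \
    -m 1 \
    --target-dir "$TARGET_DIR"
}

# Option 2: free-form query import. The filter lives inside --query, and
# \$CONDITIONS is escaped so the shell passes the placeholder through to Sqoop.
import_new_departments_query() {
  sqoop import \
    --connect "$CONNECT" \
    --username retail_dba \
    -P \
    --query "SELECT * FROM departments WHERE department_id > 7 AND \$CONDITIONS" \
    --append \
    -m 1 \
    --target-dir "$TARGET_DIR"
}
```

Running either function once should append only the department_id 8 and 9 rows as a new part file, rather than re-importing rows 2 to 7.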