Incremental sqoop from oracle to hdfs with condition

Question

I'm running an incremental sqoop import from oracle to hdfs with a condition like

(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD"T"HH24:MI"Z"')
 AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD"T"HH24:MI"Z"'))

but it is not using the index. How can I force the index, so that sqoop runs faster by considering only the filtered records?

What is the best option for an incremental sqoop? The table in oracle is terabytes in size: it has billions of rows, and after the where condition only a few million remain.

Answer 1

You can use --where, or --query with a WHERE condition in the SELECT, to filter the import results.

I'm not sure what your full sqoop command is; try this approach:

sqoop import \
    --connect jdbc:oracle:thin:@//db.example.com/dbname \
    --username dbusername \
    --password dbpassword \
    --table tablename \
    --columns "column,names,to,select,in,comma,separated" \
    --where "(LST_UPD_TMST >TO_TIMESTAMP('2016-05-31T18:55Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"') AND LST_UPD_TMST <= TO_TIMESTAMP('2016-09-13T08:51Z', 'YYYY-MM-DD\"T\"HH24:MI\"Z\"'))" \
    --target-dir {hdfs/location/to/save/data/from/oracle} \
    --incremental lastmodified \
    --check-column LST_UPD_TMST \
    --last-value {from Date/Timestamp to Sqoop in incremental}

See the sqoop documentation on incremental load for more details.
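One way to apply the --query suggestion to the index problem in the question is an Oracle optimizer hint inside a free-form query. The sketch below only assembles the query and the sqoop arguments and prints them, unless RUN=1 is set. Assumptions: the index name IDX_LST_UPD_TMST and the target directory are hypothetical, and Oracle can still ignore the hint if the index does not exist or the hint is malformed. With --query, Sqoop requires the literal $CONDITIONS token and --target-dir, plus --split-by whenever more than one mapper is used.

```shell
#!/bin/sh
# Dry-run sketch: build a free-form Sqoop import whose query carries an Oracle
# INDEX hint for LST_UPD_TMST. Prints the query and the command line instead
# of running them, unless RUN=1 is set in the environment.
set -eu

FROM_TS="2016-05-31T18:55Z"       # lower bound (exclusive), from the question
TO_TS="2016-09-13T08:51Z"         # upper bound (inclusive), from the question
INDEX_NAME="IDX_LST_UPD_TMST"     # hypothetical index name; check ALL_INDEXES
FMT='YYYY-MM-DD"T"HH24:MI"Z"'     # Oracle format mask used in the question

QUERY="SELECT /*+ INDEX(t $INDEX_NAME) */ t.* FROM tablename t"
QUERY="$QUERY WHERE t.LST_UPD_TMST > TO_TIMESTAMP('$FROM_TS', '$FMT')"
QUERY="$QUERY AND t.LST_UPD_TMST <= TO_TIMESTAMP('$TO_TS', '$FMT')"
# Single quotes keep $CONDITIONS literal; Sqoop substitutes it itself.
QUERY="$QUERY"' AND $CONDITIONS'

# One mapper keeps the sketch simple; raise -m only together with --split-by.
# /user/example/tablename_incr is a hypothetical HDFS path.
set -- import \
  --connect jdbc:oracle:thin:@//db.example.com/dbname \
  --username dbusername -P \
  --query "$QUERY" \
  -m 1 \
  --target-dir /user/example/tablename_incr

if [ "${RUN:-0}" = 1 ]; then
  sqoop "$@"
else
  printf '%s\n' "$QUERY"
  printf 'sqoop'; printf ' %s' "$@"; printf '\n'   # unquoted preview only
fi
```

Before running it for real, comparing the EXPLAIN PLAN output for the query with and without the hint is a cheap way to confirm the index is actually chosen.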

Update

For incremental imports, a Sqoop saved job is recommended, because it maintains --last-value automatically.

sqoop job --create {incremental job name} \
    -- import \
    --connect jdbc:oracle:thin:@//db.example.com/dbname \
    --username dbusername \
    --password dbpassword \
    --table tablename \
    --columns "column,names,to,select,in,comma,separated" \
    --incremental lastmodified \
    --check-column LST_UPD_TMST \
    --last-value "1970-01-01 00:00:00.0"

Here --last-value is an early timestamp (the check column is a TIMESTAMP), so the first run imports from the start; on each later invocation of the sqoop job, sqoop passes the latest value automatically.
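Once the saved job exists, later runs only need its name. A minimal sketch, assuming a hypothetical job name oracle_tablename_incr and a scheduler such as cron calling run_incremental; SQOOP_BIN can be set to echo for a dry run. With the default metastore settings Sqoop does not store the password in the saved job, so --exec may prompt for it; supplying --password-file when creating the job is one way around that.

```shell
#!/bin/sh
# Sketch: run the saved incremental job. Each --exec imports rows modified
# after the stored last value, and Sqoop then records the new last value.
set -eu

JOB_NAME="oracle_tablename_incr"   # hypothetical; use the name given to --create
SQOOP_BIN="sqoop"                  # set to "echo" for a dry run

run_incremental() {
  # Log to stderr so stdout carries only sqoop's own output.
  echo "$(date -u +%Y-%m-%dT%H:%MZ) running $JOB_NAME" >&2
  "$SQOOP_BIN" job --exec "$JOB_NAME"
}
```

To inspect what is stored, sqoop job --list shows the saved jobs and sqoop job --show {incremental job name} prints a job's saved parameters.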
