elasticsearch showing only 1 docs.count on data migration using logstash
I am trying to move data from S3 (data from .csv files) to an Elasticsearch cluster using Logstash with a custom template.
However, when I check in Kibana with the query below, it shows only docs.count=1 and the remaining records show up as docs.deleted:
GET /_cat/indices?v
My first question is:
- Why does only one record [the last one] get transferred while the rest show up as deleted?
Now, when I query this index in Kibana with the following:
GET /my_file_index/_search
{
  "query": {
    "match_all": {}
  }
}
I get only a single record with all the comma-separated data stuffed into the "message" field, so my second question is:
- I have already specified all the column mappings in the template file fed to Logstash, so how can I get the data under proper column names, just as in the CSV?
I also tried supplying the columns in the Logstash csv filter, but with no luck:
columns => ["col1", "col2",...]
Any help would be appreciated.
EDIT-1: Below is my logstash.conf file:
input {
  s3 {
    access_key_id => "xxx"
    secret_access_key => "xxxx"
    region => "eu-xxx-1"
    bucket => "xxxx"
    prefix => "abc/stocks_03-jul-2018.csv"
  }
}

filter {
  csv {
    separator => ","
    columns => ["AAA","BBB","CCC"]
  }
}

output {
  amazon_es {
    index => "my_r_index"
    document_type => "my_r_index"
    hosts => "vpc-totemdev-xxxx.eu-xxx-1.es.amazonaws.com"
    region => "eu-xxxx-1"
    aws_access_key_id => 'xxxxx'
    aws_secret_access_key => 'xxxxxx+xxxxx'
    document_id => "%{id}"
    template => "templates/template_2.json"
    template_name => "my_r_index"
  }
}
Note:
Logstash version: 6.3.1
Elasticsearch version: 6.2
EDIT-2: Adding the template_2.json file along with a sample CSV header:
1. Mapping file:
{
  "template" : "my_r_index",
  "settings" : {
    "index" : {
      "number_of_shards" : 50,
      "number_of_replicas" : 1
    },
    "index.codec" : "best_compression",
    "index.refresh_interval" : "60s"
  },
  "mappings" : {
    "_default_" : {
      "_all" : { "enabled" : false },
      "properties" : {
        "SECURITY" : { "type" : "keyword" },
        "SERVICEID" : { "type" : "integer" },
        "MEMBERID" : { "type" : "integer" },
        "VALUEDATE" : { "type" : "date" },
        "COUNTRY" : { "type" : "keyword" },
        "CURRENCY" : { "type" : "keyword" },
        "ABC" : { "type" : "integer" },
        "PQR" : { "type" : "keyword" },
        "KKK" : { "type" : "keyword" },
        "EXPIRYDATE" : { "type" : "text", "index" : "false" },
        "SOMEID" : { "type" : "double", "index" : "false" },
        "DDD" : { "type" : "double", "index" : "false" },
        "EEE" : { "type" : "double", "index" : "false" },
        "FFF" : { "type" : "double", "index" : "false" },
        "GGG" : { "type" : "text", "index" : "false" },
        "LLL" : { "type" : "double", "index" : "false" },
        "MMM" : { "type" : "double", "index" : "false" },
        "NNN" : { "type" : "double", "index" : "false" },
        "OOO" : { "type" : "double", "index" : "false" },
        "PPP" : { "type" : "text", "index" : "false" },
        "QQQ" : { "type" : "integer", "index" : "false" },
        "RRR" : { "type" : "double", "index" : "false" },
        "SSS" : { "type" : "double", "index" : "false" },
        "TTT" : { "type" : "double", "index" : "false" },
        "UUU" : { "type" : "double", "index" : "false" },
        "VVV" : { "type" : "text", "index" : "false" },
        "WWW" : { "type" : "double", "index" : "false" },
        "XXX" : { "type" : "double", "index" : "false" },
        "YYY" : { "type" : "double", "index" : "false" },
        "ZZZ" : { "type" : "double", "index" : "false" },
        "KNOCKORWARD" : { "type" : "text", "index" : "false" },
        "RANGEATSSPUT" : { "type" : "double", "index" : "false" },
        "STDATMESSPUT" : { "type" : "double", "index" : "false" },
        "CONSENSUPUT" : { "type" : "double", "index" : "false" },
        "CLIENTLESSPUT" : { "type" : "double", "index" : "false" },
        "KNOCKOUESSPUT" : { "type" : "text", "index" : "false" },
        "RANGACTOR" : { "type" : "double", "index" : "false" },
        "STDDACTOR" : { "type" : "double", "index" : "false" },
        "CONSCTOR" : { "type" : "double", "index" : "false" },
        "CLIENTOR" : { "type" : "double", "index" : "false" },
        "KNOCKOACTOR" : { "type" : "text", "index" : "false" },
        "RANGEPRICE" : { "type" : "double", "index" : "false" },
        "STANDARCE" : { "type" : "double", "index" : "false" },
        "NUMBERICE" : { "type" : "integer", "index" : "false" },
        "CONSECE" : { "type" : "double", "index" : "false" },
        "CLIECE" : { "type" : "double", "index" : "false" },
        "KNOCICE" : { "type" : "text", "index" : "false" },
        "SKEWICE" : { "type" : "text", "index" : "false" },
        "WILDISED" : { "type" : "text", "index" : "false" },
        "WILDATUS" : { "type" : "text", "index" : "false" },
        "RRF" : { "type" : "double", "index" : "false" },
        "SRF" : { "type" : "double", "index" : "false" },
        "CNRF" : { "type" : "double", "index" : "false" },
        "CTRF" : { "type" : "double", "index" : "false" },
        "RANADDLE" : { "type" : "double", "index" : "false" },
        "STANDANSTRADDLE" : { "type" : "double", "index" : "false" },
        "CONSLE" : { "type" : "double", "index" : "false" },
        "CLIDLE" : { "type" : "double", "index" : "false" },
        "KNOCKOADDLE" : { "type" : "text", "index" : "false" },
        "RANGEFM" : { "type" : "double", "index" : "false" },
        "SMIUM" : { "type" : "double", "index" : "false" },
        "CONIUM" : { "type" : "double", "index" : "false" },
        "CLIEEMIUM" : { "type" : "double", "index" : "false" },
        "KNOREMIUM" : { "type" : "text", "index" : "false" },
        "COT" : { "type" : "double", "index" : "false" },
        "CLIEEDSPOT" : { "type" : "double", "index" : "false" },
        "IME" : { "type" : "keyword" },
        "KKE" : { "type" : "keyword" }
      }
    }
  }
}
2. My Excel content:
Header: the real header is much longer since there are many columns; assume the remaining column names look similar to the ones below.
SECURITY | SERVICEID | MEMBERID | VALUEDATE...
First rows: likewise, some of the columns below have blank values; the real template file above (the mapping file) covers all the column values.
KKK-LMN 2 1815 6/25/2018
PPL-ORL 2 1815 6/25/2018
SLB-ORD 2 1815 6/25/2018
3. Kibana query output
Query:
GET /my_r_index/_search
{
  "query": {
    "match_all": {}
  }
}
Output:
{
  "_index": "my_r_index",
  "_type": "my_r_index",
  "_id": "IjjIZWUBduulDsi0vYot",
  "_score": 1,
  "_source": {
    "@version": "1",
    "message": "XXX-XXX-XXX-USD,2,3190,2018-07-03,UNITED STATES,USD,300,60,Put,2042-12-19,,,,.009108041,q,,,,.269171754,q,,,,,.024127966,q,,,,68.414017367,q,,,,.298398645,q,,,,.502677959,q,,,,,0.040880692400344164,q,,,,,,,159.361792143,,,,.631296636,q,,,,.154877384,q,,42.93,N,Y,\n",
    "@timestamp": "2018-08-23T07:56:06.515Z"
  }
},
...and other similar records.
EDIT-3:
Sample output after using autodetect_column_names => true:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "hAF1aWUBS_wbCH7ZG4tW",
        "_score": 1,
        "_source": {
          "2": "2",
          "1815": "1815",
          "message": """
          PPL-ORD-XNYS-USD,2,1815,6/25/2018,UNITED STATES
          """,
          "SLB-ORD-XNYS-USD": "PPL-ORD-XNYS-USD",
          "6/25/2018": "6/25/2018",
          "@timestamp": "2018-08-24T01:03:26.436Z",
          "UNITED STATES": "UNITED STATES",
          "@version": "1"
        }
      },
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "kP11aWUBctDorPcGHICS",
        "_score": 1,
        "_source": {
          "2": "2",
          "1815": "1815",
          "message": """
          SLBUSD,2,1815,4/22/2018,UNITEDSTATES
          """,
          "SLB-ORD-XNYS-USD": "SLBUSD",
          "6/25/2018": "4/22/2018",
          "@timestamp": "2018-08-24T01:03:26.436Z",
          "UNITED STATES": "UNITEDSTATES",
          "@version": "1"
        }
      },
      {
        "_index": "indr",
        "_type": "logs",
        "_id": "j_11aWUBctDorPcGHICS",
        "_score": 1,
        "_source": {
          "2": "SERVICE",
          "1815": "CLIENT",
          "message": """
          UNDERLYING,SERVICE,CLIENT,VALUATIONDATE,COUNTRY
          """,
          "SLB-ORD-XNYS-USD": "UNDERLYING",
          "6/25/2018": "VALUATIONDATE",
          "@timestamp": "2018-08-24T01:03:26.411Z",
          "UNITED STATES": "COUNTRY",
          "@version": "1"
        }
      }
    ]
  }
}
I'm pretty sure your single document has the ID %{id}. The first issue comes from the fact that your CSV file has no column named id, yet that is exactly what you reference in document_id => "%{id}", so every row is indexed with the literal ID %{id} and each indexing deletes the previous document. You end up with a single document that has been indexed as many times as there are rows in your CSV.
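The simplest fix is to remove the document_id setting altogether and let Elasticsearch auto-generate unique IDs. If you need stable IDs instead (e.g. to make re-ingestion idempotent), a minimal sketch using the fingerprint filter could look like this (the [@metadata][doc_id] field name is my own choice for this sketch):

filter {
  fingerprint {
    # hash the raw CSV line into a stable, per-row document ID
    source => "message"
    method => "SHA1"
    target => "[@metadata][doc_id]"
  }
}
output {
  amazon_es {
    # ... same settings as before ...
    document_id => "%{[@metadata][doc_id]}"
  }
}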
As for the second issue, you need to fix your filter section as follows:
filter {
  csv {
    separator => ","
    # take the column names from the first line of the CSV
    autodetect_column_names => true
  }
  date {
    # parse dates like 6/25/2018 (single-digit month) into the event timestamp
    match => [ "VALUATIONDATE", "M/dd/yyyy" ]
  }
}
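Note that autodetect_column_names takes the column names from the first event the csv filter happens to see, so it only behaves deterministically with a single pipeline worker; with several workers a data row can be picked up as the header, which is exactly what the scrambled field names in your EDIT-3 output show. For example:

# run with one pipeline worker so the header line is guaranteed to be
# the first event the csv filter processes
bin/logstash -f logstash.conf --pipeline.workers 1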
You also need to fix your index template like this (I've only added the format setting to the VALUATIONDATE field):
{
  "order": 0,
  "template": "helloindex",
  "settings": {
    "index": {
      "codec": "best_compression",
      "refresh_interval": "60s",
      "number_of_shards": "10",
      "number_of_replicas": "1"
    }
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "UNDERLYING": {
          "type": "keyword"
        },
        "SERVICE": {
          "type": "integer"
        },
        "CLIENT": {
          "type": "integer"
        },
        "VALUATIONDATE": {
          "type": "date",
          "format": "MM/dd/yyyy"
        },
        "COUNTRY": {
          "type": "keyword"
        }
      }
    }
  },
  "aliases": {}
}
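After reloading the config, delete the old index first (templates are only applied when an index is created), then sanity-check the result with, e.g.:

# check the template that Logstash installed
GET _template/my_r_index

# docs.count should now equal the number of CSV rows
GET /_cat/indices?v

# each hit should have one field per CSV column instead of a raw message
GET /my_r_index/_search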