Transform a null field with awk
I have strings like this:
Topic: test1 TopicId: IMjrpzIARVyMPVgxRC1dsA PartitionCount: 1 ReplicationFactor: 2 Configs: message.format.version=2.8-IV1,message.timestamp.type=CreateTime,min.insync.replicas=1
Topic: test2 TopicId: KFkR8ukRQXif1nvjQAcwZA PartitionCount: 1 ReplicationFactor: 1 Configs: cleanup.policy=compact
Topic: d9nvrvth TopicId: 5Ec71za3TV-KznAnG8nV0Q PartitionCount: 6 ReplicationFactor: 3 Configs: message.format.version=2.3-IV1,cleanup.policy=delete,max.message.bytes=2097164,min.compaction.lag.ms=0,message.timestamp.type=CreateTime,min.insync.replicas=2,segment.bytes=104857600,segment.ms=604800000,retention.ms=604800000,message.timestamp.difference.max.ms=9223372036854775807,delete.retention.ms=86400000,retention.bytes=-1
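I want to select only 2 fields (cleanup.policy and retention.ms), but sometimes these fields are not present in the string.
When a field is missing, I want to set a default value instead.
I use this awk: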
awk '
match($0,/Topic:[^\t]*/){
    topic=substr($0,RSTART+6,RLENGTH-6)
    match($0,/retention\.ms[^,]*/)
    retention=substr($0,RSTART+13,RLENGTH-13)
    if ( length(retention == 0) retention = "1 week"
    match($0,/cleanup\.policy[^,]*/)
    clean=substr($0,RSTART+15,RLENGTH-15)
    if ( length(clean == 0) clean = "delete"
    print topic","retention,","clean }'
But the problem is that it always gives me the same values.
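Some issues with OP's current awk code:
- no attempt is made to capture the values of the retention.ms and cleanup.policy attributes
- /retention\.ms/ matches both retention.ms and delete.retention.ms, so match() will find whichever one appears first in the Configs: section
- print is printing the literal strings "retention" and "clean" instead of the contents of the variables retention and clean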
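The second issue is easy to reproduce in isolation. The sketch below uses a made-up Configs: line (not one of the sample lines) in which delete.retention.ms comes before retention.ms, and compares an unanchored pattern with one that requires a space or comma in front of the attribute name. The variable names line, unanchored and anchored are only for illustration.

```shell
# Hypothetical input: delete.retention.ms appears BEFORE retention.ms
line='Configs: delete.retention.ms=86400000,retention.ms=604800000'

# Unanchored pattern: the leftmost match sits inside "delete.retention.ms"
unanchored=$(printf '%s\n' "$line" |
    awk 'match($0, /retention\.ms=[^,]*/) { print substr($0, RSTART, RLENGTH) }')

# Anchored pattern: the attribute name must follow a space or a comma;
# RSTART+1 / RLENGTH-1 skip that leading delimiter
anchored=$(printf '%s\n' "$line" |
    awk 'match($0, /[ ,]retention\.ms=[^,]*/) { print substr($0, RSTART+1, RLENGTH-1) }')

printf 'unanchored: %s\nanchored:   %s\n' "$unanchored" "$anchored"
```

The unanchored match picks up the value of delete.retention.ms, while the anchored one returns the intended retention.ms entry.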
One awk idea:
awk '
$1 == "Topic:" { topic=$2
                 retention="1 week"                 # set default value
                 clean="delete"                     # set default value
                 n=split($NF,a,/[,=]/)              # split last field on dual delimiters "," and "=";
                                                    # odd indexed entries are attributes, even indexed entries are values
                 for (i=1;i<=n;i+=2) {              # loop through list of attributes
                     if (a[i]=="retention.ms")      # if we have an attribute match then ...
                        retention=a[i+1]            # save value
                     if (a[i]=="cleanup.policy")    # if we have an attribute match then ...
                        clean=a[i+1]                # save value
                 }
                 print topic, retention, clean
               }
' topic.dat
This generates:
test1 1 week delete
test2 1 week compact
d9nvrvth 604800000 delete
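OP's print statement suggests comma-separated output was the goal. A minimal sketch of that variant follows, assuming the same input layout as the sample lines (whitespace-separated fields with the Configs: value as the last field); the function name extract_configs is hypothetical and only wraps the awk call so it can read either files or standard input.

```shell
# Hypothetical helper: prints topic,retention.ms,cleanup.policy per input line
extract_configs() {
    awk -v OFS=',' '
    $1 == "Topic:" {
        retention = "1 week"                 # default when retention.ms is absent
        clean     = "delete"                 # default when cleanup.policy is absent
        n = split($NF, a, /[,=]/)            # a[odd] = attribute, a[even] = value
        for (i = 1; i < n; i += 2) {
            if (a[i] == "retention.ms")   retention = a[i+1]
            if (a[i] == "cleanup.policy") clean     = a[i+1]
        }
        print $2, retention, clean           # OFS inserts the commas
    }' "$@"
}

# Usage with one of the sample lines on standard input
printf '%s\n' 'Topic: test2 TopicId: KFkR8ukRQXif1nvjQAcwZA PartitionCount: 1 ReplicationFactor: 1 Configs: cleanup.policy=compact' |
    extract_configs
```

Setting OFS instead of concatenating "," by hand keeps the print statement free of quoted separators, which is where the original attempt ended up printing variable names as literal text.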