Delete and combine rows in CSV file for special condition on columns

I have the following in a CSV file:

ID Contact_id Tags_id
"id114" "" "Tags_id3"
"id12" "" "Tags_id1"
"" "" "Tags_id3"
"id3353" "contact_id8764" "Tags_id5"
"id355" "contact_id16" "Tags_id6"
"" "" "Tags_id7"
"" "" "Tags_id3"
"" "contact_id564" "Tags_id2"
"" "" "Tags_id12"
"id12076" "contact_id137" "Tags_id7"
"" "" "Tags_id3"
"" "" "Tags_id5"
"" "" "Tags_id1"
... ... ...

Plain text for testing:

ID,Contact_id,Tags_id
"id114","","Tags_id3"
"id12","","Tags_id1"
"","","Tags_id3"
"id3353","contact_id8764","Tags_id5"
"id355","contact_id16","Tags_id6"
"","","Tags_id7"
"","","Tags_id3"
"","contact_id564","Tags_id2"
"","","Tags_id12"
"id12076","contact_id137","Tags_id7"
"","","Tags_id3"
"","","Tags_id5"
"","","Tags_id1"

Expected result:

Contact_id Tags_id
"contact_id8764" "Tags_id5"
"contact_id16" "Tags_id6,Tags_id7,Tags_id3,Tags_id2,Tags_id12"
"contact_id564" "Tags_id6,Tags_id7,Tags_id3,Tags_id2,Tags_id12"
"contact_id137" "Tags_id7,Tags_id3,Tags_id5,Tags_id1"
... ...

Expected result as plain text:

Contact_id,Tags_id
"contact_id8764","Tags_id5"
"contact_id16","Tags_id6,Tags_id7,Tags_id3,Tags_id2,Tags_id12"
"contact_id564","Tags_id6,Tags_id7,Tags_id3,Tags_id2,Tags_id12"
"contact_id137","Tags_id7,Tags_id3,Tags_id5,Tags_id1"

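I tried to do this with vim macros, but without success. I know awk is probably better suited to this task, but I am still learning it and could not get it to work either. Maybe there is another way to solve this. I hope someone can help.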

The following may be ugly, but give it a try.

awk -F ',' -f foo.awk input.txt

foo.awk:

BEGIN            { OFS = "," } # so the header is printed comma-separated, as in the expected output
NR == 1          { print $2, $3; next }
                 { gsub(/"/, "", $0) } # remove double quotes
$1 && ($1 != id) { flush(); id = $2 ? $1 : "" } # when ID is changed, print out the 'buffer'.
!id              { next } # note that we defined ID without contact_ID as "" in the previous line.
$2               { cids[i++] = $2 } # add contact_ID if detected
                 { tags = tags "," $3 } # add tag
END              { flush() }

function flush(    k) {
  # loop by index (not "for ... in") so contacts are printed in input order
  for (k = 0; k < i; k++) { print "\"" cids[k] "\",\"" substr(tags, 2) "\"" }
  delete cids; tags = ""; i = 0 # wipe out the buffer
  }

Another alternative

$ awk -F, 'BEGIN         {e="\"\""; OFS=FS}
           NR==1         {print $2,$3; next}
           $1!=e && $2!=e{s=1; id=$1; c=$2; idc[id]=c; idt[id]=$3; next}
           $1!=e         {s=0; next}  # ID without a contact: stop collecting
           s && $2!=e    {idc[id]=idc[id] FS $2}
           s && $3!=e    {idt[id]=idt[id] FS $3}
           END           {for(id in idc)
                            {n=split(idc[id],cs);
                             for(i=1;i<=n;i++) print cs[i], idt[id]}}' file

Contact_id,Tags_id
"contact_id16","Tags_id6","Tags_id7","Tags_id3","Tags_id2","Tags_id12"
"contact_id564","Tags_id6","Tags_id7","Tags_id3","Tags_id2","Tags_id12"
"contact_id8764","Tags_id5"
"contact_id137","Tags_id7","Tags_id3","Tags_id5","Tags_id1"

This approach does not preserve the order of the contacts; if that matters, some extra bookkeeping is needed.
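
The extra bookkeeping could look something like the sketch below. It records each ID in an array the first time it is seen and walks that array in the END block, instead of relying on the unspecified order of "for (id in idc)". It also strips the quotes from each tag and wraps the joined list in a single pair of quotes, so the output has the one-field Tags_id form shown in the question. The names merge_in_order, order, n and unq are made up for illustration, and treating a row that has an ID but no contact as the end of the current group is an assumption, not something stated in the question.

```shell
# Sketch only: merge_in_order, order, n and unq are illustrative names.
merge_in_order() {
  awk -F, 'BEGIN { e = "\"\""; OFS = FS }
    NR == 1            { print $2, $3; next }
    # A row with both an ID and a contact starts a new group.
    $1 != e && $2 != e {
      s = 1; id = $1
      if (!(id in idc)) order[++n] = id   # remember first-seen order
      idc[id] = $2; idt[id] = unq($3); next
    }
    # Assumption: a row with an ID but no contact ends the current group.
    $1 != e            { s = 0; next }
    s && $2 != e       { idc[id] = idc[id] FS $2 }
    s && $3 != e       { idt[id] = idt[id] FS unq($3) }
    END {
      for (k = 1; k <= n; k++) {
        id = order[k]
        m = split(idc[id], cs)            # contacts collected for this ID
        for (i = 1; i <= m; i++) print cs[i], "\"" idt[id] "\""
      }
    }
    # Return x without double quotes (x is a local copy of the field).
    function unq(x) { gsub(/"/, "", x); return x }' "$@"
}
```

It can be called as "merge_in_order file", or with the CSV on standard input.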