Add column from one file to another based on multiple matches while retaining unmatched

Question

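I'm really new to this kind of thing (seriously, sorry in advance), but I figured I would post this question because it is taking me a while to solve, and I'm sure it is a lot harder than I imagine.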

I have the file small.csv:

id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,15
3,jane,5,6,17
4,cindy,1,4,18

and another file, big.csv:

id3,id4,name,x,y
100,{},john,2,6
101,{},bob,3,4
102,{},jane,5,6
103,{},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7

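The issue is that I am trying to put id2 from small.csv into the id4 column of big.csv, but only where name AND x AND y all match. I have tried various awk and join commands in Git Bash without success. Again, sorry for the newbie perspective on all of this, but any help would be great. Thank you in advance.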

Edit: sorry, this is the final desired output:

id3,id4,name,x,y
100,{13},john,2,6
101,{15},bob,3,4
102,{17},jane,5,6
103,{18},cindy,1,4
104,{},alice,7,8
105,{},jane,0,3
106,{},cindy,1,7

One of my most recent attempts was:

$ join -j 1 -o 1.5,2.1,2.2,2.3,2.4,2.5 <(sort -k2 small.csv) <(sort -k2 big.csv)

But I received this error:

join: /dev/fd/63: No such file or directory

Answer 1

This is probably not easy to solve with join, but it is fairly easy with awk:

awk -F, -v OFS=, ' # set input and output field separators to comma

    # create lookup table from lines of small.csv
    NR==FNR {
        # ignore header
        # map columns 2/3/4 to column 5
        if (NR>1) lut[$2,$3,$4] = $5
        next
    }

    # process lines of big.csv
    # if lookup table has mapping for columns 3/4/5, update column 2
    v = lut[$3,$4,$5] {
        $2 = "{" v "}"
    }
    }

    # print (possibly-modified) lines of big.csv
    1

' small.csv big.csv >bignew.csv
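One caveat with using the looked-up value itself as the pattern: awk treats an empty string or a numeric zero as false, so a matching row whose id2 is 0 would be left untouched. Below is a minimal sketch of a variant that tests key membership with the in operator instead. The sample data is a shortened, altered copy of the question's files (bob's id2 is changed to 0 purely for illustration), and the scratch directory is an assumption to keep the example self-contained.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Altered sample data (assumption): the id2 for bob is 0 to exercise the edge case.
cat > small.csv <<'EOF'
id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,0
EOF

cat > big.csv <<'EOF'
id3,id4,name,x,y
100,{},john,2,6
101,{},bob,3,4
104,{},alice,7,8
EOF

awk -F, -v OFS=, '
    # first file: remember id2 for each name/x/y combination, skipping the header
    NR==FNR { if (FNR>1) lut[$2,$3,$4] = $5; next }

    # second file: test key membership rather than the truth of the stored value
    ($3,$4,$5) in lut { $2 = "{" lut[$3,$4,$5] "}" }

    # print every line of big.csv, modified or not
    1
' small.csv big.csv > bignew.csv

cat bignew.csv
```

With this variant the bob row should come out as 101,{0},bob,3,4, while the unmatched alice row keeps its empty {}.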

The code assumes small.csv contains only one line for each distinct combination of columns 2/3/4.
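If you are not sure that assumption holds, a quick check before merging can help. This is only a sketch: the sample file below is made up (it deliberately repeats the john,2,6 key), and the seen array name is just a common convention. If a key does repeat, the lookup table in the script above is simply overwritten, so the last matching line of small.csv would win.

```shell
# Hypothetical sample with a repeated name/x/y key on the last line.
workdir=$(mktemp -d)
cat > "$workdir/small.csv" <<'EOF'
id,name,x,y,id2
1,john,2,6,13
2,bob,3,4,15
3,john,2,6,99
EOF

# Report every data line whose name/x/y key was already seen on an earlier line.
# seen[...]++ is 0 (false) the first time a key appears and non-zero afterwards.
dupes=$(awk -F, 'FNR>1 && seen[$2,$3,$4]++ { print FNR ": " $0 }' "$workdir/small.csv")
echo "$dupes"
```

An empty result means every name/x/y combination is unique and the merge is unambiguous.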

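NR==FNR { ...; next } is a way to process the contents of the first file argument. (NR counts records across all input files, while FNR restarts at 1 for each file, so FNR is less than NR when processing lines from the second and subsequent file arguments. next skips execution of the remaining awk commands for the current line.)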
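To see the two counters side by side, here is a small sketch that prints them for every line of two throwaway files (the file names and contents are invented for the demonstration):

```shell
# Two tiny throwaway input files (hypothetical names and contents).
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/first.txt"
printf 'c\nd\ne\n' > "$workdir/second.txt"

# For each line print: the line, the global counter NR, the per-file counter FNR,
# and whether the NR==FNR test would treat it as belonging to the first file.
out=$(awk '{ tag = (NR == FNR) ? "first-file" : "later-file"; print $0, NR, FNR, tag }' \
    "$workdir/first.txt" "$workdir/second.txt")
echo "$out"
```

One thing to keep in mind: if the first file is empty, NR and FNR are still equal while awk reads the second file, so the idiom would treat the second file as the first.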
