Awk script extra output: printing raw line (as read) as well as processed line

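I have some CSV files in which one column should really be an array, but every field, including the array elements, is separated by commas. I need to convert each file so that every value is quoted and the array column becomes a single quoted, comma-separated list. I know the index of the array column for each file.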

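I wrote the script below to handle this. It prints each processed line as expected, but every processed line is followed by the original raw line.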

Desired output:

A,B,C,D
"1","","a,b,c","2"
"3","4","","5"
"","5","d,e","6"
"7","8","f","9"
(base) balter@winmac:~/winhome/CancerGraph$ cat testfile
A,B,C,D
1,,a,b,c,2
3,4,,5
,5,d,e,6
7,8,f,9
(base) balter@winmac:~/winhome/CancerGraph$ ./fix_array_cols.awk FS="," array_col=3 testfile
A,B,C,D
"1","","a,b,c","2"
1,,a,b,c,2
"3","4","","5"
3,4,,5
"","5","d,e","6"
,5,d,e,6
"7","8","f","9"
7,8,f,9
(base) balter@winmac:~/winhome/CancerGraph$ cat fix_array_cols.awk
#!/bin/awk -f

BEGIN {
        getline;
        print $0;
        num_cols = NF;

        #printf("num_cols: %s, array_col: %s\n\n", num_cols, array_col);
}
NR>1 {
        total_fields = NF;

        # fields_before_array = (array_col - 1)
        # fields_before_array + array_length + fields_after_array = NF
        # fields_before_array + fields_after_array + 1 = num_cols
        # array_length - 1 = total_fields - num_cols
        # array_length = total_fields - num_cols + 1
        # fields_after_array = total_fields - array_length - fields_before_array
        #                    = total_fields - (total_fields - num_cols + 1) - (array_col - 1)
        #                    = num_cols - array_col
        fields_before_array = (array_col - 1);
        array_length = total_fields - num_cols + 1;
        fields_after_array = num_cols - array_col;
        first_array_position = array_col;
        last_array_position = array_col + array_length-1;

        #printf("array_col: %s, fields_before_array: %s, array_length: %s, fields_after_array: %s, total_fields: %s, num_cols: %s", array_col, fields_before_array, array_length, fields_after_array, total_fields, num_cols)

        ### loop through fields before array column
        ### remove whitespace, and print surrounded by ""
        for (i=1; i<array_col; i++)
        {
          gsub(/ /,"",$i);
          printf("\"%s\",", $i);
        }

        ### Collect array surrounded by ""
        array_data = "";

        ### Loop through array
        for (i=array_col ; i<array_col+array_length-1 ; i++)
        {
          gsub(/ /, "", $i);
          array_data = array_data $i ",";
        }

        ### collect last array element with no trailing ,
        array_data = array_data $i

        ### print array surrounded by quotes
        printf("\"%s\",", array_data);

        ### loop through remaining fields, remove whitespace, surround with ""
        for (i=last_array_position+1 ; i<total_fields ; i++)
        {
          gsub(/ /,"",$i);
          printf("\"%s\",", $i);
        }

        ### finish line with \n
        printf("\"%s\"\n", $total_fields);


} FILENAME
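The field-count arithmetic in the script's comments (array_length = total_fields - num_cols + 1, fields_after_array = num_cols - array_col) can be sanity-checked against the sample rows. This is a minimal sketch, not part of the original script: show_array_sizes is a hypothetical helper name, and it assumes, as the script does, that the header row carries the true column count and that only the array column ever spills into extra fields.

```shell
#!/bin/sh
# Hypothetical helper: report the derived sizes for each data row of a CSV read on stdin.
# Assumption: the header row has the true column count; the array sits in column $1.
show_array_sizes() {
    awk -F',' -v array_col="$1" '
    NR == 1 { num_cols = NF; next }        # header line: true number of columns
    {
        array_length = NF - num_cols + 1   # every extra field belongs to the array
        printf("NF=%d array_length=%d last=%d after=%d\n",
               NF, array_length, array_col + array_length - 1, num_cols - array_col)
    }'
}

# Same rows as testfile in the transcript, with the array in column 3.
printf 'A,B,C,D\n1,,a,b,c,2\n3,4,,5\n,5,d,e,6\n7,8,f,9\n' | show_array_sizes 3
```

For the row 1,,a,b,c,2 this reports NF=6 and array_length=3, so the array spans fields 3 to 5 and one field follows it, matching "a,b,c" in the desired output.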

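Remove FILENAME from the end of your script. After the closing brace it forms a separate rule consisting of a pattern with no action, and awk gives such a rule the default action { print $0 }. While a named input file is being read, FILENAME is a non-empty string, which awk treats as true, so every record matches and its raw text is printed right after your processed version. The header is not duplicated because getline in the BEGIN block consumes it before any main rule runs. Ending the script with a plain } fixes the output.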
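The effect can be reproduced without the CSV logic. In the sketch below, with_filename and without_filename are hypothetical names for illustration: the first ends its program with the same stray FILENAME rule as the script, the second does not. A real temporary file is used because the behavior depends on FILENAME being non-empty, and its value when reading standard input may differ between awk implementations.

```shell
#!/bin/sh
# A rule that is only a pattern (here FILENAME) gets the default action { print $0 }.
# FILENAME is a non-empty string while a named file is read, so it matches every record.

demo_file=$(mktemp)
trap 'rm -f "$demo_file"' EXIT
printf 'x\ny\n' > "$demo_file"

# Buggy: the trailing FILENAME rule re-prints every raw input line.
with_filename() { awk '{ print "processed:", $0 } FILENAME' "$demo_file"; }

# Fixed: only the explicit action runs.
without_filename() { awk '{ print "processed:", $0 }' "$demo_file"; }

with_filename
without_filename
```

The first call prints "processed: x", "x", "processed: y", "y"; the second prints only the two processed lines, mirroring the difference between the transcript and the desired output.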