Awk script prints extra output: the raw line (as read) as well as the processed line
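I have some CSV files in which one column is really supposed to be an array, but all fields are separated by commas. I need to convert each file so that every value is quoted and the array column is a single quoted, comma-separated list. I know the index of the array column for each file.
I wrote the script below to handle this. However, each row is printed as expected but is then followed by the original row.
Desired output: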
A,B,C,D
"1","","a,b,c","2"
"3","4","","5"
"","5","d,e","6"
"7","8","f","9"
(base) balter@winmac:~/winhome/CancerGraph$ cat testfile
A,B,C,D
1,,a,b,c,2
3,4,,5
,5,d,e,6
7,8,f,9
(base) balter@winmac:~/winhome/CancerGraph$ ./fix_array_cols.awk FS="," array_col=3 testfile
A,B,C,D
"1","","a,b,c","2"
1,,a,b,c,2
"3","4","","5"
3,4,,5
"","5","d,e","6"
,5,d,e,6
"7","8","f","9"
7,8,f,9
(base) balter@winmac:~/winhome/CancerGraph$ cat fix_array_cols.awk
#!/bin/awk -f
BEGIN {
    getline;
    print $0;
    num_cols = NF;
    #printf("num_cols: %s, array_col: %s\n\n", num_cols, array_col);
}
NR>1 {
    total_fields = NF;
    # fields_before_array = (array_col - 1)
    # fields_before_array + array_length + fields_after_array = NF
    # fields_before_array + fields_after_array + 1 = num_cols
    # array_length - 1 = total_fields - num_cols
    # array_length = total_fields - num_cols + 1
    # fields_after_array = total_fields - array_length - fields_before_array
    #                    = total_fields - (total_fields - num_cols + 1) - (array_col - 1)
    #                    = num_cols - array_col
    fields_before_array = (array_col - 1);
    array_length = total_fields - num_cols + 1;
    fields_after_array = num_cols - array_col;
    first_array_position = array_col;
    last_array_position = array_col + array_length-1;
    #printf("array_col: %s, fields_before_array: %s, array_length: %s, fields_after_array: %s, total_fields: %s, num_cols: %s", array_col, fields_before_array, array_length, fields_after_array, total_fields, num_cols)
    ### loop through fields before array column
    ### remove whitespace, and print surround with ""
    for (i=1; i<array_col; i++)
    {
        gsub(/ /,"",$i);
        printf("\"%s\",", $i);
    }
    ### Collect array surrounded by ""
    array_data = "";
    ### Loop through array
    for (i=array_col ; i<array_col+array_length-1 ; i++)
    {
        gsub(/ /, "", $i);
        array_data = array_data $i ",";
    }
    ### collect last array element with no trailing ,
    array_data = array_data $i
    ### print array surrounded by quotes
    printf("\"%s\",", array_data);
    ### loop through remaining fields, remove whitespace, surround with ""
    for (i=last_array_position+1 ; i<total_fields ; i++)
    {
        gsub(/ /,"",$i);
        printf("\"%s\",", $i);
    }
    ### finish line with \n
    printf("\"%s\"\n", $total_fields);
} FILENAME
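Remove FILENAME from your script. Placed after the closing brace, it is a separate pattern with no action, so awk prints every record that it matches.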
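The effect can be reproduced with a small sketch. The two-line sample file and the one-field printf below are stand-ins invented for illustration, not the original script or data. FILENAME is a non-empty string while a named file is being read, so as a pattern it is true for every record.

```shell
#!/bin/sh
# Hypothetical two-line sample file, created only for this demonstration.
tmp=$(mktemp)
printf '3,4,,5\n7,8,f,9\n' > "$tmp"

# Buggy form: the bare FILENAME after the action is a second rule with no
# action, so awk applies the default action (print $0) to every record.
buggy=$(awk -F, '{ printf("\"%s\"\n", $1) } FILENAME' "$tmp")

# Fixed form: the stray pattern is gone, so only the printf output remains.
fixed=$(awk -F, '{ printf("\"%s\"\n", $1) }' "$tmp")

printf 'buggy:\n%s\n\nfixed:\n%s\n' "$buggy" "$fixed"
rm -f "$tmp"
```

In the buggy run each quoted value is followed by the untouched input line, which is the same pattern seen in the question's output.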