How to replace a delimiter that appears inside quoted values as part of the data in a file
I want to replace the delimiter where it appears as part of the data in each record. For example:
echo '"hi","how,are,you","bye"' | sed -nE 's/"([^,]*),([^,]*),([^,]*)"/"\1;\2;\3"/gp'
Output -->
"hi","how;are;you","bye"
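So I can replace the delimiter present in the data (a comma in this case) with a semicolon.
The challenge is that we do not know how many times the delimiter will occur in the actual data, and it may also occur in more than one field.
For example: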
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
All of these are valid records.
Can someone help me out here? How can we write generic code to handle this?
Assuming the data does not contain embedded double quotes...
Sample data:
$ cat delim.dat
"hi","how,are,you","bye"
"1","2,3,4,5","6","7,8"
"1","2,4,5","6","7,8,9"
"1","4,5","6","7,8,9.2"
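One awk idea, where we replace , with ; in the even-numbered fields (with the double quote as field separator, these are the pieces that sit inside quotes):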
awk '
BEGIN { FS=OFS="\"" }
{ for (i=2;i<=NF;i=i+2) gsub(",",";",$i) }
1
' delim.dat
This generates:
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
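For readers who would rather see the same idea outside awk, here is a minimal Python sketch of it. The function name replace_in_quoted and its parameters are made up for illustration; like the awk script, it assumes the data contains no embedded double quotes.

```python
def replace_in_quoted(line, old=",", new=";"):
    """Replace `old` with `new` only inside double-quoted parts of `line`.

    Assumption (same as the awk script): no embedded double quotes.
    """
    # Split on the double quote, as awk does with FS="\"".
    parts = line.split('"')
    # awk fields 2, 4, 6, ... correspond to Python indices 1, 3, 5, ...,
    # which hold the text that was inside quotes.
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(old, new)
    # Rejoin with the quote character, as awk does with OFS="\"".
    return '"'.join(parts)


if __name__ == "__main__":
    for record in ['"hi","how,are,you","bye"', '"1","2,3,4,5","6","7,8"']:
        print(replace_in_quoted(record))
```

Because only the pieces between quotes are touched, the number of commas per field and the number of affected fields do not matter.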
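For anything but the most trivial CSV data, I prefer to use something that understands the format directly instead of messing around with regular expressions to try to handle things like quoted fields. For example (warning: blatant self-promotion ahead!), my tcl-based awk-like utility tawk, which I wrote in part to make it easier to manipulate CSV files: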
$ tawk -csv -quoteall '
line {
for {set n 1} {$n <= $NF} {incr n} {
set F($n) [string map {, \;} $F($n)]
}
print
}' input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
Or a perl approach using the Text::CSV_XS module:
$ perl -MText::CSV_XS -e '
my $csv = Text::CSV_XS->new({binary=>1, always_quote=>1});
while (my $row = $csv->getline(\*STDIN)) {
tr/,/;/ foreach @$row;
$csv->say(\*STDOUT, $row);
}' < input.csv
"hi","how;are;you","bye"
"1","2;3;4;5","6","7;8"
"1","2;4;5","6","7;8;9"
"1","4;5","6","7;8;9.2"
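The same parser-based approach can be sketched with Python's standard csv module. This is only an illustration, not part of either tool above; the function name replace_commas is hypothetical, and csv.QUOTE_ALL is used to play the role of always_quote in the perl version.

```python
import csv
import io


def replace_commas(text, new=";"):
    """Parse `text` as CSV, replace commas inside each field, re-emit quoted."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    # QUOTE_ALL quotes every field on output; lineterminator avoids "\r\n".
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for row in reader:
        # Each field arrives already unquoted, so any comma left in it is data.
        writer.writerow([field.replace(",", new) for field in row])
    return out.getvalue()


if __name__ == "__main__":
    sample = '"hi","how,are,you","bye"\n"1","4,5","6","7,8,9.2"\n'
    print(replace_commas(sample), end="")
```

Since the parser handles the quoting, this sketch should also cope with embedded double quotes written as "" inside a field, which the split-on-quote approach does not.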