In Perl, how can I parse a CSV file whose fields contain comma-separated values?
I have a CSV file named file.csv that looks like this (this is just a sample):
"Name","Alias","Phone","email","address"
"rob","rob","534235","rob@example.com","US,UK"
"nik","nik","976784","nik@example.com,nik@foram.org","UK"
"picy","pic","327654,823747","pic@example.com","US"
The file has 5 headers, but some fields hold more than one value. Any field can hold any number of values, possibly more than 2 or 3.
I am trying to get output like this:
Name Nickname Phone email address
rob rob 534235 rob@example.com US,UK
nik nik 976784 nik@example.com,nik@foram.org UK
picy pic 327654,823747 pic@example.com US
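or any particular column, with that column's data shown as above.
I know about the split function and its limit argument for capping the number of splits: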
while (<$fh>)
{
    my @data = split /,/, $_, 5;
}
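But that does not work here, because the commas inside the quoted fields are split as well.
How can I do this? Any ideas?
With the updated information, the solution to your problem is now trivial: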
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;
use Text::Table::Tiny;
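my $csv = Text::CSV_XS->new;
my @data = ( $csv->getline(\*DATA) ); # header row
while (my $row = $csv->getline(\*DATA)) {
    # Skip rows whose field count does not match the header
    next unless @$row == @{ $data[0] };
    push @data, $row;
}
# Render the collected rows as a text table, treating the first row as the header
print Text::Table::Tiny::table(
    rows       => \@data,
    header_row => 1,
);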
__DATA__
"Name","Alias","Phone","email","address"
"rob","rob","534235","rob@example.com","US,UK"
"nik","nik","976784","nik@example.com,nik@foram.org","UK"
"picy","pic","327654,823747","pic@example.com","US"
Output:
+------+-------+---------------+-------------------------------+---------+
| Name | Alias | Phone         | email                         | address |
+------+-------+---------------+-------------------------------+---------+
| rob  | rob   | 534235        | rob@example.com               | US,UK   |
| nik  | nik   | 976784        | nik@example.com,nik@foram.org | UK      |
| picy | pic   | 327654,823747 | pic@example.com               | US      |
+------+-------+---------------+-------------------------------+---------+
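The question also asks about printing just one particular column. A minimal sketch of that, assuming the column is looked up by its header name: the helper name column_values and the in-memory sample standing in for file.csv are made up for illustration and are not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: return the values of one named column,
# reading CSV records from an already opened filehandle.
sub column_values {
    my ($fh, $wanted) = @_;
    my $csv = Text::CSV_XS->new({ binary => 1 });

    my $header = $csv->getline($fh) or return;

    # Find the index of the requested header name.
    my ($idx) = grep { $header->[$_] eq $wanted } 0 .. $#$header;
    die "No column named '$wanted'\n" unless defined $idx;

    my @values;
    while (my $row = $csv->getline($fh)) {
        next unless @$row == @$header;   # skip rows with the wrong field count
        push @values, $row->[$idx];
    }
    return @values;
}

# In-memory sample standing in for file.csv (an assumption for this sketch).
my $sample = <<'CSV';
"Name","Alias","Phone","email","address"
"rob","rob","534235","rob@example.com","US,UK"
"nik","nik","976784","nik@example.com,nik@foram.org","UK"
"picy","pic","327654,823747","pic@example.com","US"
CSV

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @emails = column_values($fh, 'email');
close $fh;

print "$_\n" for @emails;
```

To read the real file instead, open file.csv with open my $fh, '<', 'file.csv' and pass that handle to the helper.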
You can also build a nested data structure by using the CSV parser first on each line and then on each field within that line:
while (my $row = $csv->getline(\*DATA)) {
    next unless @$row == @{ $data[0] };
    # Re-parse each field as its own CSV string to split out the sub-values
    push @data, [
        map [ $csv->parse($_) ? $csv->fields : () ], @$row
    ];
}
This is useful if your main interest is in processing the data rather than just printing it.
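As a rough illustration of what processing the nested data might look like, here is a sketch that stores each row as a hash keyed by header name, with every value held as an array reference of sub-values. The helper name parse_nested, the second parser object used for the fields, and the in-memory sample are assumptions made for this example, not part of the answer above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: return an array ref of records, where each record
# maps a header name to an array ref of the sub-values in that field.
sub parse_nested {
    my ($fh) = @_;
    my $line_csv  = Text::CSV_XS->new({ binary => 1 });  # parses whole lines
    my $field_csv = Text::CSV_XS->new({ binary => 1 });  # parses single fields

    my $header = $line_csv->getline($fh) or return [];

    my @records;
    while (my $row = $line_csv->getline($fh)) {
        next unless @$row == @$header;   # skip rows with the wrong field count
        my %record;
        for my $i (0 .. $#$header) {
            # Treat the field itself as a one-line CSV string.
            my @values = $field_csv->parse($row->[$i]) ? $field_csv->fields : ();
            $record{ $header->[$i] } = \@values;
        }
        push @records, \%record;
    }
    return \@records;
}

# In-memory sample standing in for file.csv (an assumption for this sketch).
my $sample = <<'CSV';
"Name","Alias","Phone","email","address"
"rob","rob","534235","rob@example.com","US,UK"
"nik","nik","976784","nik@example.com,nik@foram.org","UK"
"picy","pic","327654,823747","pic@example.com","US"
CSV

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $records = parse_nested($fh);
close $fh;

# Print each name followed by its e-mail addresses, one sub-value at a time.
for my $rec (@$records) {
    printf "%s: %s\n", $rec->{Name}[0], join(' | ', @{ $rec->{email} });
}
```

With the data in this shape, a multi-valued field such as nik's email can be iterated over directly instead of being split again at every use.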