perl

Question

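I am new to Perl, and I want to create the name of the output file from the column names in the input file. Suppose my input file header looks like this: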

#identifier    (%)composition

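I want my output file name to be identifier_composition. The identifiers and compositions can be sequences of alphanumeric characters, for example #E2FAR4 for the identifier or (%)MhDE4 for the composition. For this example, the output file name should be E2FAR4_MhDE4. So far I am able to get the identifier but not the composition. Here is the code I have tried: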

if ($line =~ /^#\s*(\S+)\t\(%)s*(\S+)/){
    my $ID = $1;
    my $comp = $2;
    my $out_file = "${ID}_${comp}"
}

But I am also getting the identifier as the second argument. Any help would be appreciated.

Answer 1

Use the regex below, which escapes the parentheses around % so that they match literally:

^#\s*(\S+)\t\(%\)(\S+)

Sample code:

#!/usr/bin/perl
use strict;
use warnings;

# Note: the two columns in the DATA line are separated by a tab.
while (my $line = <DATA>) {
    chomp $line;
    # \( and \) match the literal parentheses around %
    if ($line =~ /^#\s*(\S+)\t\(%\)(\S+)/) {
        my $ID       = $1;
        my $comp     = $2;
        my $out_file = "${ID}_${comp}";
        print "Filename: $out_file\n";
    }
}

__DATA__
#identifier	(%)composition

Output:

Filename: identifier_composition
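
Building on the match above, here is a minimal sketch of wrapping it in a helper that returns the file name, or nothing when the line is not a header. The sub name `header_to_filename` is hypothetical. Using `\s+` in place of `\t` is an assumption made here so that space-separated headers are accepted too; it is not part of the answer's regex.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "ID_comp" for a header line such as
# "#E2FAR4\t(%)MhDE4", or undef (in scalar context) if the line does not match.
sub header_to_filename {
    my ($line) = @_;
    # \( and \) match the literal parentheses around %.
    # Assumption: \s+ instead of \t, so spaces also work as the separator.
    return unless $line =~ /^#\s*(\S+)\s+\(%\)\s*(\S+)/;
    return "${1}_${2}";
}

print header_to_filename("#E2FAR4\t(%)MhDE4"), "\n";   # prints E2FAR4_MhDE4
```

Returning early on a failed match keeps stale values of `$1` and `$2` from a previous successful match out of the file name.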

Answer 2

It looks like you are overthinking your regex. You are looking for two sequences of word characters separated by some non-word characters.

if ($line =~ /(\w+)\W+(\w+)/) {
  say "$1 / $2";
}

An even simpler approach is to match all sequences of word characters:

if (my @words = $line =~ /(\w+)/g) {
  say join ' / ', @words;
}
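
As a sketch of how this list of words could become the file name the question asks for, the words can be joined with an underscore. The sample line and the variable name `$out_file` are taken from the question; the caveats in the comments are assumptions about the input worth checking against real data.

```perl
use strict;
use warnings;

my $line = "#E2FAR4\t(%)MhDE4";

# In list context, /(\w+)/g returns every run of word characters.
my @words = $line =~ /(\w+)/g;

# Caveat: \w also matches "_", and any extra words on the line
# would be joined into the name as well.
my $out_file = join '_', @words;

print "$out_file\n";   # prints E2FAR4_MhDE4
```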

Update: I put your regex into this regex explainer. Here is the result:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  #                        '#'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \t                       '\t' (tab)
--------------------------------------------------------------------------------
  \^                       '^'
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    %                        '%'
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  s*                       's' (0 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3

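I think your biggest problem is the literal ^ that you are trying to match in the middle of your regex, but the unescaped parentheses around % are also a problem: they create a capture group instead of matching the literal "(" and ")". The s* is pointless and confusing :-)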
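
To illustrate the point about the parentheses, here is a small sketch comparing an unescaped `(%)` with an escaped `\(%\)` on the sample header from the question. It assumes the stray `s*` was meant to be `\s*`; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "#E2FAR4\t(%)MhDE4";

# Unescaped: (%) is a capture group that matches a bare "%", so this
# pattern needs "%" directly after the tab and fails on "(%)".
my $unescaped = $line =~ /^#\s*(\S+)\t(%)\s*(\S+)/ ? "match" : "no match";

# Escaped: \( and \) match the literal parentheses, leaving two groups.
my ($id, $comp) = $line =~ /^#\s*(\S+)\t\(%\)\s*(\S+)/;

print "unescaped: $unescaped\n";        # prints unescaped: no match
print "escaped:   ${id}_${comp}\n";     # prints escaped:   E2FAR4_MhDE4
```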

perl - create file name from column names

regex

filenames