Remove non-digit characters with Perl

I have a file containing multiple quotes, like this:

  <verse-no>quote</verse-no>
            <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
            <quote>When Adam came from the Creator’s hand, he bore, in his physical, mental, and
                spiritual nature, a likeness to his Maker. “God created man in His own image”
                (Genesis 1:27), and it was His purpose that the longer man lived the more fully
                he should reveal this image—the more fully reflect the glory of the Creator. All
                his faculties were capable of development; their capacity and vigor were
                continually to increase. Ed 15
            </quote>

I want to remove all of the text except the verse reference from the <quote-verse>.....</quote-verse> line, so that the end result is <quote-verse>1:26,27</quote-verse>.

I tried perl -pi.bak -e 's#\D*$<\/quote-verse>#<\/quote-verse>#g' file.txt

It doesn't do anything. I'm a beginner at Perl (self-taught) with less than 10 days of experience. Please tell me what's wrong and how to proceed.
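Why that one-liner silently does nothing: a regex pattern is interpolated like a double-quoted string, and $< is one of Perl's built-in special variables (the real user ID). So \D*$<\/quote-verse> actually becomes something like \D*1000\/quote-verse>, with the number depending on who runs it, and that can never match. The minimal sketch below runs the original substitution and a version without the stray $ on one line taken from the question's sample.

```perl
use strict;
use warnings;

# One line from the question's sample file.
my $line = '<quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>';

# The original substitution. Perl interpolates $< (the real user ID) into the
# pattern, so it looks for e.g. "1000/quote-verse>" and never matches. Even
# without interpolation, the "$" anchor sits before "<" and could not match.
(my $attempt = $line) =~ s#\D*$<\/quote-verse>#<\/quote-verse>#g;

# Without the stray "$": strip the run of non-digits just before the closing
# tag. With "#" as the delimiter, "/" needs no escaping.
(my $fixed = $line) =~ s#\D*</quote-verse>#</quote-verse>#g;

print "attempt: $attempt\n";   # unchanged
print "fixed:   $fixed\n";     # <quote-verse>1:26,27</quote-verse>
```

This only explains the failed attempt; the answer below argues for using an XML parser rather than patching the regex.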

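You have XML, so you need an XML parser. XML::Twig is a good one. The reason so many people say "don't use regular expressions to parse XML" is that regexes do work, but only within limits. XML is a specification: some things are valid and some are not. If you write code built on assumptions that aren't always true, you end up with brittle code. One day someone changes their perfectly valid XML into something slightly different but still perfectly valid, and your code breaks without warning.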
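To make the brittleness concrete, here is a minimal sketch that applies a regex-only fix to two snippets. The helper name strip_title_with_regex is made up for illustration. Both snippets are valid XML with the same meaning, because the XML specification allows whitespace before the closing > of an end tag, but the regex only handles the first one; the second passes through untouched, with no error.

```perl
use strict;
use warnings;

# Hypothetical helper: a regex-only fix applied to a string of XML.
sub strip_title_with_regex {
    my ($xml) = @_;
    $xml =~ s#\D*</quote-verse>#</quote-verse>#g;
    return $xml;
}

# The element as it appears in the question.
my $original = '<quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>';

# Equally valid XML: whitespace is allowed before the ">" of an end tag.
my $variant = '<quote-verse>1:26,27 Man Created to Continually Develop</quote-verse >';

my $fixed_original = strip_title_with_regex($original);
my $fixed_variant  = strip_title_with_regex($variant);

print "$fixed_original\n";   # <quote-verse>1:26,27</quote-verse>
print "$fixed_variant\n";    # unchanged: the regex silently fails to match
```

An XML parser treats both forms identically, which is exactly what the regex approach gives up.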

With that in mind:

This works:

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

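# Called once for every <quote-verse> element, after it has been parsed.
sub quote_verse_handler {
    my ( $twig, $quote_verse ) = @_;
    my $text = $quote_verse->text;
    # Strip the trailing run of non-digits (the title), keeping the
    # verse reference, e.g. "1:26,27".
    $text =~ s/\D+$//;
    $quote_verse->set_text($text);
}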

my $parser = XML::Twig->new(
    twig_handlers => { 'quote-verse' => \&quote_verse_handler },
    pretty_print  => 'indented'
);


#$parser -> parsefile ( 'your_file.xml' );
local $/;
$parser->parse(<DATA>);
$parser->print;


__DATA__
<xml>
<verse-no>quote</verse-no>
        <quote-verse>1:26,27 Man Created to Continually Develop</quote-verse>
        <quote>When Adam came from the Creator's hand, he bore, in his physical, mental, and
            spiritual nature, a likeness to his Maker. "God created man in His own image"
            (Genesis 1:27), and it was His purpose that the longer man lived the more fully
            he should reveal this image-the more fully reflect the glory of the Creator. All
            his faculties were capable of development; their capacity and vigor were
            continually to increase. Ed 15
        </quote>
   </xml>

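What this does is run through your file. Each time it encounters a quote-verse element, it calls the handler and hands it "that bit" of the XML to work on. We apply a regular expression to chop the trailing text (the title after the verse reference) off the end of the line, then update the XML accordingly.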
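The regular-expression step can be tried on its own, without XML::Twig. This small sketch runs two candidate substitutions on the text of the question's quote-verse element: one that captures the last digit but replaces the whole match with nothing (so that digit is deleted too), and one that removes only the trailing run of non-digits.

```perl
use strict;
use warnings;

my $text = '1:26,27 Man Created to Continually Develop';

# Capturing the last digit but replacing with nothing deletes that digit too.
(my $drops_digit = $text) =~ s/(\d)\D+$//;

# Removing only the trailing run of non-digits keeps the full reference.
(my $keeps_digit = $text) =~ s/\D+$//;

print "$drops_digit\n";   # 1:26,2
print "$keeps_digit\n";   # 1:26,27
```

If you prefer the capturing style, put the capture back in the replacement: s/(\d)\D+$/$1/ also yields 1:26,27.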

Once parsing is complete, we print out the finished result.

You'll probably want to replace:

local $/;
$parser -> parse ( <DATA> );

with:

$parser -> parsefile ( 'your_file_name' );

You may also find

$parser->print_to_file( 'output_filename' );

useful for writing the result to a file instead of printing it to standard output.