为什么来自子进程的管道有时会中断,有时不会?

Why do Pipes from Child Processes break sometimes and sometimes not?

我正在用 Perl 开发一个系统,它有一个主脚本,需要执行几个不同的工作脚本,这些工作脚本同时在系统上执行不同的任务。
这些脚本也不需要用 Perl 编写。它们可以是任何可在命令行上执行的程序和 returns STDOUTSTDERR 以及 EXITCODE.

上的结果

所以我做了一个小示例脚本 hello.pl:

#!/usr/bin/perl

print STDERR "HELLO ERROR 0 !\n";

print "hello script!!\n";

print "hello waiting 3 sec ...\n";

sleep 3;

print "hello list:\n";
print "cmd: 'ls -lah'\n";

print `ls -lah`;

print "res: '$?'\n";    

print "hello waiting 2 sec ...\n";

sleep 2;    

print STDERR "HELLO ERROR 1 !\n";

主脚本一方面需要捕获 STDOUT 的结果,另一方面需要捕获 STDERR 中的错误通知。

Master Script fork 以非阻塞模式执行 Worker Script,并使用 waitpidselect 监控 Worker Script 进度。

所以 master.pl 脚本的代码是这样的:

#!/usr/bin/perl

use strict;
use warnings;

use IO::Select;
use POSIX ":sys_wait_h";


sub Launch
{
  my $rprocess = shift;
  my $irs = 0;


  local *logreader;
  local *errorreader;
  my $logwriter   = undef;
  my $errorwriter = undef;

  pipe( *logreader,   $logwriter );
  pipe( *errorreader, $errorwriter );

  #Spawn the Child Process
  my $iprcpid = fork();

  #Check the Success of Process Forking
  if(defined $iprcpid)
  {           
    #------------------------
    #Sub Process Launch succeeded

    # Check whether parent/child process
    if($iprcpid > 0)
    {
      #------------------------
      #Parent Process

      close($logwriter);
      close($errorwriter);

      $rprocess->{"pid"} = $iprcpid;
      $rprocess->{"log_pipe"} = *logreader;
      $rprocess->{"error_pipe"} = *errorreader;

      $irs = 1;
    }
    elsif($iprcpid == 0)
    {
      #------------------------
      #Child Process

      my $ierr = 0;

      close(*logreader);
      close(*errorreader);

      open( STDOUT, ">&=", $logwriter );
      open( STDERR, ">&=", $errorwriter );


      #------------------------
      #Execute the configured Command

      print "cmd: '" . $rprocess->{"file"} . "'\n";

      print "cmd rng ...\n";

      print `$rprocess->{"file"}`; 

      $ierr = $?;

      print "closing transmission ...\n";   

      close STDOUT;
      close STDERR;

      exit $ierr; 
    }
    else    #An Error has ocurred in the Sub Process Launch
    {
        # Unable to fork
        print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
    }   #if($iprcpid > 0)
  }
  else    #An Error has ocurred in the Process Spawning   
  {
    # Unable to fork
    print "ERROR: Sub Process '" . $rprocess->{"name"} . "' Launch failed: $!\n";
  }  #if(defined $iprcpid)

  return $irs;
}

sub Read
{
  my $rprocess = shift;

  my $ppsel = $rprocess->{"select"};
  my $pplog = $rprocess->{"log_pipe"};
  my $pperr = $rprocess->{"error_pipe"};
  my @arrppselrdy = undef;
  my $pprdy = undef;
  my $srdln = "";
  my $irdcnt = 0;

  unless(defined $ppsel)
  {
    $ppsel = IO::Select->new();

    $ppsel->add($pplog);
    $ppsel->add($pperr);

    #Store the Selector Object
    $rprocess->{"select"} = $ppsel;
  }
  else  #The Selector was already created
  {
    $ppsel = $rprocess->{"select"};
  }  #unless(defined $ppsel)    

  while(@arrppselrdy = $ppsel->can_read(1))
  {
    foreach $pprdy (@arrppselrdy)
    {
      $irdcnt = sysread($pprdy, $srdln, 8192);

      if(defined $irdcnt)
      {
        if($irdcnt > 0)
        {
          if(fileno($pprdy) == fileno($pplog))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading report ...\n";

            $rprocess->{"log"} .= $srdln;
          }
          elsif(fileno($pprdy) == fileno($pperr))
          {
            $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): reading error ...\n";

            $rprocess->{"error"} .= $srdln;
          }   #if(fileno($pprdy) == fileno($pplog))
        }
        else    #End of Transmission
        {
          $rprocess->{"log"} .= "pipe (" . fileno($pprdy) . "): transmission done.\n";

          #Remove the Pipe File Handle
          $ppsel->remove($pprdy);  

        } #if($irdcnt > 0)
      }
      else  #Reading from the Pipe failed
      {
        #Remove the Pipe File Handle
        $ppsel->remove($pprdy);

        if($!)
        {
          $rprocess->{"error"} .= "ERROR: Sub Process " . $rprocess->{"name"} 
            . ": Read failed with [" . ($! * 1) . "]!\n"
            . "Message: '$!'\n";
        }  #if($!)
      }  #if(defined $irdcnt)
    }  #foreach $pprdy (@arrppselrdy)
  } #while(@arrppselrdy = $ppsel->can_read(1))

  $rprocess->{"log"} .= "try read done. '" . $ppsel->count . "' pipes left.\n";
}

sub Check
{
  my $rprocess = shift;
  my $irng = 0;

  if($rprocess->{"pid"} > 0
    && $rprocess->{"status"} < 0)
  {
    my $ifinishpid = waitpid($rprocess->{"pid"}, WNOHANG);

    if($ifinishpid > -1)
    {
      if($ifinishpid == 0)
      {
        $irng = 1;
      }
      else
      {
        $rprocess->{"status"} = $?;
      }

      #Read the Message Pipes
      Read($rprocess);

      if($ifinishpid > 0)
      {
        close $rprocess->{"log_pipe"};
        close $rprocess->{"error_pipe"};
      }
    }
    else  #Sub Process does not exist
    {
      $rprocess->{"error"} .= "ERROR: Sub Process does not exist!\n";
    }  #if($ifinishpid > 0)
  }  #if($rprocess->{"pid"} > 0 && $rprocess->{"status"} < 0)

  return $irng;
}



my %hshprocesses = ("hello1" => {"name" => "hello1", "pid" => -1, "file" => "./hello.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello2" => {"name" => "hello2", "pid" => -1, "file" => "sleep 3 ; ./hello2.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello3" => {"name" => "hello3", "pid" => -1, "file" => "./hello3.pl ; sleep 2"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1}
  , "hello4" => {"name" => "hello4", "pid" => -1, "file" => "./hello4.pl"
    , "log_pipe" => undef, "error_pipe" => undef, "select" => undef
    , "log" => "", "error" => "", "status" => -1});


print "prcs launching ...\n";

foreach (keys %hshprocesses)
{
  unless(Launch($hshprocesses{$_}))
  {
    print "prc '" . $hshprocesses{$_}{"name"} . "': Launch failed!\n";
  }
}  #foreach (keys %hshprocesses)

print "prcs launched.\n";

my $irunningcount = 0;

do
{
  $irunningcount = 0;

  foreach (keys %hshprocesses)
  {
    $irunningcount++ if(Check($hshprocesses{$_}));
  }  #foreach (keys %hshprocesses)

  print "prc rng cnt: '$irunningcount'\n";
}
while($irunningcount > 0);

foreach (keys %hshprocesses)
{
  print "process (" . $hshprocesses{$_}{"pid"} . ") '" . $hshprocesses{$_}{"name"} . "':\n";
  print "status [" . $hshprocesses{$_}{"status"} . "]\n";
  print "log:\n"
    . "'" . $hshprocesses{$_}{"log"} . "'\n";
  print "error:\n"
    . "'" . $hshprocesses{$_}{"error"} . "'\n";
}  #foreach (keys %hshprocesses)

所以阅读 logreadererrorreader 我想我能够捕获整个输出。

预期的报告是这样的:

$ ./master.pl
prcs launching ...
prcs launched.
prc rng cnt: '4'
prc rng cnt: '3'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '1'
prc rng cnt: '0'
process (2742) 'hello4':
status [0]
log:
'pipe (3): reading report ...
cmd: './hello4.pl'
cmd rng ...
pipe (5): reading error ...
try read done. '2' pipes left.
pipe (5): reading error ...
pipe (3): reading report ...
hello4 script!!
hello waiting 3 sec ...
hello4 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
pipe (3): reading report ...
closing transmission ...
pipe (5): transmission done.
pipe (3): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO4 ERROR 0 !
HELLO4 ERROR 1 !
'
process (2743) 'hello1':
status [0]
log:
'pipe (4): reading report ...
cmd: 'sleep 3 ; ./hello.pl'
cmd rng ...
try read done. '2' pipes left.
pipe (7): reading error ...
try read done. '2' pipes left.
try read done. '2' pipes left.
pipe (7): reading error ...
pipe (4): reading report ...
hello script!!
hello waiting 3 sec ...
hello list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (7): transmission done.
pipe (4): transmission done.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
try read done. '0' pipes left.
'
error:
'HELLO ERROR 0 !
HELLO ERROR 1 !
'
process (2745) 'hello3':
status [0]
log:
'pipe (6): reading report ...
cmd: './hello3.pl ; sleep 2'
cmd rng ...
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (9): reading error ...
try read done. '2' pipes left.
pipe (6): reading report ...
hello3 script!!
hello waiting 3 sec ...
hello3 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (9): transmission done.
pipe (6): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO3 ERROR 0 !
HELLO3 ERROR 1 !
'
process (2746) 'hello2':
status [0]
log:
'pipe (8): reading report ...
cmd: './hello2.pl'
cmd rng ...
pipe (11): reading error ...
try read done. '2' pipes left.
pipe (8): reading report ...
hello2 script!!
hello waiting 3 sec ...
hello2 list:
cmd: 'ls -lah'
total 24K
drwxr-xr-x  2 bodo bodo   90 may  5 08:23 .
drwxr-xr-x 11 bodo bodo  128 may  4 18:49 ..
-rwxr-xr-x  1 bodo bodo  307 may  4 22:33 hello2.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:37 hello3.pl
-rwxr-xr-x  1 bodo bodo  307 may  4 22:34 hello4.pl
-rwxr-xr-x  1 bodo bodo  303 may  4 18:50 hello.pl
-rwxr-xr-x  1 bodo bodo 5,7K may  5 08:23 master.pl
res: '0'
hello waiting 2 sec ...
closing transmission ...
pipe (11): reading error ...
pipe (8): transmission done.
pipe (11): transmission done.
try read done. '0' pipes left.
'
error:
'HELLO2 ERROR 0 !
HELLO2 ERROR 1 !
'

但这只有在只有 1 个 Worker Script 时才有效。
如果 Master Script 尝试读取多个 Worker Scripts,则此传输管道会在执行 Worker Script hello.pl 之前由 Child 关闭。 最后读到的是:

        cmd: './hello.pl'
        cmd rng ...

主脚本使用 IO::Select 模块通过选择器对象从 logreadererrorreader 中读取。
这样我在进度日志中找到:

        cmd: './hello.pl'
        cmd rng ...
        pipe (8): transmission done.

但是 hello.pl 中的所有内容都丢失了。

有时我仍然会在错误报告

中找到hello.plSTDERR的第一行
HELLO ERROR 0 !

为什么管道有时会断有时不会?
我怎样才能防止这种情况发生?

现在使用此测试脚本我无法重现 Broken Pipe 的错误。
但即使在真实世界的应用程序中,它有时会发生,有时不会。
我在使用 perl v5.10 和 4 个进程 3 运行 和 1 个中断的系统上开发了这个应用程序。特别是如果它执行 sleep 调用。

什么情况下管道会破裂?


编辑 2018-05-05: 我用 perl v5.24.1

构建了主脚本 master.pl 和 运行

我认为您必须使用 wait 而不是 waitpidwait 等待任何 child 而 waitpid 只等待一个特定的 child.

由于没有人能在 1 个月内给我任何有用的答案甚至提示,我将 post 我所发现的关于 破管现象 的顺序帮助可能面临类似问题的其他人。


我终于找到了 管道破裂 的现象。
我添加了足够长的休眠时间,让我可以使用 strace 命令跟踪系统 Activity,它实际上记录了 Broken Pipe 错误:

$ strace -p 11729
Process 11729 attached
restart_syscall(<... resuming interrupted call ...>) = 0
write(1, "hello script!!\nhello waiting 20 "..., 67) = 67
pipe([3, 4])                            = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe670d5f9d0) = 13238
close(6)                                = 0
close(4)                                = 0
read(5, "", 4)                          = 0
close(5)                                = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff334b57b0) = -1 EINVAL (Invalid argument)
lseek(3, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
read(3, "total 228K\ndrwxr-xr-x.  4 usr15 "..., 4096) = 1868
read(3, "", 4096)                       = 0
close(3)                                = 0
rt_sigaction(SIGHUP, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fe66f8e27e0}, {SIG_DFL, [], 0}, 8) = 0
wait4(13238, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 13238
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=13238, si_status=0, si_utime=0, si_stime=0} ---
rt_sigaction(SIGHUP, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fe66f8e27e0}, NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({30, 0}, 0x7fff334b5b90)      = 0
write(2, "HELLO ERROR 1 !\n", 16)       = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=11729, si_uid=501} ---
+++ killed by SIGPIPE +++

它记录了 hello.pl 正确完成并结束写入错误消息 write(2, "HELLO ERROR 1 !\n", 16) = -1
但是它发现管道坏了,导致错误 EPIPE (Broken pipe) 导致它崩溃 +++ killed by SIGPIPE +++

所以问题的答案
什么情况下管道会破裂?

  • 如果分叉的子进程已经终止

但是为什么在管道损坏之前我无法读取输出?

剪切输出的现象

    cmd: './hello.pl'
    cmd rng ...
    pipe (8): transmission done.
  • 分叉子进程中 print() 组合的结果 和父进程中的 sysread()
    输出是 实际上在Child端缓冲,最后写入Parent 边.

为什么要在接收到输出之前终止子进程?

  • 这来自实际中与Objecthood的结合 实现 Object::DESTROY 的真实世界应用程序 方法。
    该方法包括一个 kill() 静态清理逻辑 运行 理论上不应触发的子进程。
    但在真实世界的应用程序中,过程不会立即 阅读并收获是什么导致他们被 Object::DESTROY 逻辑。
    但是我在执行日志中找不到任何东西,因为它的对象已经被 Perl 的垃圾收集器.
  • 销毁了

因此,一些进程被杀死,而另一些则没有。有时他们会被杀死,但有时不会。