Tcl [exec] process leaves zombie if the process forks and exits

Question

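I have a case where a Tcl script runs a process that calls fork(), leaves the forked process running, and then the main process exits. You can try it simply by running any program that forks into the background, for example gvim, provided it is configured to run in the background after it starts: set res [exec gvim].

The main process should in theory exit immediately while the child keeps running in the background, but somehow the main process hangs: it does not exit and stays in the zombie state (reported as <defunct> in ps output).

In my case the process I start prints something. I want that output, I want the process to exit, and I want to be able to say that it is done. The problem is that if I spawn the process with open "|gvim" r, I cannot recognize the moment the process finishes either. The fd returned by [open] never reports [eof], even after the program has become a zombie. When I try [read], just to read everything the process might print, it hangs completely.

More interestingly, occasionally both the main process and the forked process print something, and when I read it with [gets] I get both. If I close the descriptor too early, [close] throws an exception because of a broken pipe. Possibly this is why [read] never finishes.

I need some method to recognize the moment when the main process exits, while this process could have spawned another child process, but this child process may be completely detached and I'm not interested what it does. I want whatever the main process prints before it exits, and the script should then continue its work while the background process keeps running; what happens to that process does not interest me.

I have control over the source of the process I am starting. Yes, I did signal(SIGCLD, SIG_IGN) before fork(); it did not help.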

Answer 1

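Your daemon can also call setsid() and setpgrp() to start a new session and detach from the process group. But these do not help with your problem either.

You will need to do some process management: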

#!/usr/bin/tclsh

proc waitpid {pid} {
  set rc [catch {exec -- kill -0 $pid}]
  while { $rc == 0 } {
    set ::waitflag 0
    after 100 [list set ::waitflag 1]
    vwait ::waitflag
    set rc [catch {exec -- kill -0 $pid}]
  }
}

set pid [exec ./t1 &]
waitpid $pid
puts "exit tcl"
exit

Edit: another, not very reasonable, answer

If the forked child process closes the open channels, Tcl will not wait on it.

Test program:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

int
main (int argc, char *argv [])
{
  int   pid;
  FILE  *o;

  signal (SIGCHLD, SIG_IGN);
  pid = fork ();
  if (pid == 0) {
    /* should also call setsid() and setpgrp() to daemonize */
    printf ("child\n");
    fclose (stdout);
    fclose (stderr);
    sleep (10);
    o = fopen ("/dev/tty", "w");
    fprintf (o, "child exit\n");
    fclose (o);
  } else {
    printf ("parent\n");
    sleep (2);
  }
  printf ("t1 exit %d\n", pid);
  return 0;
}

Test Tcl program:

#!/usr/bin/tclsh

puts [exec ./t1]
puts "exit tcl"
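
A note on why the open channels matter: a pipe only reports end-of-file once every process holding a copy of its write end has closed it, so a forked child that keeps the inherited stdout keeps the reader waiting. The C sketch below illustrates that rule under stated assumptions; it is not how Tcl is implemented. eof_seen is a hypothetical helper that plays the reader's role, and its child stands in for the background process.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the reader sees EOF on the pipe within
 * timeout_ms, 0 if it does not, -1 on error. child_closes selects whether
 * the lingering child closes its inherited copy of the write end. */
int eof_seen(int child_closes, int timeout_ms)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Stand-in for the forked background process. */
        close(fds[0]);
        if (child_closes)
            close(fds[1]);
        pause();                 /* linger until the parent kills us */
        _exit(0);
    }

    close(fds[1]);               /* the reader keeps only the read end */

    struct pollfd p = { .fd = fds[0], .events = POLLIN };
    char buf[16];
    int seen = 0;
    /* poll() wakes up on data or hang-up; read() returning 0 means EOF. */
    if (poll(&p, 1, timeout_ms) > 0 && read(fds[0], buf, sizeof buf) == 0)
        seen = 1;

    kill(pid, SIGKILL);          /* clean up the stand-in child */
    waitpid(pid, NULL, 0);
    close(fds[0]);
    return seen;
}
```

With child_closes set to 1 the reader gets EOF even though the child is still alive; with 0 the read side stays open until the timeout, which matches the hanging [read] described in the question.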

Answer 2

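Tcl cleans up zombies from background process invocations the next time exec is called. Since a zombie really does not use many resources (just an entry in the process table; there is practically nothing else), there is no particular hurry to clean it up.

The problem you were having with the pipeline is that you had not put it in non-blocking mode. To detect the exit of the pipeline, you are best off using fileevent, which fires either when there is a byte (or more) to read from the pipe or when the other end of the pipe has been closed. To distinguish these cases you have to actually try to read, and that can block if you over-read and are not in non-blocking mode. However, Tcl makes non-blocking mode easy to use.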

set pipeline [open |… "r"]
fileevent $pipeline readable [list handlePipeReadable $pipeline]
fconfigure $pipeline -blocking false

proc handlePipeReadable {pipe} {
    if {[gets $pipe line] >= 0} {
        # Managed to actually read a line; stored in $line now
    } elseif {[eof $pipe]} {
        # Pipeline was closed; get exit code, etc.
        if {[catch {close $pipe} msg opt]} {
            set exitinfo [dict get $opt -errorcode]
        } else {
            # Successful termination
            set exitinfo ""
        }
        # Stop the waiting in [vwait], below
        set ::donepipe $pipe
    } else {
        # Partial read; things will be properly buffered up for now...
    }
}

vwait ::donepipe

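Note that using gvim in a pipeline is... rather more complicated than usual, since it is an application that the user interacts with.

You may find it easier to run a simple exec in a separate thread, provided your version of Tcl is thread-enabled and the Thread package is installed. (It ought to be if you are using 8.6, but I do not know whether that is the case.)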

package require Thread

set runner [thread::create {
    proc run {caller targetVariable args} {
        set res [catch {
            exec {*}$args
        } msg opt]
        set callback [list set $targetVariable [list $res $msg $opt]]
        thread::send -async $caller $callback
    }
    thread::wait
}]

proc runInBackground {completionVariable args} {
    global runner
    thread::send -async $runner [list run [thread::id] $completionVariable {*}$args]
}

runInBackground resultsVar gvim …
# You can do other things at this point

# Wait until the variable is set (by callback); alternatively, use a variable trace    
vwait resultsVar

# Process the results to extract the sense
lassign $resultsVar res msg opt
puts "code: $res"
puts "output: $msg"
puts "status dictionary: $opt"

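That said, for an editor like gvim I would actually expect it to run in the foreground (which needs nothing like as much complexity), because only one of them can really interact with a particular terminal at a time.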

Answer 3

At first you say:

I need some method to recognize the moment when the main process exits, while this process could have spawned another child process, but this child process may be completely detached and I'm not interested what it does.

Later you say:

If the forked child process closes the open channels, Tcl will not wait on it.

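These are two contradictory statements. On the one hand you are only interested in the parent process, and on the other hand you care whether the child has finished, even though you also state that you are not interested in child processes that have detached. Last I heard, forking and closing the child's copies of the parent's stdin, stdout and stderr is detaching (i.e. daemonizing the child process). I wrote this quick program to run the simple C program included above, and as expected Tcl knows nothing about the child process. I called the compiled version of the program /tmp/compile/chuck. I did not have gvim, so I used emacs, but since emacs does not generate text I wrapped the exec in its own Tcl script and executed that. In both cases the parent process is waited for and eof is detected. When the parent exits, Runner::getData runs and evaluates the cleanup.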

#!/bin/sh
# the next line restarts using tclsh \
exec /opt/usr8.6.3/bin/tclsh8.6  "$0" ${1+"$@"}

namespace eval  Runner {
    variable close
    variable watch
    variable lastpid ""
    array set close {}
    array set watch {}


    proc run { program { message "" }  } {
        variable watch
        variable close
        variable lastpid
        if { $message ne "" } {
            set fname "/tmp/[lindex $program 0 ]-[pid].tcl" 
            set out [ open $fname "w" ]
            puts $out "#![info nameofexecutable]"
            puts $out " catch { exec $program } err "
            puts $out "puts \"\$err\n$message\""
            close $out
            file attributes $fname -permissions 00777
            set fd [ open "|$fname " "r" ]
            set close([pid $fd]) "file delete -force $fname "
        } else {
            set fd [ open "|$program" "r" ]
            set close([pid $fd]) "puts \"cleanup\""
        } 
        fconfigure $fd -blocking 0 -buffering none
        fileevent $fd  readable [ list Runner::getData [ pid $fd ] $fd ]
    }

    proc getData { pid chan } {
        variable watch
        variable close
        variable lastpid
        set data [read $chan]
        append watch($pid)  "$data"
        if {[eof $chan]} {
            catch { close $chan }
            eval $close($pid) ; # cleanup
            set lastpid $pid
        }
    }
}
Runner::run /tmp/compile/chuck ""
Runner::run emacs   " Emacs complete"

while { 1 } {
    vwait Runner::lastpid
    set p $Runner::lastpid
    catch { exec ps -ef | grep chuck } output
    puts "program with pid $p just  ended" 
    puts "$Runner::watch($p)"
    puts " processes that match chuck "
    puts "$output" 
}

Output: note that I quit emacs after the child reported that it was exiting.

 [user1@linuxrocks workspace]$ ./test.tcl
 cleanup
 program with pid 27667 just  ended
 child
 parent
 t1 exit 27670
  processes that match chuck 
 avahi      936     1  0  2016 ?        00:04:35 avahi-daemon: running [linuxrocks.local]
 admin    27992     1  0 19:37 pts/0    00:00:00 /tmp/compile/chuck
 admin    28006 27988  0 19:37 pts/0    00:00:00 grep chuck

 child exit
 program with pid 27669 just  ended

  Emacs complete
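
As a rough illustration of the "closing the child's copies of stdin, stdout and stderr" step discussed in this answer, here is a C sketch of one common way to do it: start a new session and point the three standard descriptors at /dev/null. Both function names are hypothetical, and demo_detach waits for the child only so that it can report whether the detach worked; a real launcher would return right after fork().

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Meant to be called in the forked child. Starts a new session and
 * replaces stdin, stdout and stderr with /dev/null so that no pipe
 * inherited from the launcher stays open in this process.
 * Returns 0 on success, -1 on failure. */
int detach_from_launcher(void)
{
    if (setsid() == -1)
        return -1;

    int devnull = open("/dev/null", O_RDWR);
    if (devnull == -1)
        return -1;

    for (int fd = 0; fd <= 2; fd++) {
        if (dup2(devnull, fd) == -1) {
            if (devnull > 2)
                close(devnull);
            return -1;
        }
    }
    if (devnull > 2)
        close(devnull);
    return 0;
}

/* Demonstration only: fork a child, detach it, and return the child's
 * exit code (0 if detaching succeeded), or -1 on error. */
int demo_detach(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(detach_from_launcher() == 0 ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Redirecting with dup2() rather than only calling fclose() keeps descriptors 0 to 2 valid for any later library code that writes to them.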

Answer 4

OK, after a long discussion I found the solution here:

https://groups.google.com/forum/#!topic/comp.lang.tcl/rtaTOC95NJ0

The script below demonstrates how to solve this:

#!/usr/bin/tclsh 

lassign [chan pipe] input output 
chan configure $input -blocking no -buffering line ;# just in case :) 

puts "Running $argv..." 
set ret [exec {*}$argv 2>@stderr >@$output] 
puts "Waiting for finished process..." 
set line [gets $input] 
puts "FIRST LINE: $line" 
puts "DONE. PROCESSES:" 
puts [exec ps -ef | grep [lindex $argv 0]] 
puts "EXITING."

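The only remaining problem is that there is still no way to know that the process has exited, but the next [exec] (in this particular case probably the [exec ps...] command did it) cleans up the zombie. There is no universal method for this; the best you can do on POSIX systems is [exec /bin/true]. In my case it was enough to get the one line that the parent process had to print, after which I could simply "let it go".

Still, it would be nice if [exec] could somehow return the PID of the first process, and if there were a standard [wait] command that could block until the process exits or check its running state (this command is currently available in TclX).

Note that [chan pipe] is only available in Tcl 8.6; alternatively you can use [pipe] from TclX.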

tcl

exec

zombie-process