Hey Erik, thought I'd share this with you - I ran into a problem of slave
processes hanging in limbo on Beowulf after the master process died or was
killed, and several recent ps -aux listings show that a few other users are
evidently, and probably unwittingly, experiencing the same problem.
It doesn't seem to cost much in the way of cpu cycles, but all the same
it's getting pretty messy - here's something that can be put in a script
to ensure everything stays clean:
kill `ps -aux | grep $USER | grep -v grep | grep -v ps | awk '{print $2}'`
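One caveat with the one-liner: grep $USER matches the user name anywhere in
the line, grep -v ps drops any process whose line happens to contain "ps",
and the shell running the command is itself in the list. Here is a minimal
sketch of a stricter filter, assuming ps aux-style columns (owner in column 1,
PID in column 2). The function name list_user_pids is made up for
illustration and is not part of any existing tool:

```shell
#!/bin/sh
# Sketch only: list_user_pids is a hypothetical helper, not an existing tool.
# Reads "ps aux"-style lines on stdin and prints the PIDs owned by the given
# user, leaving out any PID named in the second argument.
list_user_pids() {
    user=$1
    skip=$2   # space-separated PIDs to leave alone, e.g. "$$"
    awk -v user="$user" -v skip="$skip" '
        BEGIN {
            # Turn the skip list into a lookup table keyed by PID.
            n = split(skip, s, " ")
            for (i = 1; i <= n; i++) keep[s[i]] = 1
        }
        # Match on the owner column exactly; the header line never matches.
        $1 == user && !($2 in keep) { print $2 }
    '
}
```

Something like kill $(ps aux | list_user_pids "$USER" "$$") could then stand
in for the one-liner. The short-lived ps and awk helpers may still show up in
the list, so kill may print a "no such process" message or two, which should
be harmless.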
Unfortunately the kill line has to be executed on every node that defunct
processes are running on, but it does seem to work. The following generates
a script, killall, that runs it across the nodes:
grep -v "#" < machines.my | awk '{print "ssh", $1, "./killbatch"}' > killall
chmod 700 killall
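where machines.my can be replaced with the user's own or the default machine
file. killbatch (the script containing the kill line above) has to be
executable and sit in the user's home directory, since ssh runs it as
./killbatch from there.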
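A slightly more forgiving version of the generator, as a sketch: it assumes
the host name is the first field of each machine-file line, skips blank
lines, and only treats a line as a comment when it starts with "#" (the
grep -v "#" above drops any line containing "#" anywhere). The function name
gen_killall is made up for illustration:

```shell
#!/bin/sh
# Sketch only: gen_killall is a hypothetical helper name.
# Prints (does not run) one "ssh <host> ./killbatch" line per host found in
# the machine file given as the first argument.
gen_killall() {
    machinefile=$1
    # NF skips blank lines; $1 !~ /^#/ skips lines whose first field starts
    # with "#". The host name is assumed to be the first field.
    awk 'NF && $1 !~ /^#/ { print "ssh", $1, "./killbatch" }' "$machinefile"
}
```

Usage would be along the lines of gen_killall machines.my > killall followed
by chmod 700 killall, same as before.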
Thanks!
...matt