* lib/remote.exp (close_wait_program): New procedure.
(local_exec, standard_close): Use it.
The code that tries to make sure that a process dies in
lib/remote.exp:remote_close can kill the wrong process due to PID-reuse
races. The GDB buildbots show frequent mysterious FAILs that turn out
to be caused by this. The problem is this bit here:
    exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid)
    && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
    ...
    catch "wait -i $shell_id"
When this procedure is called to close the GDB process, GDB exits
promptly, but that whole cascade of kills carries on in the background,
thus potentially killing the unfortunate process that manages to be
spawned by one of the next tests and happens to reuse that $pid. [1]
So to fix this, kill that no-longer-needed pipeline as soon as "wait"
returns. There are two places in DejaGnu with a similar pipeline,
so move that to a shared procedure.
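As a rough illustration of the idea only (not the actual
close_wait_program code, which is Tcl/Expect), the pattern can be
sketched in plain sh. Everything here is a hypothetical stand-in: the
"sleep 1" plays the program being closed, the variable names are
invented, the delays are shortened, and unlike the real pipeline the
watchdog sleeps before its first signal:

```shell
#!/bin/sh
# Hypothetical stand-in for the spawned program; it exits on its own
# after one second, much like GDB exiting promptly when asked to.
sleep 1 &
pid=$!

# Background watchdog that escalates signals against $pid.
# Simplified: the delays are shortened and come before each signal.
(
    sleep 3 && kill -2 "$pid"
    sleep 3 && kill "$pid"
    sleep 3 && kill -9 "$pid"
) >/dev/null 2>&1 &
watchdog=$!

# Reap the program first...
wait "$pid"
status=$?

# ...then cancel the watchdog right away, so that none of its pending
# kills can hit an unrelated process that later reuses $pid.
kill "$watchdog" 2>/dev/null || true
wait "$watchdog" 2>/dev/null || true

echo "program exit status: $status; watchdog cancelled"
```

The point is the ordering: the watchdog is only useful while the
program is still alive, so it is torn down as soon as "wait" returns
instead of being left to run its remaining kills.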
[1] GDB's testsuite spawns thousands of GDB instances and even more
inferior processes, and of those inferiors, some spawn thousands of
short-lived threads in quick succession. Since threads and processes
share the same ID number space on Linux, all of that causes frequent
PID recycling. In addition, GDB's testsuite has a parallel test mode that
runs several tests/DejaGnu instances at the same time, further widening
the race window.
Signed-off-by: Ben Elliston <bje@gnu.org>