Subject: [PATCH] [BZ 149058] ziomon: no blktrace kill which can corrupt kernel blktrace state From: Steffen Maier Description: ziomon: no blktrace kill which can corrupt kernel blktrace state Symptom: Ziomon terminates with the following error messages: $ ziomon -d -o ... ... ziomon: Failed to stop trace on /dev/sdX ... ziomon: Failed to stop trace on /dev/sdY ziomon: blktrace has leftovers, manual cleanup required! Subsequent ziomon runs fail likely until the next reboot: $ ziomon -d -o ... ... Collecting data...BLKTRACESETUP(2) /dev/sdX failed: \ 2/No such file or directory BLKTRACESETUP(2) /dev/sdY failed: 2/No such file or directory ... Thread Z failed open /sys/kernel/debug/block/(null)/traceZ: \ 2/No such file or directory FAILED to start thread on CPU 0: 1/Operation not permitted ... Error: blktrace has errors, aborting ... ziomon: Failed to stop trace on /dev/sdX ... ziomon: Failed to stop trace on /dev/sdY ziomon: blktrace has leftovers, manual cleanup required! Problem: While we call blktrace with stopwatch option '-w $WRP_DURATION' to have itself terminate automatically after the (mandatory) sampling duration, there might be a race with our own timed blktrace checking loop between "Collecting data..." and "done" or any of SIGHUP SIGTERM SIGINT SIGQUIT from the user might prematurely trigger emergency_shutdown(), so blktrace might still be running or not have completed its cleanup when we reach the blktrace leftover checks in shutdown(). In that case, ziomon uses the meanwhile hidden kill option of blktrace which can corrupt kernel state if blktrace user space has not yet completed its regular cleanup. Solution: The correct way to have the blktrace process terminate and cleanup properly including any dependent kernel state, before blktrace's stopwatch makes it terminate and cleanup automatically, is to only send SIGINT to it. http://git.kernel.org/cgit/linux/kernel/git/axboe/blktrace.git 86596c7579c6 ("blktrace: remove -k from manpage synopsis") fb7f86674a51 ("blktrace: disable kill option - take 2") Additional explanation for blktrace -k: http://marc.info/?l=linux-btrace&m=131245973528087&w=2i linux-btrace ("Re: Recent changes") The following kernel commits do not change above termination statement: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 39cbb602b543 ("Remove double removal of blktrace directory") fd51d251e4cd ("blktrace: remove debugfs entries on bad path") f48fc4d32e24 ("block: get rid of the manual directory counting in blktrace") As of this writing, the (multithreaded) blktrace also wires SIGHUP SIGTERM SIGALRM with the same signal handler handle_sigint(), but to adhere to its documentation move from SIGTERM to SIGINT anyway. We must not use the meanwhile hidden kill option of blktrace because this corrupts kernel state if blktrace user space has not yet completed its regular cleanup. In order to avoid zombies we wait for all children. After all, we want everything to be completed before ziomon terminates and the user could safely run another instance of ziomon. Also, we get the possibility for users to detect and report issues with children that don't finish timely. Since we do not know how to manually cleanup, drop the check for blktrace leftovers in shutdown(). Reproduction: Run ziomon with many devices on the command line. Possibly (other) load on the system increases probability. Upstream-ID: - Problem-ID: 149058 Signed-off-by: Steffen Maier --- ziomon/ziomon | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) --- a/ziomon/ziomon +++ b/ziomon/ziomon @@ -5,7 +5,7 @@ # # Wrapper script to start all processes # -# Copyright IBM Corp. 2008 +# Copyright IBM Corp. 2008, 2016 # # Author(s): Stefan Raspl # @@ -84,7 +84,7 @@ function check_for_int() { function print_version() { echo "$WRP_TOOLNAME: I/O data collection utility, version %S390_TOOLS_VERSION%"; - echo "Copyright IBM Corp. 2008"; + echo "Copyright IBM Corp. 2008, 2016"; } @@ -356,8 +356,6 @@ function start_trace() { function shutdown() { - local failed=0; - echo "Shutting down"; # one more second to write final result sleep 2; @@ -365,7 +363,7 @@ function shutdown() { [ -d /proc/$WRP_ZIOMON_UTIL_PID ] && echo "Shutting down utilization process" && kill -s SIGTERM $WRP_ZIOMON_UTIL_PID; fi if [ "$WRP_BLKTRACE_PID" != "" ]; then - [ -d /proc/$WRP_BLKTRACE_PID ] && echo "Shutting down blktrace process" && kill -s SIGTERM $WRP_BLKTRACE_PID; + [ -d /proc/$WRP_BLKTRACE_PID ] && echo "Shutting down blktrace process" && kill -s SIGINT $WRP_BLKTRACE_PID; fi if [ "$WRP_BLKIOMON_PID" != "" ]; then [ -d /proc/$WRP_BLKIOMON_PID ] && echo "Shutting down blkiomon process" && kill -s SIGTERM $WRP_BLKIOMON_PID; @@ -381,12 +379,9 @@ function shutdown() { kill -s SIGTERM $WRP_ZIOMON_MGR_PID; fi fi - # kill running traces on devices - for dev in ${WRP_DEVICES[@]}; do - [ -d /sys/kernel/debug/block/${dev##*/} ] && debug "Trace on $dev exists, attempt to kill..." && blktrace -k $dev >/dev/null 2>&1; - [ -d /sys/kernel/debug/block/${dev##*/} ] && echo "$WRP_TOOLNAME: Failed to stop trace on $dev" && (( failed++ )); - done - [ $failed -ne 0 ] && echo "$WRP_TOOLNAME: blktrace has leftovers, manual cleanup required!"; + # synchronize with all children to avoid zombies + # and to prepare for a clean subsequent re-run of ziomon + wait if [ -e $WRP_MSG_Q_PATH ]; then if [ $WRP_DEBUG -gt 1 ]; then