a416fd0dcd
Bug fix from Beta testing. OBS-URL: https://build.opensuse.org/request/show/495131 OBS-URL: https://build.opensuse.org/package/show/Base:System/s390-tools?expand=0&rev=8
145 lines
6.5 KiB
Diff
145 lines
6.5 KiB
Diff
Subject: [PATCH] [BZ 149058] ziomon: no blktrace kill which can corrupt kernel blktrace state
|
|
From: Steffen Maier <maier@linux.vnet.ibm.com>
|
|
|
|
Description: ziomon: no blktrace kill which can corrupt kernel blktrace state
|
|
Symptom: Ziomon terminates with the following error messages:
|
|
|
|
$ ziomon -d <duration> -o <logfile> <device>...
|
|
...
|
|
ziomon: Failed to stop trace on /dev/sdX
|
|
...
|
|
ziomon: Failed to stop trace on /dev/sdY
|
|
ziomon: blktrace has leftovers, manual cleanup required!
|
|
|
|
Subsequent ziomon runs fail likely until the next reboot:
|
|
|
|
$ ziomon -d <duration> -o <logfile> <device>...
|
|
...
|
|
Collecting data...BLKTRACESETUP(2) /dev/sdX failed: \
|
|
2/No such file or directory
|
|
BLKTRACESETUP(2) /dev/sdY failed: 2/No such file or directory
|
|
...
|
|
Thread Z failed open /sys/kernel/debug/block/(null)/traceZ: \
|
|
2/No such file or directory
|
|
FAILED to start thread on CPU 0: 1/Operation not permitted
|
|
...
|
|
Error: blktrace has errors, aborting
|
|
...
|
|
ziomon: Failed to stop trace on /dev/sdX
|
|
...
|
|
ziomon: Failed to stop trace on /dev/sdY
|
|
ziomon: blktrace has leftovers, manual cleanup required!
|
|
Problem: While we call blktrace with stopwatch option '-w $WRP_DURATION'
|
|
to have itself terminate automatically after the (mandatory)
|
|
sampling duration, there might be a race
|
|
with our own timed blktrace checking loop between "Collecting
|
|
data..." and "done"
|
|
or any of SIGHUP SIGTERM SIGINT SIGQUIT from the user might
|
|
prematurely trigger emergency_shutdown(),
|
|
so blktrace might still be running or not have completed its
|
|
cleanup when we reach the blktrace leftover checks in shutdown().
|
|
|
|
In that case, ziomon uses the meanwhile hidden kill option of
|
|
blktrace which can corrupt kernel state if blktrace user space
|
|
has not yet completed its regular cleanup.
|
|
Solution: The correct way to have the blktrace process terminate and
|
|
cleanup properly including any dependent kernel state, before
|
|
blktrace's stopwatch makes it terminate and cleanup
|
|
automatically, is to only send SIGINT to it.
|
|
|
|
http://git.kernel.org/cgit/linux/kernel/git/axboe/blktrace.git
|
|
86596c7579c6 ("blktrace: remove -k from manpage synopsis")
|
|
fb7f86674a51 ("blktrace: disable kill option - take 2")
|
|
Additional explanation for blktrace -k:
|
|
http://marc.info/?l=linux-btrace&m=131245973528087&w=2i
|
|
linux-btrace ("Re: Recent changes")
|
|
The following kernel commits do not change above termination
|
|
statement:
|
|
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
|
|
39cbb602b543 ("Remove double removal of blktrace directory")
|
|
fd51d251e4cd ("blktrace: remove debugfs entries on bad path")
|
|
f48fc4d32e24
|
|
("block: get rid of the manual directory counting in blktrace")
|
|
|
|
As of this writing, the (multithreaded) blktrace also wires
|
|
SIGHUP SIGTERM SIGALRM with the same signal handler
|
|
handle_sigint(), but to adhere to its documentation move from
|
|
SIGTERM to SIGINT anyway.
|
|
|
|
We must not use the meanwhile hidden kill option of blktrace
|
|
because this corrupts kernel state if blktrace user space has not
|
|
yet completed its regular cleanup.
|
|
|
|
In order to avoid zombies we wait for all children. After all, we
|
|
want everything to be completed before ziomon terminates and the
|
|
user could safely run another instance of ziomon. Also, we get
|
|
the possibility for users to detect and report issues with
|
|
children that don't finish timely.
|
|
|
|
Since we do not know how to manually cleanup, drop the check for
|
|
blktrace leftovers in shutdown().
|
|
Reproduction: Run ziomon with many devices on the command line.
|
|
Possibly (other) load on the system increases probability.
|
|
Upstream-ID: -
|
|
Problem-ID: 149058
|
|
|
|
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
|
|
---
|
|
ziomon/ziomon | 17 ++++++-----------
|
|
1 file changed, 6 insertions(+), 11 deletions(-)
|
|
|
|
--- a/ziomon/ziomon
|
|
+++ b/ziomon/ziomon
|
|
@@ -5,7 +5,7 @@
|
|
#
|
|
# Wrapper script to start all processes
|
|
#
|
|
-# Copyright IBM Corp. 2008
|
|
+# Copyright IBM Corp. 2008, 2016
|
|
#
|
|
# Author(s): Stefan Raspl <raspl@linux.vnet.ibm.com>
|
|
#
|
|
@@ -84,7 +84,7 @@ function check_for_int() {
|
|
|
|
function print_version() {
|
|
echo "$WRP_TOOLNAME: I/O data collection utility, version %S390_TOOLS_VERSION%";
|
|
- echo "Copyright IBM Corp. 2008";
|
|
+ echo "Copyright IBM Corp. 2008, 2016";
|
|
}
|
|
|
|
|
|
@@ -356,8 +356,6 @@ function start_trace() {
|
|
|
|
|
|
function shutdown() {
|
|
- local failed=0;
|
|
-
|
|
echo "Shutting down";
|
|
# one more second to write final result
|
|
sleep 2;
|
|
@@ -365,7 +363,7 @@ function shutdown() {
|
|
[ -d /proc/$WRP_ZIOMON_UTIL_PID ] && echo "Shutting down utilization process" && kill -s SIGTERM $WRP_ZIOMON_UTIL_PID;
|
|
fi
|
|
if [ "$WRP_BLKTRACE_PID" != "" ]; then
|
|
- [ -d /proc/$WRP_BLKTRACE_PID ] && echo "Shutting down blktrace process" && kill -s SIGTERM $WRP_BLKTRACE_PID;
|
|
+ [ -d /proc/$WRP_BLKTRACE_PID ] && echo "Shutting down blktrace process" && kill -s SIGINT $WRP_BLKTRACE_PID;
|
|
fi
|
|
if [ "$WRP_BLKIOMON_PID" != "" ]; then
|
|
[ -d /proc/$WRP_BLKIOMON_PID ] && echo "Shutting down blkiomon process" && kill -s SIGTERM $WRP_BLKIOMON_PID;
|
|
@@ -381,12 +379,9 @@ function shutdown() {
|
|
kill -s SIGTERM $WRP_ZIOMON_MGR_PID;
|
|
fi
|
|
fi
|
|
- # kill running traces on devices
|
|
- for dev in ${WRP_DEVICES[@]}; do
|
|
- [ -d /sys/kernel/debug/block/${dev##*/} ] && debug "Trace on $dev exists, attempt to kill..." && blktrace -k $dev >/dev/null 2>&1;
|
|
- [ -d /sys/kernel/debug/block/${dev##*/} ] && echo "$WRP_TOOLNAME: Failed to stop trace on $dev" && (( failed++ ));
|
|
- done
|
|
- [ $failed -ne 0 ] && echo "$WRP_TOOLNAME: blktrace has leftovers, manual cleanup required!";
|
|
+ # synchronize with all children to avoid zombies
|
|
+ # and to prepare for a clean subsequent re-run of ziomon
|
|
+ wait
|
|
|
|
if [ -e $WRP_MSG_Q_PATH ]; then
|
|
if [ $WRP_DEBUG -gt 1 ]; then
|