diff --git a/README.SUSE b/README.SUSE
index 6860040..9496910 100644
--- a/README.SUSE
+++ b/README.SUSE
@@ -316,6 +316,180 @@ Machine-specific Notes
 
     Shell> reset
 
+Dump Triggering Methods
+=======================
+
+This section describes the various ways, other than a Kernel Panic,
+in which Kdump can be triggered. These methods enable the user
+to invoke Kdump in cases where the system is experiencing a hard
+hang.
+
+1) AltSysRq C
+
+On i386 and x86_64 machines, Kdump can be triggered with the
+combination of the 'Alt', 'SysRq' and 'C' keyboard keys. This method
+will work only on directly attached consoles, and not on remote
+consoles. In cases where the machine is in a hung state with
+interrupts disabled, AltSysRq C cannot be used. If any kind of
+terminal access is still possible, the same result may be achieved
+from the shell command line like so:
+
+    # echo c > /proc/sysrq-trigger
+
+On PowerPC boxes, AltSysRq C can also be used to initiate Kdump if a
+directly attached console is available. In addition, Kdump can also
+be triggered via Hardware Management Console (HMC) using 'Ctrl', 'O'
+and 'C' keyboard keys. In order to use the SysRq method for dump
+triggering, /proc/sys/kernel/sysrq needs to be enabled, which can be
+done as follows:
+
+    # echo 1 > /proc/sys/kernel/sysrq
+
+2) Kernel OOPs
+
+If we want to generate a dump every time the Kernel OOPses, we can
+achieve this by setting the 'Panic On OOPs' option as follows:
+
+    # echo 1 > /proc/sys/kernel/panic_on_oops
+
+
+3) NMI (non-maskable interrupt) button
+
+In cases where the system is in a hung state, and is not accepting
+keyboard interrupts, using the NMI button for triggering Kdump can be
+very useful. The NMI button is present on most of the newer x86 and
+x86_64 machines. Please refer to the User guides/manuals to locate the
+button, though on most occasions it is not very well documented. In
+most cases it is hidden behind a small hole on the front or back panel
+of the machine. You could use a toothpick or some other
+non-conducting probe to press the button.
+
+For example, on the IBM X series 366 machine, the NMI button is
+located behind a small hole on the bottom center of the rear panel.
+
+To enable this method of dump triggering using the NMI button, you
+will need to set the 'unknown_nmi_panic' option as follows:
+
+    # echo 1 > /proc/sys/kernel/unknown_nmi_panic
+
+When enabling unknown_nmi_panic please be careful not to enable the
+NMI watchdog feature, else the system will panic.
+
+4) NMI WATCHDOG
+
+NMI watchdog is a feature available in the x86 and x86_64 kernels
+which uses NMI to monitor whether a CPU has locked up. On i386
+machines, NMI watchdog can be enabled by passing nmi_watchdog=1 on the
+command line of the kernel. On x86_64 machines, this is enabled by
+default. To verify if your system has been configured with NMI
+watchdog, look at the NMI entry in /proc/interrupts. If the count is
+greater than zero then NMI watchdog has been configured, else it is
+not.
+
+Please refer to Documentation/nmi_watchdog.txt in the kernel source
+for a more detailed description.
+
+Once this feature has been enabled in the kernel, any lockup will
+result in an OOPs message being generated, followed by Kdump being
+triggered. This also requires 'Panic On OOPs' to be enabled as
+explained in method 2 above.
+
+Please refrain from simultaneously enabling 'nmi_watchdog' and setting
+/proc/sys/kernel/unknown_nmi_panic, as this would result in a Kernel
+Panic from legitimate NMIs generated by the nmi_watchdog.
+
+
+5) PowerPC specific methods:
+
+On IBM PowerPC machines, the following methods to issue a soft reset
+can be used to trigger Kdump. On SLES10 systems, XMON (debugger) is
+turned off by default. If the user wishes to enable XMON, this can be
+done by booting the kernel with the 'xmon=on' option. With XMON enabled,
+issuing a soft reset will drop the user to the XMON prompt, where
+typing 'X' will trigger Kdump. If XMON is not enabled then a soft
+reset will directly trigger Kdump.
+
+5.1) HMC
+
+Hardware Management Console (HMC) available on Power4 and Power5
+machines allows partitions to be reset remotely. This is especially
+useful in hang situations where the system is not accepting any
+keyboard inputs.
+
+Once you have HMC configured, the following steps will enable you to
+trigger Kdump via a soft reset:
+
+On Power4
+  Using GUI
+
+    * In the right pane, right click on the partition you wish to
+      dump.
+    * Select "Operating System->Reset".
+    * Select "Soft Reset".
+    * Select "Yes".
+
+  Using HMC Commandline
+
+    # reset_partition -m -p -t soft
+
+On Power5
+  Using GUI
+
+    * In the right pane, right click on the partition you wish to
+      dump.
+    * Select "Restart Partition".
+    * Select "Dump".
+    * Select "OK".
+
+  Using HMC Commandline
+
+    # chsysstate -m -n -o dumprestart -r lpar
+
+5.2) Blade Management Console for Blade Center
+
+To initiate a dump operation, go to Power/Restart option under "Blade
+Tasks" in the Blade Management Console. Select the corresponding
+blade for which you want to initiate the dump and then click "Restart
+blade with NMI". This will issue a soft reset.
+
+5.3) Control Panel function for a standalone Power5 machine
+
+A standalone machine is one which does not have any LPARs configured
+and also does not have an HMC available. In such cases the Control
+Panel, usually located on the front panel of the machine (please refer
+to the User guide of the specific model for details) can be used for
+dump triggering in case the system has a hard hang.
+
+The control panel provides many functions for System Management
+purposes; Function 22 is meant for invoking a Partition dump. This
+function is available only in the Manual operating mode.
+
+To check if the system is operating in manual mode,
+
+  * Select function 1 on the panel
+  * Press enter
+  * Read the Operating mode from the panel display
+  * If it is not 'M', then use function 2 to set it (see below)
+
+To set manual mode:
+
+  * Select function 2 on the panel
+  * Press enter
+  * The current OS IPL type is displayed with a pointer
+  * Press enter to move to the Operating mode
+  * Use increment, decrement buttons to change the mode to M
+  * Press enter
+
+To trigger the dump:
+
+  * Select function 22 on the panel
+  * Press enter
+  * Select function 22 on the panel
+  * Press enter
+
+Invoking function 22 twice will issue a soft reset to the machine.
+
+
 Dump Analysis
 =============
 
@@ -361,7 +535,7 @@ Where:
   vmcore -- The crash dump.
 
 GDB Helper Script
------------------
+=================
 
 The GDB-kdump script is provided to simplify use of GDB on dump images.
 The usage is "gdb-kdump [vmcore]".
diff --git a/kdump b/kdump
index ce913ff..afd943a 100644
--- a/kdump
+++ b/kdump
@@ -77,12 +77,18 @@ get_size_mb()
 # and save the vmcore
 save_core()
 {
+    dumpsize=`get_mem_size`
+    if [ -z "$dumpsize" -o "$dumpsize" = 0 ]; then
+        echo -n "Null size vmcore"
+        rc_status -s
+        rc_failed 6
+        return
+    fi
+
     if [ $KDUMP_KEEP_OLD_DUMPS -gt 0 ]; then
         purge_old_dumps
     fi
 
-    dumpsize=`get_mem_size`
-
     if [ $KDUMP_FREE_DISK_SIZE -gt 0 ]; then
         restsize=`parse_rest_size "$KDUMP_SAVEDIR"`
         needsize=`expr $dumpsize + $KDUMP_FREE_DISK_SIZE`
@@ -241,12 +247,8 @@ is_crash_kernel ()
     test -f /proc/vmcore || return 1
     # FIXME: any better way to detect crash environment?
     test -n "$CRASH" && return 0
-    case `uname -i` in
-    ia64)
-        # ia64 has no kdump kernel
-        return 1;;
-    esac
-    return 0
+    grep -q elfcorehdr= /proc/cmdline && return 0
+    return 1
 }
 
 # return success if we have a valid dump on the dump device
diff --git a/kexec-tools.changes b/kexec-tools.changes
index 59342ef..9b27171 100644
--- a/kexec-tools.changes
+++ b/kexec-tools.changes
@@ -1,3 +1,14 @@
+-------------------------------------------------------------------
+Wed Mar 14 18:45:27 CET 2007 - tiwai@suse.de
+
+- add detailed description about dump triggering methods to
+  README.SUSE (#250134)
+
+-------------------------------------------------------------------
+Wed Mar 14 15:18:06 CET 2007 - tiwai@suse.de
+
+- improve the check of crash kernel in kdump init script (#252632)
+
 -------------------------------------------------------------------
 Fri Mar 9 21:34:36 CET 2007 - bwalle@suse.de
 
diff --git a/kexec-tools.spec b/kexec-tools.spec
index 894b491..2d633f5 100644
--- a/kexec-tools.spec
+++ b/kexec-tools.spec
@@ -19,7 +19,7 @@ Requires: %insserv_prereq %fillup_prereq
 Autoreqprov: on
 Summary: Tools for fast kernel loading
 Version: 1.101
-Release: 82
+Release: 84
 Source: %{name}-%{package_version}.tar.bz2
 Source1: kdump
 Source2: sysconfig.kdump
@@ -128,6 +128,11 @@ true # ignore errors
 %{_sbindir}/kdump-helper
 
 %changelog
+* Wed Mar 14 2007 - tiwai@suse.de
+- add detailed description about dump triggering methods to
+  README.SUSE (#250134)
+* Wed Mar 14 2007 - tiwai@suse.de
+- improve the check of crash kernel in kdump init script (#252632)
 * Fri Mar 09 2007 - bwalle@suse.de
 - added hint that VGA console doesn't work (#253173)
 * Thu Feb 15 2007 - bwalle@suse.de
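
Note on the is_crash_kernel change: the new check relies on kexec passing an elfcorehdr= parameter on the command line of the capture kernel. Below is a minimal sketch of that test in isolation, so it can be tried outside a crash environment. The helper name has_elfcorehdr, its optional file argument, and the sample command line are hypothetical and used only for illustration; they are not part of the kdump init script.

```shell
# Hypothetical helper, not part of the kdump init script.
# Succeeds if the given kernel command line file contains an
# elfcorehdr= parameter; defaults to /proc/cmdline.
has_elfcorehdr() {
    cmdline_file="${1:-/proc/cmdline}"
    grep -q 'elfcorehdr=' "$cmdline_file"
}

# Example: run the check against a made-up command line
# instead of the live one.
sample=$(mktemp)
echo 'root=/dev/sda2 irqpoll maxcpus=1 elfcorehdr=0x1000' > "$sample"
if has_elfcorehdr "$sample"; then
    echo "looks like a crash kernel"
else
    echo "normal boot"
fi
rm -f "$sample"
```

Taking the file as an argument is only a convenience for testing; the script itself reads /proc/cmdline directly, as the hunk above shows.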