2006-12-20 12:32:29 +01:00

Kdump README for SLES 10

Prerequisites
=============

Be sure that you have installed the kexec-tools RPM. For x86, x86_64,
and PPC64, install the kernel-kdump RPM, too. The version of the
kernel-kdump RPM must match the version of the running system kernel.
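
One way to check the version requirement is to compare the release
string of the running kernel with that of the Kdump kernel after
stripping the flavor suffix. The following is only a sketch: it assumes
that release strings end in a flavor such as "-default" or "-kdump", the
function names are made up for this example, and the version strings are
sample values.

```shell
# Strip the trailing flavor ("-default", "-smp", "-kdump", ...) from a
# kernel release string. Assumes the flavor is the last "-" component.
strip_flavor() {
    printf '%s\n' "${1%-*}"
}

# Succeed if two release strings differ only in their flavor.
same_kernel_version() {
    [ "$(strip_flavor "$1")" = "$(strip_flavor "$2")" ]
}

# Sample values; on a real system the first argument would come from
# "uname -r".
if same_kernel_version "2.6.16.21-0.8-default" "2.6.16.21-0.8-kdump"; then
    echo "versions match"
else
    echo "versions differ"
fi
```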


Overview
========

Kdump uses kexec to quickly boot to a recovery kernel whenever a dump of
the system kernel's memory needs to be taken (for example, when the
system panics). The system memory image is preserved across the reboot
and is accessible to the recovery kernel. You can use common Linux
commands, such as cp and scp, to copy the memory image to a dump file on
the local host, or across the network to a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ia64, and
PPC64 architectures.

The system kernel reserves a small section of memory for the recovery
kernel (also called the capture kernel) at boot time. This ensures that
ongoing Direct Memory Access (DMA) from the system kernel does not
corrupt the capture kernel. The "kexec -p" command loads the capture
kernel into this reserved memory area.

On x86 machines, the first 640 KB of physical memory is needed to boot,
irrespective of where the kernel loads. Therefore, kexec backs up this
region immediately before rebooting into the recovery kernel.

All of the necessary information about the system kernel's core image is
encoded in the ELF format, and stored in a reserved area of memory
before a crash. The physical address of the start of the ELF header is
passed to the recovery kernel through the "elfcorehdr=" boot parameter.

In the capture kernel, you can access the memory image from the system
kernel in two ways:

1) Through the /dev/oldmem device interface. A capture utility can read
   the device file and write out the memory in raw format. Because this
   is a raw dump of memory, analysis and capture tools must be
   intelligent enough to determine where to look for the right
   information.

2) Through /proc/vmcore. This exports the memory dump as an ELF format
   file that can be written out using any file copy command such as cp
   or scp. Further, you can use analysis tools such as the GNU Debugger
   (GDB) or Crash to debug the dump file. This method ensures that the
   dump pages are ordered correctly.
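
Because /proc/vmcore is presented as an ELF file, a copy of it can be
sanity-checked by looking at the four ELF magic bytes at its start. The
sketch below shows one way to do that; the function name is made up, and
the demonstration runs on a temporary file rather than on a real dump.

```shell
# Succeed if the file named by $1 begins with the ELF magic bytes
# 0x7f 'E' 'L' 'F'.
is_elf() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Self-contained demonstration on a temporary file. In the recovery
# kernel the argument would be /proc/vmcore or a saved copy of it.
tmp=$(mktemp)
printf '\177ELF' > "$tmp"
if is_elf "$tmp"; then
    echo "ELF header found"
fi
rm -f "$tmp"
```

A raw image read from /dev/oldmem would generally not pass this check,
since it carries no ELF header.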


Setup of Kdump on SLES 10
=========================

Be sure the prerequisite RPMs are installed.

To enable a crash dump, you need to add an option to the boot loader to
specify the size and offset of the recovery kernel memory area.

An example of this boot loader option is "crashkernel=64M@16M". The 64M
is the size of the memory reserved for the Kdump recovery kernel, and
the 16M is the start address (offset) of the reserved area. On ia64, the
start offset is calculated by the kernel, so any "@offset" part is
ignored.

You can add this option either with the YaST boot loader module, or by
manually editing the boot loader configuration file.

The recommended values by architecture for the "crashkernel" option are:

   i386:    crashkernel=64M@16M
   x86_64:  crashkernel=64M@16M
   ia64:    crashkernel=512M (on small machines use 256M)
   PPC64:   crashkernel=128M@32M
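
To make the size@offset syntax concrete, the following sketch splits a
crashkernel value into its two parts. The helper names are hypothetical
and exist only for this illustration; the kernel does its own parsing.

```shell
# Print the size part of a crashkernel value ("64M@16M" -> "64M").
crashkernel_size() {
    printf '%s\n' "${1%%@*}"
}

# Print the offset part ("64M@16M" -> "16M"). Prints nothing when no
# "@offset" is given, as in the ia64 value "512M".
crashkernel_offset() {
    case $1 in
        *@*) printf '%s\n' "${1#*@}" ;;
    esac
}

for value in 64M@16M 128M@32M 512M; do
    echo "size=$(crashkernel_size "$value") offset=$(crashkernel_offset "$value")"
done
```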

After setting the boot loader option, activate the Kdump init script,
which is not activated by default. To do this, use the YaST System
Services (Runlevel) module. Alternatively, enable the service on the
command line with the following command: "/sbin/chkconfig kdump on".

***Warning*** You must activate the kdump service permanently with
YaST or chkconfig as described above. Starting the kdump service
temporarily (for example, with "rckdump start") is not sufficient.
After a crash, the system is rebooted through kexec into the recovery
kernel, and a temporary activation is lost at the kdump boot stage.

After enabling the Kdump init script, reboot the system so that the
Kdump kernel image is loaded properly.
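
After the reboot, a quick sanity check is to confirm that the running
kernel was actually booted with a crashkernel option. A minimal sketch,
assuming the option appears as a whitespace-separated word on the kernel
command line; the function name and the sample command line are made up.

```shell
# Print the value of the crashkernel= option found in the command line
# string $1; return non-zero if the option is absent.
crashkernel_from_cmdline() {
    for word in $1; do
        case $word in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# On a live system, pass the contents of /proc/cmdline instead of this
# sample line.
sample="root=/dev/sda2 resume=/dev/sda1 crashkernel=64M@16M splash=silent"
if value=$(crashkernel_from_cmdline "$sample"); then
    echo "crashkernel is set to $value"
else
    echo "crashkernel is not set; no memory is reserved for Kdump"
fi
```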

Test your Kdump setup by issuing the following commands as the root
user:

***Warning*** This procedure will crash your system. Shut down all
applications and ensure that no users are logged on before performing
this test.

   # sync
   # echo u > /proc/sysrq-trigger   (remount file systems read-only to
                                     avoid recovery after reboot)
   # echo c > /proc/sysrq-trigger

After the system recovers, verify that a vmcore file was generated in
the save dump directory. By default the vmcore file is located in
/var/log/dump/<date-string>.
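
The verification step can be scripted. The sketch below looks for the
newest vmcore below a dump directory; it assumes that the date-string
directory names sort chronologically, and the function name is made up.
The demonstration uses a throwaway directory tree instead of the real
dump directory.

```shell
# Print the path of the most recent vmcore below the dump directory $1.
# Relies on the date-string directory names sorting chronologically;
# returns non-zero if no vmcore is found.
latest_vmcore() {
    latest=
    for f in "$1"/*/vmcore; do
        [ -f "$f" ] && latest=$f
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Demonstration on a temporary tree; on a real system the argument
# would be the dump directory, /var/log/dump by default.
demo=$(mktemp -d)
mkdir "$demo/2006-02-21-13:20" "$demo/2006-03-02-09:15"
: > "$demo/2006-02-21-13:20/vmcore"
: > "$demo/2006-03-02-09:15/vmcore"
latest_vmcore "$demo"
rm -rf "$demo"
```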

When a crash occurs, the kernel crash handler starts the second
(recovery) kernel that the Kdump init script loaded earlier, and boots
the system, using only the reserved memory, up to the runlevel given in
$KDUMP_RUNLEVEL.

During the boot of the recovery kernel, the Kdump init script runs
again, but this time it dumps the core image for later analysis.

When a crash happens in a graphical environment, you will likely have no
GUI in the second kernel boot. If you used a VGA console, you might
still have visual output from the secondary kernel. The default behavior
of the Kdump script is to save the vmcore image, and then reboot the
system immediately. You can adjust the behavior of the Kdump script
through sysconfig variables described later in this document.


The Default Dumper
==================

By default, the Kdump script saves the vmcore file to a unique
sub-directory consisting of $KDUMP_SAVEDIR and the date string, such as
/var/log/dump/2006-02-21-13:20/vmcore. This directory can be on the
local machine or on an FTP, SSH, NFS, or CIFS server (see
$KDUMP_SAVEDIR below).

If a local directory is used, the default dumper does some system checks
before copying the vmcore file. First, it checks the number of old dump
directories and removes the oldest ones if there are more than
$KDUMP_KEEP_OLD_DUMPS. Then, the dumper checks the free disk space in
the partition of the dump directory. If the free space is less than the
sum of the memory size and the value given in $KDUMP_FREE_DISK_SIZE,
then the dumper will not create a dump.
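
The free-space rule can be written down as a small test. This is a
sketch of the rule as described in this document, not the code of the
actual dumper; the function name and the numbers are made up. A value of
0 for KDUMP_FREE_DISK_SIZE disables the check, as explained under
"Tuning parameters".

```shell
# Decide whether a dump may be written.
#   $1  free space in the dump partition, in MB
#   $2  memory size, in MB
#   $3  value of KDUMP_FREE_DISK_SIZE, in MB (0 disables the check)
enough_space_for_dump() {
    [ "$3" -eq 0 ] && return 0
    [ "$1" -ge $(( $2 + $3 )) ]
}

# Sample numbers: 4096 MB of RAM and a reserve of 64 MB.
for free_mb in 8192 4100; do
    if enough_space_for_dump "$free_mb" 4096 64; then
        echo "$free_mb MB free: dump is written"
    else
        echo "$free_mb MB free: dump is skipped"
    fi
done
```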

$KDUMP_RUNLEVEL specifies the runlevel of the Kdump (recovery) kernel
boot. When $KDUMP_IMMEDIATE_REBOOT is set to "yes", the init script
automatically reboots after saving the vmcore. By default, the dumper
uses KDUMP_RUNLEVEL=1 and KDUMP_IMMEDIATE_REBOOT=yes, in order to reduce
the possible risk of disk corruption in the recovery kernel environment.

If you want Kdump to run more complex jobs than set by the default
dumper configuration, set the name of the appropriate command or script
to be run via $KDUMP_TRANSFER, and change $KDUMP_RUNLEVEL and
$KDUMP_IMMEDIATE_REBOOT.

For example, setting KDUMP_TRANSFER="scp /proc/vmcore remote:/dump" and
KDUMP_RUNLEVEL=3 will make Kdump act like a netdump. You can set
KDUMP_IMMEDIATE_REBOOT=no to prevent the immediate reboot. This could be
useful to check the system over the network, for example.

Note that the available memory size for the recovery kernel is limited.
Setting KDUMP_RUNLEVEL=5 (graphical login) is not recommended.
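
Put together, the netdump-like setup described above might look like
this in /etc/sysconfig/kdump. This is only a sketch: the host name
"remote" and the target path are taken from the example in the text, and
the scp transfer presupposes (as an assumption of this example) that the
recovery kernel can reach that host without a password prompt.

```shell
# /etc/sysconfig/kdump (excerpt)

# Copy the dump to another machine instead of using the default dumper.
KDUMP_TRANSFER="scp /proc/vmcore remote:/dump"

# Runlevel 3 enables network support in the recovery environment.
KDUMP_RUNLEVEL="3"

# Stay in the recovery kernel after the transfer so that the system can
# be inspected over the network.
KDUMP_IMMEDIATE_REBOOT="no"
```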


Initrd-based Dump Saving
========================

The problem with the procedure described above is that your root file
system (or whatever partition your KDUMP_SAVEDIR is in) may be
corrupted. In that case, the script may not be able to mount the device
and cannot save the dump to disk.

For this case, you can configure KDUMP_DUMPDEV to point to an unused
partition that is large enough to hold the dump, that is, larger than
the system's main memory. Before mounting the root file system, the init
script writes the dump to that device. After rebooting, the normal boot
script saves the dump from that device to KDUMP_SAVEDIR. Because the
data has already been saved to disk, you can safely turn off the
computer and/or repair the file system with a suitable tool (for
example, booting from a CD for the repair is no problem).

After changing that value, you have to re-run mkinitrd on the kdump
kernel, or on all kernels.


Tuning parameters
=================

You can adjust the basic behavior of the Kdump script by editing the
/etc/sysconfig/kdump file. Edit the values with the YaST runlevel
System Services editor, or manually edit the /etc/sysconfig/kdump file,
and then restart the kdump service.


Generic options
---------------

- KDUMP_KERNELVER

  This is the kernel version string for the Kdump kernel; an example is
  "2.6.16-5-kdump". The init script will use a kernel named
  /boot/vmlinux-$KDUMP_KERNELVER. The Kdump configuration itself is
  located in the /etc/sysconfig/kdump file.

  If you do not specify a version, then the init script will try to find
  a Kdump kernel with the same version number as the running kernel.
  Using the string "kdump" will default to the most recently installed
  Kdump kernel (suitable for x86, x86_64, and PPC64). For ia64, keep
  this string empty to use the same kernel as the running one.


- KDUMP_COMMANDLINE

  This sets the command line string to be passed to the Kdump kernel.
  This will usually match the contents of the grub kernel line. An
  example is KDUMP_COMMANDLINE="ro root=LABEL=/".

  If you do not give a command line, then the default will be taken from
  /proc/cmdline.


- KDUMP_COMMANDLINE_APPEND

  Set this variable if you only want to _append_ values to the default
  command line string. The string is also appended if KDUMP_COMMANDLINE
  is set.


- KEXEC_OPTIONS

  You can use this to pass additional arguments to kexec. For i386 and
  x86_64, you likely need to pass "--args-linux" here.


- KDUMP_RUNLEVEL

  This is the runlevel that the Kdump kernel boots to. The default is
  "1". To enable network support in the Kdump recovery environment, set
  this to "3".


- KDUMP_IMMEDIATE_REBOOT

  This option specifies whether to reboot immediately after saving the
  core in the Kdump kernel. This option is ignored when KDUMP_DUMPDEV is
  set to a non-empty string. The default is "yes".


- KDUMP_TRANSFER

  This is an option to execute a script or command to process or
  transfer the dump image. It can read the dump image either through
  /proc/vmcore or /dev/oldmem. An empty string will use the default
  dumper.


Options for the Default Dumper
------------------------------

- KDUMP_SAVEDIR

  This option specifies the path to the directory where the dumps are
  saved. This can be

    - a local directory, for example "file:///var/log/dump" (or,
      deprecated, just "/var/log/dump")
    - an FTP server, for example "ftp://user:password@host/var/log/dump"
    - an SSH server, for example "ssh://user@host/var/log/dump".
      Please create a user that needs no password or set up public key
      authorization for the root user of the system. Otherwise you have
      to enter the password on the serial console, as the VGA console
      may not work!
    - an NFS share, for example "nfs://server:/export:/var/log/dump"
    - a CIFS (SMB) share, for example
      "cifs://user:password@host:/share/var/log/dump"

  For the exact URLs, see the kdump-url_parser(8) manual page, or use
  the YaST2 kdump module to configure this if you're unsure.

  The default is "/var/log/dump". See also KDUMP_DUMPDEV if you want to
  save the dump on a raw device first, which helps if your root file
  system is corrupted.
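
To illustrate how the different KDUMP_SAVEDIR forms can be told apart,
the sketch below extracts the scheme from a value. It is not the real
kdump-url_parser and makes no claim to match its behavior; the function
name is made up, and only the schemes listed above are recognized.

```shell
# Print the kind of target a KDUMP_SAVEDIR value refers to. A bare path
# is treated as the deprecated spelling of a file:// URL.
savedir_scheme() {
    case $1 in
        file://*|ftp://*|ssh://*|nfs://*|cifs://*)
            printf '%s\n' "${1%%://*}" ;;
        /*)
            printf 'file\n' ;;
        *)
            return 1 ;;
    esac
}

for target in /var/log/dump "ssh://user@host/var/log/dump" \
              "nfs://server:/export:/var/log/dump"; do
    echo "$target -> $(savedir_scheme "$target")"
done
```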


- KDUMP_DUMPLEVEL

  Determines the dump level. If KDUMP_DUMPLEVEL != 0, then makedumpfile
  is used to strip pages that may not be necessary for analysing. 0 means
  no stripping, and 31 is the maximum dump level, i.e. 0 produces the
  largest dump files and 31 the smallest.

  The following table from makedumpfile(8) shows what each dump level
  means:

     dump | zero | cache | cache   | user | free
    level | page | page  | private | data | page
   -------+------+-------+---------+------+------
        0 |      |       |         |      |
        1 |  X   |       |         |      |
        2 |      |   X   |         |      |
        3 |  X   |   X   |         |      |
        4 |      |   X   |    X    |      |
        5 |  X   |   X   |    X    |      |
        6 |      |   X   |    X    |      |
        7 |  X   |   X   |    X    |      |
        8 |      |       |         |  X   |
        9 |  X   |       |         |  X   |
       10 |      |   X   |         |  X   |
       11 |  X   |   X   |         |  X   |
       12 |      |   X   |    X    |  X   |
       13 |  X   |   X   |    X    |  X   |
       14 |      |   X   |    X    |  X   |
       15 |  X   |   X   |    X    |  X   |
       16 |      |       |         |      |  X
       17 |  X   |       |         |      |  X
       18 |      |   X   |         |      |  X
       19 |  X   |   X   |         |      |  X
       20 |      |   X   |    X    |      |  X
       21 |  X   |   X   |    X    |      |  X
       22 |      |   X   |    X    |      |  X
       23 |  X   |   X   |    X    |      |  X
       24 |      |       |         |  X   |  X
       25 |  X   |       |         |  X   |  X
       26 |      |   X   |         |  X   |  X
       27 |  X   |   X   |         |  X   |  X
       28 |      |   X   |    X    |  X   |  X
       29 |  X   |   X   |    X    |  X   |  X
       30 |      |   X   |    X    |  X   |  X
       31 |  X   |   X   |    X    |  X   |  X
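
The table follows a bit pattern, which the sketch below reproduces. It
is derived from the table above, not from the makedumpfile sources, and
the function name is made up. Note that, according to the table, any
level that marks private cache pages (value 4) also marks ordinary cache
pages.

```shell
# Print the page types marked in the table for dump level $1 (0..31).
stripped_pages() {
    level=$1
    out=
    [ $(( level & 1 )) -ne 0 ] && out="$out zero"
    [ $(( level & 6 )) -ne 0 ] && out="$out cache"
    [ $(( level & 4 )) -ne 0 ] && out="$out cache-private"
    [ $(( level & 8 )) -ne 0 ] && out="$out user"
    [ $(( level & 16 )) -ne 0 ] && out="$out free"
    printf '%s\n' "${out# }"
}

for level in 0 1 4 31; do
    echo "level $level: $(stripped_pages "$level")"
done
```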


- KDUMP_DUMPFORMAT

  This variable specifies the dump format.

  "ELF" has the advantage that it is a standard format and GDB can be
  used to analyze the dumps. The disadvantage is that the dump files are
  larger.

  "compressed" is the kdump compressed format (see makedumpfile(8)) that
  produces small dumps. However, only "crash" can analyze the dumps and
  you need to have makedumpfile installed (but you need it anyway if you
  set KDUMP_DUMPLEVEL != 0).


- KDUMP_DUMPDEV

  Specifies the dump device that is used for saving the dump with the
  kdump kernel. The dump device normally is a disk partition. You don't
  need to specify a dump device here; if you leave it empty, the dump is
  written to KDUMP_SAVEDIR when booting the kdump kernel.

  If KDUMP_DUMPDEV points to a device file, the dump is written to that
  device when running the kdump kernel. The advantage over writing the
  dump to the file system immediately is that you don't have to mount
  the root file system (which may be corrupted!) just to write the dump.
  So if the root file system is corrupted, you have the chance to fix
  the file system manually and reboot the system without losing the dump
  information. On the first normal boot which is able to successfully
  mount the root file system, the dump is saved to KDUMP_SAVEDIR as
  usual.

  Important: The KDUMP_DUMPDEV is overwritten by kdump, so don't use it
  for saving any data. Also don't use the currently used swap partition.


- KDUMP_KEEP_OLD_DUMPS

  This option specifies how many previous dumps are kept. If the number
  of saved dump files exceeds this number, the dumper removes older
  dumps. You can prevent automatic removal by setting this to "0"
  (zero). The default value is "5".


- KDUMP_FREE_DISK_SIZE

  This specifies the minimum free disk space in megabytes of the dump
  partition. If the free disk space is less than the sum of this value
  and the memory size, then the default dumper will not save the vmcore
  file in order to prevent disk corruption. Setting this option to "0"
  (zero) forces the dumper to dump without checking the size. The
  default value is "64".


- KDUMP_VERBOSE

  Determines if kdump uses verbose output. This value is a bitmask, so
  add the values to combine options (for example, "3" enables 1 and 2):

    1: kdump command line is written to system log when executing
       /etc/init.d/kdump
    2: progress is written to stdout while dumping
    4: kdump command line is written to standard output when executing
       /etc/init.d/kdump
    8: Debugging for kdump transfer script


Machine-specific Notes
======================

- IA64

  o On some Hewlett Packard platforms (for example, the HP rx3600), you
    need 'machvec=dig' in KDUMP_COMMANDLINE_APPEND.

  o On SGI SN2 machines, kdump doesn't work when the VGA console is
    active. To disable the VGA console, execute the following commands
    in the EFI shell:

      Shell> set NoVGA 1
      Shell> reset


Dump Triggering Methods
=======================

This section describes the various ways, other than a kernel panic, in
which Kdump can be triggered. These methods enable the user to invoke
Kdump in cases where the system is experiencing a hard hang.

1) AltSysRq C

On i386 and x86_64 machines, Kdump can be triggered with the combination
of the 'Alt', 'SysRq', and 'C' keyboard keys. This method works only on
directly attached consoles, and not on remote consoles. In cases where
the machine is in a hung state with interrupts disabled, AltSysRq C
cannot be used. If any kind of terminal access is still possible, the
same result may be achieved from the shell command line like so:

   # echo c > /proc/sysrq-trigger

On PowerPC machines, AltSysRq C can also be used to initiate Kdump if a
directly attached console is available. In addition, Kdump can be
triggered via the Hardware Management Console (HMC) using the 'Ctrl',
'O', and 'C' keyboard keys. In order to use the SysRq method for dump
triggering, /proc/sys/kernel/sysrq needs to be enabled, which can be
done as follows:

   # echo 1 > /proc/sys/kernel/sysrq

2) Kernel OOPs

If you want to generate a dump every time the kernel oopses, set the
'Panic On OOPs' option as follows:

   # echo 1 > /proc/sys/kernel/panic_on_oops


3) NMI (non-maskable interrupt) button

In cases where the system is in a hung state and is not accepting
keyboard interrupts, using the NMI button to trigger Kdump can be very
useful. An NMI button is present on most of the newer x86 and x86_64
machines. Please refer to the user guides or manuals to locate the
button, though it is often not well documented. In most cases it is
hidden behind a small hole on the front or back panel of the machine.
You could use a toothpick or some other non-conducting probe to press
the button.

For example, on the IBM X series 366 machine, the NMI button is located
behind a small hole on the bottom center of the rear panel.

To enable this method of dump triggering using the NMI button, you need
to set the 'unknown_nmi_panic' option as follows:

   # echo 1 > /proc/sys/kernel/unknown_nmi_panic

When enabling unknown_nmi_panic, be careful not to enable the NMI
watchdog feature as well; otherwise the system will panic.

4) NMI watchdog

The NMI watchdog is a feature available in the x86 and x86_64 kernels
which uses NMIs to monitor whether a CPU has locked up. On i386
machines, the NMI watchdog can be enabled by passing nmi_watchdog=1 on
the kernel command line. On x86_64 machines, it is enabled by default.
To verify whether your system has been configured with the NMI watchdog,
look at the NMI entry in /proc/interrupts. If the count is greater than
zero, then the NMI watchdog has been configured; otherwise it has not.
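
The check of the NMI entry can be automated along the following lines.
This is a sketch that assumes the usual /proc/interrupts layout, with a
line starting with "NMI:" followed by one counter per CPU; the function
name and the sample data are made up.

```shell
# Read text in /proc/interrupts format on standard input and print the
# NMI count summed over all CPUs (0 if there is no NMI line).
nmi_count() {
    awk '$1 == "NMI:" {
             for (i = 2; i <= NF; i++)
                 if ($i ~ /^[0-9]+$/) total += $i
         }
         END { print total + 0 }'
}

# On a live system: nmi_count < /proc/interrupts
# Sample data for a two-CPU machine:
sample='           CPU0       CPU1
  0:     123456          0    IO-APIC-edge  timer
NMI:       4711       4698
LOC:     123400     123390'

count=$(printf '%s\n' "$sample" | nmi_count)
if [ "$count" -gt 0 ]; then
    echo "NMI watchdog appears to be active ($count NMIs so far)"
else
    echo "NMI count is zero"
fi
```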

Please refer to Documentation/nmi_watchdog.txt in the kernel source for
a more detailed description.

Once this feature has been enabled in the kernel, any lockup results in
an OOPs message being generated, followed by Kdump being triggered.
This also requires 'Panic On OOPs' to be enabled as explained in
method 2 above.

Please refrain from simultaneously enabling 'nmi_watchdog' and setting
/proc/sys/kernel/unknown_nmi_panic, as this would result in a kernel
panic from legitimate NMIs generated by the nmi_watchdog.
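
A simple guard against this combination could look like the sketch
below. It only recognizes an explicit nmi_watchdog= option on the kernel
command line, so a watchdog that is enabled by default is not detected;
the function name and the sample inputs are made up.

```shell
# Succeed (exit status 0) if the risky combination is present:
#   $1  kernel command line
#   $2  contents of /proc/sys/kernel/unknown_nmi_panic ("0" or "1")
nmi_settings_conflict() {
    [ "$2" = "1" ] || return 1
    for word in $1; do
        case $word in
            nmi_watchdog=0) ;;
            nmi_watchdog=*) return 0 ;;
        esac
    done
    return 1
}

# Sample inputs; on a live system read /proc/cmdline and
# /proc/sys/kernel/unknown_nmi_panic instead.
if nmi_settings_conflict "ro root=/dev/sda2 nmi_watchdog=1" "1"; then
    echo "warning: nmi_watchdog and unknown_nmi_panic are both enabled"
fi
```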


5) PowerPC-specific methods

On IBM PowerPC machines, the following methods to issue a soft reset can
be used to trigger Kdump. On SLES 10 systems, XMON (the debugger) is
turned off by default. To enable XMON, boot the kernel with the
'xmon=on' option. With XMON enabled, issuing a soft reset drops the user
to the XMON prompt, where typing 'X' triggers Kdump. If XMON is not
enabled, a soft reset triggers Kdump directly.

5.1) HMC

The Hardware Management Console (HMC), available on Power4 and Power5
machines, allows partitions to be reset remotely. This is especially
useful in hang situations where the system is not accepting any keyboard
input.

Once you have the HMC configured, the following steps enable you to
trigger Kdump via a soft reset:

On Power4
  Using GUI

    * In the right pane, right-click the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline

    # reset_partition -m <machine> -p <partition> -t soft

On Power5
  Using GUI

    * In the right pane, right-click the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline

    # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar

5.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under
"Blade Tasks" in the Blade Management Console. Select the blade for
which you want to initiate the dump and then click "Restart blade with
NMI". This issues a soft reset.

5.3) Control Panel function for a standalone Power5 machine

A standalone machine is one which does not have any LPARs configured and
also does not have an HMC available. In such cases, the Control Panel,
usually located on the front panel of the machine (please refer to the
user guide of the specific model for details), can be used for dump
triggering in case the system has a hard hang.

The control panel provides many functions for system management
purposes; function 22 is meant for invoking a partition dump. This
function is available only in the manual operating mode.

To check whether the system is operating in manual mode:

   * Select function 1 on the panel
   * Press enter
   * Read the operating mode from the panel display
   * If it is not 'M', then use function 2 to set it (see below)

To set manual mode:

   * Select function 2 on the panel
   * Press enter
   * The current OS IPL type is displayed with a pointer
   * Press enter to move to the operating mode
   * Use the increment and decrement buttons to change the mode to M
   * Press enter

To trigger the dump:

   * Select function 22 on the panel
   * Press enter
   * Select function 22 on the panel
   * Press enter

Invoking function 22 twice issues a soft reset to the machine.


Dump Analysis
=============

Dump analysis can be performed using GDB or the Crash utility. The Crash
utility is included in the crash RPM package. You must install a
debug-info kernel matching the version of the system kernel (of the
system where the dump was collected) on the system where the analysis is
to be performed. The debug-info kernel provides symbol and type
information that Crash and GDB use. You can find kernel debug
information RPMs on the SUSE support Web site. Alternatively, you can
build a debug-info kernel from source by enabling the CONFIG_DEBUG_INFO
kernel configuration option.

Even if you install kernel-debuginfo, you may need to uncompress the
kernel image first. Whether this is necessary depends on the
architecture on which your system is running. If you don't know the
architecture, run "uname -i" to find out.

On i586, i686, x86_64, s390, and s390x, you have to unpack the kernel
image:

   $ gunzip /boot/vmlinux-<version>.gz

On IA64, the default kernel image is already a gzip'ed vmlinux image.
Run

   $ zcat /boot/vmlinuz-<version> > /boot/vmlinux-<version>

On PPC and PPC64, you don't have to do anything, because there the boot
loader already loads the vmlinux image.
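
The per-architecture rules above can be collected in one helper that
prints the command to run. This is a sketch with a made-up function
name; accepting "i386" as an additional spelling for 32-bit x86 is an
assumption of this example, and the version string is a sample value.
The helper only prints the command and does not execute it.

```shell
# Print the command that produces an uncompressed vmlinux for the given
# architecture ($1) and kernel version ($2). Prints nothing when no
# action is needed; returns non-zero for an unknown architecture.
vmlinux_prepare_cmd() {
    case $1 in
        i386|i586|i686|x86_64|s390|s390x)
            printf 'gunzip /boot/vmlinux-%s.gz\n' "$2" ;;
        ia64)
            printf 'zcat /boot/vmlinuz-%s > /boot/vmlinux-%s\n' "$2" "$2" ;;
        ppc|ppc64)
            ;;
        *)
            return 1 ;;
    esac
}

vmlinux_prepare_cmd x86_64 2.6.16.21-0.8-default
vmlinux_prepare_cmd ia64 2.6.16.21-0.8-default
```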

The symbol information in the debug-info kernel may differ from the
running kernel. Therefore, when running crash against a vmcore, you
should specify both the System.map file and the debug-info kernel.
For example, to run crash against a vmcore, use the following command
line:

   $ crash /boot/System.map-<version> /boot/vmlinux-<version> vmcore

Where:

   /boot/System.map-<version>  -- The map file matching the kernel
                                  being analyzed.
   /boot/vmlinux-<version>     -- The matching kernel.
   vmcore                      -- The crash dump.


GDB Helper Script
=================

The gdb-kdump script is provided to simplify the use of GDB on dump
images. The usage is "gdb-kdump [vmcore]".

The argument is the vmcore dump image to analyze. If you do not give an
argument, then the latest dump image will be taken. The script starts
GDB with the vmlinux of the currently running kernel. The script assumes
that the vmlinux file is at /boot/vmlinux-$kernel. If the script finds
only a gzip-compressed file, the file is automatically uncompressed.

Note that you need the matching kernel-versionnumber-debuginfo package,
which provides the debug symbols. At startup, gdb-kdump also reads some
useful macros for the Kdump image, originally provided in
/usr/src/linux/Documentation/kdump. The following macros then become
available: bttnobp, btt, btpid, trapinfo, and dmesg. See the help topic
of each command in GDB for details.