Kdump README for SLES 10

Prerequisites
=============

Be sure that you have installed the kexec-tools rpm. For x86, x86-64
and ppc64, install kernel-kdump.rpm, too. The version of the
kernel-kdump rpm must match the version of the running system kernel.
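
The version match can be checked by hand before going further. The
following is a minimal sketch, assuming kernel release strings of the
form <version>-<flavor>. The value "2.6.16-5-default" for the running
kernel is an assumed example, and the helper names are illustrative and
not part of kexec-tools.

```shell
# Strip the trailing flavor ("-default", "-kdump", ...) from a kernel
# release string, leaving the version part to compare.
kernel_base_version() {
    printf '%s\n' "${1%-*}"
}

# Succeed when both release strings share the same version part.
versions_match() {
    [ "$(kernel_base_version "$1")" = "$(kernel_base_version "$2")" ]
}

# Example values only; the first string is an assumption.
if versions_match "2.6.16-5-default" "2.6.16-5-kdump"; then
    echo "kernel-kdump matches the running kernel"
fi
```

On a live system, the first argument would typically come from
"uname -r", and the second from the name of the installed
/boot/vmlinux-<version> Kdump kernel file.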


Overview
========

Kdump uses kexec to quickly boot to a recovery kernel whenever a dump of
the system kernel's memory needs to be taken (for example, when the
system panics). The system memory image is preserved across the reboot
and is accessible to the recovery kernel. You can use common Linux
commands, such as cp and scp, to copy the memory image to a dump file on
the local host, or across the network to a remote system.

Kdump and kexec are currently supported on the x86, x86_64, and PPC64
architectures.

The system kernel reserves a small section of memory at boot time for
the recovery kernel, also called the capture kernel. This ensures that
ongoing Direct Memory Access (DMA) from the system kernel does not
corrupt the capture kernel. The "kexec -p" command loads the capture
kernel into this reserved memory area.

On x86 machines, the first 640 KB of physical memory is needed to boot,
irrespective of where the kernel loads. Therefore, kexec preserves this
region immediately before rebooting into the recovery kernel.

All of the necessary information about the system kernel's core image is
encoded in the ELF format, and stored in a reserved area of memory
before a crash. The physical address of the start of the ELF header is
passed to the recovery kernel through the "elfcorehdr=" boot parameter.

In the capture kernel, you can access the memory image from the system
kernel in two ways:

1) Through a /dev/oldmem device interface. A capture utility can read the
   device file and write out the memory in raw format. This is a raw dump
   of memory. Analysis and capture tools must be intelligent enough to
   determine where to look for the right information.

2) Through /proc/vmcore. This exports the memory dump as an ELF format
   file that can be written out using any file copy command such as cp or
   scp. Further, you can use analysis tools such as the GNU Debugger (GDB)
   or Crash to debug the dump file. This method ensures that the dump pages
   are ordered correctly.
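
Because /proc/vmcore is exported in ELF format, a copied dump should
begin with the four-byte ELF magic number. The following sketch shows
one way to check this. The helper name is_elf is an assumption, and the
demonstration runs on a throwaway file rather than on a real dump.

```shell
# Succeed when FILE starts with the ELF magic bytes (0x7f 'E' 'L' 'F').
is_elf() {
    magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Demonstration on a stand-in file.
sample=$(mktemp)
printf '\177ELF' > "$sample"
if is_elf "$sample"; then
    echo "looks like an ELF file"
fi
rm -f "$sample"
```

In the recovery kernel, the same function could be pointed at
/proc/vmcore, or at the saved vmcore file after the copy has finished.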


Setup of Kdump on SLES 10
=========================

Be sure the prerequisite RPMs are installed.

To enable a crash dump, you need to add an option to the boot loader to
specify the size and offset of the recovery kernel memory area.

An example of this boot loader option is "crashkernel=64M@16M". The 64M
is the size of the memory reserved for the Kdump recovery kernel, and
the 16M is the start address (offset) of the reserved area. On ia64,
the kernel calculates the start offset itself, so any @offset value is
ignored.
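
To make the size@offset syntax concrete, the following sketch splits a
"crashkernel=" option out of a kernel command line. The function name
parse_crashkernel is hypothetical, and the sample command line is only
an illustration.

```shell
# Print "SIZE OFFSET" for the first crashkernel= option found in the
# given command line. OFFSET is "-" when no @offset part is present.
parse_crashkernel() {
    for word in $1; do
        case $word in
            crashkernel=*@*)
                value=${word#crashkernel=}
                printf '%s %s\n' "${value%%@*}" "${value#*@}"
                return 0 ;;
            crashkernel=*)
                printf '%s -\n' "${word#crashkernel=}"
                return 0 ;;
        esac
    done
    return 1
}

parse_crashkernel "ro root=LABEL=/ crashkernel=64M@16M"
```

After the next reboot, calling the function with the contents of
/proc/cmdline is one way to confirm that the option reached the kernel.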

You can add this option either with the YaST boot loader module, or by
manually editing the boot loader configuration file.

The recommended values by architecture for the "crashkernel" option are:

    i386:   crashkernel=64M@16M
    x86_64: crashkernel=64M@16M
    ia64:   crashkernel=128M
    PPC64:  crashkernel=128M@32M
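
The table above can be expressed as a small lookup. This is only a
sketch: the function name is hypothetical, and it assumes the machine
names that "uname -m" usually reports (i386 through i686, x86_64, ia64
and ppc64).

```shell
# Print the recommended crashkernel option for a machine name.
recommended_crashkernel() {
    case $1 in
        i?86)   echo "crashkernel=64M@16M" ;;
        x86_64) echo "crashkernel=64M@16M" ;;
        ia64)   echo "crashkernel=128M" ;;
        ppc64)  echo "crashkernel=128M@32M" ;;
        *)      echo "no recommendation for $1" >&2
                return 1 ;;
    esac
}

recommended_crashkernel x86_64
```

On the target system, the argument would normally be "$(uname -m)".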

After setting the boot loader option, activate the Kdump init script,
which is not activated by default. To do this, use the YaST System
Services (Runlevel) module. Alternatively, enable the service on the
command line with the following command: "/sbin/chkconfig kdump on".

***Warning*** You must activate the kdump service permanently with
YaST or chkconfig as described above. Starting the kdump service
temporarily (e.g. "rckdump start") is not sufficient, because the
system reboots through kexec into the recovery kernel when a crash
occurs, and a temporary activation does not carry over to the Kdump
boot stage.

After enabling the Kdump init script, reboot the system so that the
Kdump kernel image is loaded properly.

Test your Kdump setup by issuing the following commands as the root
user:

***Warning*** This procedure will crash your system. Shut down all
applications and ensure that no users are logged on before performing
this test.

    # sync
    # echo c > /proc/sysrq-trigger

After the system recovers, verify that a vmcore file was generated in
the dump directory. By default the vmcore file is located in
/var/log/dump/<date-string>.

When a crash occurs, the kernel crash handler starts the recovery
kernel that the Kdump init script loaded earlier. The recovery kernel
runs in the reserved memory and boots the system up to the runlevel
given in $KDUMP_RUNLEVEL.

During the boot of the recovery kernel, the Kdump init script runs
again, but this time it dumps the core image for later analysis.

When a crash happens in a graphical environment, you will likely have no
GUI in the second kernel boot. If you used a VGA console, you might
still have visual output from the secondary kernel. The default behavior
of the Kdump script is to save the old vmcore image, and then reboot the
system immediately. You can adjust the behavior of the Kdump script
through sysconfig variables described later in this document.


The Default Dumper
==================

By default, the Kdump script saves the vmcore file to a unique
sub-directory consisting of $KDUMP_SAVEDIR and the date string, such as
/var/log/dump/2006-02-21-13:20/vmcore.

Before copying the vmcore file, the default dumper does some system
checks. First, it checks the number of old dump directories and removes
the older ones if there are more than $KDUMP_KEEP_OLD_DUMPS. Then, the
dumper checks the free disk space in the partition of the dump
directory. If the free space is less than the sum of the memory size and
the value given in $KDUMP_FREE_DISK_SIZE, then the dumper will not
create a dump.
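
The two checks can be sketched as follows. This is not the code of the
Kdump init script: the function names are hypothetical, all sizes are
assumed to be in megabytes, and the cleanup is assumed to keep exactly
the newest $KDUMP_KEEP_OLD_DUMPS directories.

```shell
# Succeed when the dump may be written. A minimum of 0 disables the
# check, as described for KDUMP_FREE_DISK_SIZE.
enough_space() {
    free_mb=$1 mem_mb=$2 min_free_mb=$3
    [ "$min_free_mb" -eq 0 ] && return 0
    [ "$free_mb" -ge $((mem_mb + min_free_mb)) ]
}

# Read dump directory names on stdin and print the ones to delete so
# that only the newest KEEP remain. Date-string names sort in
# chronological order. KEEP=0 disables removal.
dumps_to_remove() {
    keep=$1
    sort | awk -v keep="$keep" '
        { name[NR] = $0 }
        END { if (keep > 0) for (i = 1; i <= NR - keep; i++) print name[i] }'
}

enough_space 2000 1024 64 && echo "dump can be saved"
printf '%s\n' 2006-02-21-13:20 2006-02-19-08:05 2006-02-20-17:42 |
    dumps_to_remove 2
```

With 2000 MB free, 1024 MB of memory and a 64 MB minimum, the dump is
allowed. With three old dumps and a limit of two, only the oldest
directory name is printed for removal.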

$KDUMP_RUNLEVEL specifies the runlevel of the Kdump (recovery) kernel
boot. When $KDUMP_IMMEDIATE_REBOOT is set to yes, the init script
automatically reboots after saving the vmcore. By default, the dumper
uses KDUMP_RUNLEVEL=1 and KDUMP_IMMEDIATE_REBOOT=yes, in order to reduce
the possible risk of disk corruption in the recovery kernel environment.

If you want Kdump to run more complex jobs than set by the default
dumper configuration, set $KDUMP_TRANSFER to the command or script to
be run, and change $KDUMP_RUNLEVEL and $KDUMP_IMMEDIATE_REBOOT as
needed.

For example, setting KDUMP_TRANSFER="scp /proc/vmcore remote:/dump" and
KDUMP_RUNLEVEL=3 will make Kdump act like a netdump. You can set
KDUMP_IMMEDIATE_REBOOT=no to prevent the immediate reboot. This could be
useful to check the system over the network, for example.
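
Put together, the netdump-like setup from the example might look like
this in /etc/sysconfig/kdump. The host name "remote" and the target
path "/dump" are the placeholders used in the text, not real defaults.

```shell
# /etc/sysconfig/kdump (fragment)
# Copy the dump to a remote host instead of using the default dumper.
KDUMP_TRANSFER="scp /proc/vmcore remote:/dump"
# Runlevel 3 provides the network support that scp needs.
KDUMP_RUNLEVEL="3"
# Stay in the recovery kernel after the transfer for inspection.
KDUMP_IMMEDIATE_REBOOT="no"
```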

Note that the available memory size for the recovery kernel is limited.
Setting KDUMP_RUNLEVEL=5 (graphical login) is not recommended.


Tuning parameters
=================

You can adjust the basic behavior of the Kdump script by editing the
/etc/sysconfig/kdump file. Edit the script values with the YaST runlevel
System Services editor, or manually edit the /etc/sysconfig/kdump file,
and then restart the kdump service.


Generic options
---------------

- KDUMP_KERNELVER

This is the kernel version string for the Kdump kernel; an example is
"2.6.16-5-kdump". The init script will use a kernel named
/boot/vmlinux-$KDUMP_KERNELVER. The Kdump configuration is located in
the /etc/sysconfig/kdump file.

If you do not specify a version, then the init script will try to find a
Kdump kernel with the same version number as the running kernel. Using
the string "kdump" will default to the most recently installed Kdump
kernel (suitable for x86, x86-64 and ppc64). For ia64, leave this
string empty so that the same kernel as the running one is used.


- KDUMP_COMMANDLINE

This sets the command string to be passed to the Kdump kernel. This will
usually match the contents of the grub kernel line. An example is
KDUMP_COMMANDLINE="ro root=LABEL=/".

If you do not give a command line, then the default will be taken from
/proc/cmdline.


- KEXEC_OPTIONS

You can use this to pass additional arguments to kexec. For i386 and
x86-64, you likely need to pass "--args-linux" here.


- KDUMP_RUNLEVEL

This is the runlevel that the Kdump kernel boots to. The default is "1".
To enable network support in the Kdump recovery environment, set this to
"3".


- KDUMP_IMMEDIATE_REBOOT

This option specifies whether to reboot immediately after saving the
core in the Kdump kernel. The default is "yes".


- KDUMP_TRANSFER

This option names a script or command to execute to process or transfer
the dump image. The script or command can read the dump image through
either /proc/vmcore or /dev/oldmem. An empty string selects the default
dumper.
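
As an illustration of what a custom KDUMP_TRANSFER command could do,
the sketch below copies a dump image into a date-stamped directory,
similar in spirit to the default dumper. The function name save_vmcore
is hypothetical, and the demonstration uses a stand-in file, because
/proc/vmcore only exists in the recovery kernel.

```shell
# Copy SRC to SAVEDIR/<date-string>/vmcore and print the new path.
save_vmcore() {
    src=$1 savedir=$2
    target="$savedir/$(date +%Y-%m-%d-%H:%M)"
    mkdir -p "$target" &&
        cp "$src" "$target/vmcore" &&
        printf '%s\n' "$target/vmcore"
}

# Demonstration with temporary stand-ins for the dump and the save
# directory.
demo_src=$(mktemp)
demo_dir=$(mktemp -d)
echo "not a real dump" > "$demo_src"
save_vmcore "$demo_src" "$demo_dir"
rm -rf "$demo_src" "$demo_dir"
```

A real script would call something like
"save_vmcore /proc/vmcore /var/log/dump" and should also consider the
free disk space, which this sketch does not check.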


Options for the Default Dumper
------------------------------

- KDUMP_SAVEDIR

This option specifies the path to the directory where the dumps are
saved. The default is "/var/log/dump".


- KDUMP_KEEP_OLD_DUMPS

This option specifies how many previous dumps are kept. If the number of
saved dump files exceeds this number, the dumper removes older dumps.
You can prevent automatic removal by setting this to "0" (zero). The
default value is "5".


- KDUMP_FREE_DISK_SIZE

This specifies the minimum free disk space in megabytes of the dump
partition. If the free disk space is less than the sum of this value and
the memory size, then the default dumper will not save the vmcore file
in order to prevent disk corruption. Setting this option to "0" (zero)
forces the dumper to dump without checking the size. The default value
is "64".


Dump Analysis
-------------

Dump analysis can be performed using GDB or the Crash utility. The Crash
utility is included in the crash RPM package. On the system where the
analysis is to be performed, you must install a debug-info kernel
matching the version of the system kernel of the system where the dump
was collected. The debug-info kernel provides symbol and type
information that Crash and GDB use. You can find kernel debug
information RPMs on the SUSE support Web site. Alternatively, you can
build a debug-info kernel from source by enabling the
CONFIG_DEBUG_INFO kernel configuration option.

The symbol information in the debug-info kernel may differ from the
running kernel. Therefore, when running crash against a vmcore, you
should specify both the System.map file and the debug-info kernel.
For example, to run crash against a vmcore use the following command
line:

    $ crash /boot/System.map-<version> /boot/vmlinux-<version> vmcore

Where:
    /boot/System.map-<version> -- The map file matching the kernel
                                  being analyzed.
    /boot/vmlinux-<version>    -- The matching kernel.
    vmcore                     -- The crash dump.

The kernel rpm contains only a gzip-compressed vmlinux file. Uncompress
it manually before using crash.
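
The uncompress step can be done without modifying the installed file.
The following is a sketch only: the function name unpack_vmlinux is
hypothetical, the ".gz" file name is an assumption about how the
compressed image is named, and the demonstration uses a stand-in file
instead of a real kernel image.

```shell
# Decompress a gzip-compressed kernel image to OUT, leaving the
# original file untouched.
unpack_vmlinux() {
    gzip -dc "$1" > "$2"
}

# Demonstration on a stand-in file in a temporary directory.
work=$(mktemp -d)
echo "stand-in kernel image" > "$work/vmlinux"
gzip -c "$work/vmlinux" > "$work/vmlinux.gz"
unpack_vmlinux "$work/vmlinux.gz" "$work/vmlinux.unpacked"
cmp "$work/vmlinux" "$work/vmlinux.unpacked" && echo "unpacked image matches"
rm -rf "$work"
```

The unpacked file can then be given to crash in place of
/boot/vmlinux-<version> in the command line shown above.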

GDB Helper Script
-----------------

The gdb-kdump script is provided to simplify use of GDB on dump images.
The usage is "gdb-kdump [vmcore]".

The argument is the vmcore dump image to analyze. If you do not give an
argument, then the latest dump image will be taken. The script starts
GDB with the vmlinux of the currently running kernel. The script assumes
that the vmlinux file is at /boot/vmlinux-$kernel. If the script finds
only a gzip-compressed file, the file is automatically uncompressed.

Note that you also need the kernel-versionnumber-debuginfo package,
which provides the debug symbols. At startup, gdb-kdump also reads some
useful macros for the Kdump image, originally provided in
/usr/src/linux/Documentation/kdump. The following macros then become
available: bttnobp, btt, btpid, trapinfo, and dmesg. See the help topic
of each command in GDB for details.