Kdump README for SLES 10

Prerequisites
=============

Be sure that you have installed the kexec-tools rpm. For x86, x86-64 and
ppc64, install kernel-kdump.rpm, too. The version of the kernel-kdump rpm
must match the version of the running system kernel.

Overview
========

Kdump uses kexec to quickly boot to a recovery kernel whenever a dump of
the system kernel's memory needs to be taken (for example, when the system
panics). The system memory image is preserved across the reboot and is
accessible to the debug kernel. You can use common Linux commands, such as
cp and scp, to copy the memory image to a dump file on the local host, or
across the network to a remote system.

Kdump and kexec are currently supported on the x86, x86_64, and PPC64
architectures.

The system kernel reserves a small section of memory for the capture
kernel at boot time of the system kernel. This ensures that ongoing Direct
Memory Access (DMA) from the system kernel does not corrupt the capture
kernel. The "kexec -p" command loads the capture kernel into this reserved
memory area.

On x86 machines, the first 640 KB of physical memory is needed to boot,
irrespective of where the kernel loads. Therefore, kexec preserves this
region immediately before rebooting into the recovery kernel.

All of the necessary information about the system kernel's core image is
encoded in the ELF format, and stored in a reserved area of memory before
a crash. The physical address of the start of the ELF header is passed to
the recovery kernel through the "elfcorehdr=" boot parameter.

In the capture kernel, you can access the memory image from the system
kernel in two ways:

1) Through a /dev/oldmem device interface.
   A capture utility can read the device file and write out the memory in
   raw format. This is a raw dump of memory. Analysis and capture tools
   must be intelligent enough to determine where to look for the right
   information.

2) Through /proc/vmcore.
   This exports the memory dump as an ELF format file that can be written
   out using any file copy command such as cp or scp. Further, you can use
   analysis tools such as the GNU Debugger (GDB) or Crash to debug the
   dump file. This method ensures that the dump pages are ordered
   correctly.

Setup of Kdump on SLES 10
=========================

Be sure the prerequisite RPMs are installed.

To enable a crash dump, you need to add an option to the boot loader to
specify the size and offset of the recovery kernel memory area. An example
of this boot loader option is "crashkernel=64M@16M". The 64M is the size
of the space reserved for the Kdump recovery kernel, and the 16M is the
start address of the reserved area. On ia64, the start offset is
calculated by the kernel, so any @offset part is ignored.

You can add this option either with the YaST boot loader module, or by
manually editing the boot loader configuration file.

The recommended values by architecture for the "crashkernel" option are:

    i386:    crashkernel=64M@16M
    x86_64:  crashkernel=64M@16M
    ia64:    crashkernel=128M
    PPC64:   crashkernel=128M@32M

After setting the boot loader option, activate the Kdump init script,
which is not activated by default. To do this, use the YaST System
Services (Runlevel) module. Alternatively, enable the service on the
command line with the following command: "/sbin/chkconfig kdump on".

***Warning***
You must activate the kdump service permanently via YaST or chkconfig as
shown above. Starting the kdump service temporarily (e.g. "rckdump start")
is not sufficient, because the system is rebooted via kexec into a
different state and a temporary activation is lost at the kdump boot
stage.

After enabling the Kdump init script, reboot the system so that the Kdump
kernel image is loaded properly.

Test your Kdump setup by issuing the following commands as the root user:

***Warning***
This procedure will crash your system. Shut down all applications and
ensure that no users are logged on before performing this test.

    # sync
    # echo c > /proc/sysrq-trigger

After the system recovers, verify that a vmcore file was generated in the
save dump directory. By default the vmcore file is located in
/var/log/dump/<date-string>.

When a crash occurs, the kernel crash handler starts the second recovery
kernel that the Kdump init script loaded earlier, and reboots the system
using the reserved memory up to the $KDUMP_RUNLEVEL runlevel. During the
boot of the recovery kernel, the Kdump init script loads again, but this
time it dumps the core image for later analysis.

When a crash happens in a graphical environment, you will likely have no
GUI in the second kernel boot. If you used a VGA console, you might still
have visual output from the secondary kernel.

The default behavior of the Kdump script is to save the old vmcore image,
and then reboot the system immediately. You can adjust the behavior of the
Kdump script through sysconfig variables described later in this document.

The Default Dumper
==================

By default, the Kdump script saves the vmcore file to a unique
sub-directory consisting of $KDUMP_SAVEDIR and the date string, such as
/var/log/dump/2006-02-21-13:20/vmcore.

Before copying the vmcore file, the default dumper does some system
checks. First, it checks the number of old dump directories and removes
them if there are more than $KDUMP_KEEP_OLD_DUMPS. Then, the dumper checks
the free disk space in the partition of the dump directory. If the free
space is less than the sum of the memory size and the value given in
$KDUMP_FREE_DISK_SIZE, then the dumper will not create a dump.

$KDUMP_RUNLEVEL specifies the runlevel of the Kdump (recovery) kernel
boot. When $KDUMP_IMMEDIATE_REBOOT is set to yes, then the init script
automatically reboots after saving the vmcore. By default, the dumper uses
KDUMP_RUNLEVEL=1 and KDUMP_IMMEDIATE_REBOOT=yes, in order to reduce the
possible risk of disk corruption in the recovery kernel environment.
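
The free-space check described above can be sketched in a few lines of
shell. This is only an illustration of the rule, not the code of the
actual init script; the function name dump_allowed and its arguments are
hypothetical, and all sizes are assumed to be in megabytes. The special
case for 0 follows the description of KDUMP_FREE_DISK_SIZE in the tuning
parameters.

```shell
#!/bin/sh
# Sketch of the default dumper's free-space rule (not the real init script).
# dump_allowed FREE_MB MEM_MB RESERVE_MB
#   FREE_MB     free space on the dump partition (assumed to be in MB)
#   MEM_MB      size of system memory (assumed to be in MB)
#   RESERVE_MB  plays the role of $KDUMP_FREE_DISK_SIZE
# Returns 0 if a dump may be written, non-zero otherwise.
dump_allowed() {
    free_mb=$1
    mem_mb=$2
    reserve_mb=$3

    # A reserve of 0 disables the check and forces the dump.
    if [ "$reserve_mb" -eq 0 ]; then
        return 0
    fi

    # Refuse when free space is less than memory size plus the reserve.
    [ "$free_mb" -ge $((mem_mb + reserve_mb)) ]
}

# Example values only: 8 GB free, 4 GB of memory, 64 MB reserve.
if dump_allowed 8192 4096 64; then
    echo "enough space: save vmcore"
else
    echo "not enough space: skip dump"
fi
```

With 4096 MB of memory and the default reserve of 64 MB, at least 4160 MB
must be free on the dump partition for this check to pass.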

If you want Kdump to run more complex jobs than the default dumper
configuration allows, set $KDUMP_TRANSFER to the command or script to be
run, and change $KDUMP_RUNLEVEL and $KDUMP_IMMEDIATE_REBOOT accordingly.
For example, setting KDUMP_TRANSFER="scp /proc/vmcore remote:/dump" and
KDUMP_RUNLEVEL=3 will make Kdump act like a netdump.

You can set KDUMP_IMMEDIATE_REBOOT=no to prevent the immediate reboot.
This could be useful to check the system over the network, for example.

Note that the available memory size for the recovery kernel is limited.
Setting KDUMP_RUNLEVEL=5 (graphical login) is not recommended.

Initrd-based Dump Saving
========================

The problem with the procedure described above is that your root file
system (or whatever partition your KDUMP_SAVEDIR is in) may be corrupted.
In that case the script may not be able to mount the device and cannot
save the dump to disk.

To handle this, you can configure KDUMP_DUMPDEV to point to an unused
partition that is large enough -- i.e. larger than the system's main
memory -- to hold the dump. Before mounting the root file system, the init
script writes the dump to that device. After rebooting, the normal boot
script saves the dump from that device to KDUMP_SAVEDIR.

Because the data is saved to disk, you can safely turn off the computer
and/or repair the file system with a suitable tool (for example, you may
need to boot from a CD, which is no problem).

After you change that value, you have to re-run mkinitrd on the kdump
kernel, or on all kernels.

Tuning parameters
=================

You can adjust the basic behavior of the Kdump script by editing the
/etc/sysconfig/kdump file. Edit the script values with the YaST runlevel
System Services editor, or manually edit the /etc/sysconfig/kdump file,
and then restart the kdump service.

Generic options
---------------

- KDUMP_KERNELVER
    This is the kernel version string for the Kdump kernel; an example is
    "2.6.16-5-kdump".
    The init script will use a kernel named /boot/vmlinux-$KDUMP_KERNELVER.
    The Kdump configuration is located in the /etc/sysconfig/kdump file.
    If you do not specify a version, then the init script will try to find
    a Kdump kernel with the same version number as the running kernel.
    Using the string "kdump" will default to the most recently installed
    Kdump kernel (suitable for x86, x86-64 and ppc64). For ia64, keep this
    string empty to point to the running kernel.

- KDUMP_COMMANDLINE
    This sets the command string to be passed to the Kdump kernel. This
    will usually match the contents of the grub kernel line. An example is
    KDUMP_COMMANDLINE="ro root=LABEL=/". If you do not give a command
    line, then the default will be taken from /proc/cmdline.

- KDUMP_COMMANDLINE_APPEND
    Set this variable if you only want to _append_ values to the default
    command line string. The string is also appended if KDUMP_COMMANDLINE
    is set.

- KEXEC_OPTIONS
    You can use this to pass additional arguments to kexec. For i386 and
    x86-64, you likely need to pass "--args-linux" here.

- KDUMP_RUNLEVEL
    This is the runlevel that the Kdump kernel boots to. The default is
    "1". To enable network support in the Kdump recovery environment, set
    this to "3".

- KDUMP_IMMEDIATE_REBOOT
    This option specifies whether to reboot immediately after saving the
    core in the Kdump kernel. This option is ignored when KDUMP_DUMPDEV is
    set to a non-empty string. The default is "yes".

- KDUMP_TRANSFER
    This is an option to execute a script or command to process or
    transfer the dump image. It can read the dump image either through
    /proc/vmcore or /dev/oldmem. An empty string will use the default
    dumper.

Options for the Default Dumper
------------------------------

- KDUMP_SAVEDIR
    This option specifies the path to the directory where the dumps are
    saved. The default is "/var/log/dump". See also KDUMP_DUMPDEV if you
    want to save the dump on a raw device first, which helps if your root
    file system is corrupted.

- KDUMP_DUMPDEV
    Specifies the dump device that is used for saving the dump in the
    kdump kernel.

    Specifying a dump device is optional. If none is given, the dump is
    written to KDUMP_SAVEDIR when booting the kdump kernel.

    If KDUMP_DUMPDEV points to a device file, the dump is written to that
    device when booting the kdump kernel. The advantage is that you don't
    have to mount the root file system (which may be corrupted!) just to
    write the dump. On the first normal boot that is able to mount the
    root file system successfully, the dump is saved to KDUMP_SAVEDIR as
    usual.

    Important: The KDUMP_DUMPDEV device is overwritten by kdump, so don't
    use it for saving any data. Also, don't use the swap partition that is
    currently in use.

- KDUMP_KEEP_OLD_DUMPS
    This option specifies how many previous dumps are kept. If the number
    of saved dump files exceeds this number, the dumper removes older
    dumps. You can prevent automatic removal by setting this to "0"
    (zero). The default value is "5".

- KDUMP_FREE_DISK_SIZE
    This specifies the minimum free disk space in megabytes of the dump
    partition. If the free disk space is less than the sum of this value
    and the memory size, then the default dumper will not save the vmcore
    file in order to prevent disk corruption. Setting this option to "0"
    (zero) forces the dumper to dump without checking the size. The
    default value is "64".

Dump Analysis
-------------

Dump analysis can be performed using GDB or the Crash utility. The Crash
utility is included in the crash RPM package.

You must install a debug-info kernel matching the version of the system
kernel (of the system where the dump was collected) on the system where
the analysis is to be performed. The debug-info kernel provides symbol and
type information that Crash and GDB use. You can find kernel debug
information RPMs on the SUSE support Web site. Alternatively, you can
build a debug-info kernel from source by specifying the CONFIG_DEBUG_INFO
kernel parameter.

Even if you install kernel-debuginfo, you need to uncompress the kernel
image first. How to do this depends on the architecture on which your
system is running. If you don't know the architecture, run "uname -i" to
find out.

On i586, i686, x86_64, s390 and s390x, you have to unpack the kernel
image:

    $ gunzip /boot/vmlinux-<version>.gz

On IA64, the default kernel image is already a gzip'ed vmlinux image. Run

    $ zcat /boot/vmlinuz-<version> > /boot/vmlinux-<version>

On PPC and PPC64, you don't have to do anything, because there the
bootloader already loads the vmlinux image.

The symbol information in the debug-info kernel may differ from the
running kernel. Therefore, when running crash against a vmcore, you should
specify both the System.map file and the debug-info kernel. For example,
to run crash against a vmcore, use the following command line:

    $ crash /boot/System.map-<version> /boot/vmlinux-<version> vmcore

Where:

    /boot/System.map-<version> -- The map file matching the kernel being
                                  analyzed.
    /boot/vmlinux-<version>    -- The matching kernel.
    vmcore                     -- The crash dump.

GDB Helper Script
-----------------

The gdb-kdump script is provided to simplify use of GDB on dump images.
The usage is "gdb-kdump [vmcore]". The argument is the vmcore dump image
to analyze. If you do not give an argument, then the latest dump image
will be taken.

The script starts GDB with the vmlinux of the currently running kernel.
The script assumes that the vmlinux file is at /boot/vmlinux-$kernel. If
the script finds only a gzip-compressed file, the file is automatically
uncompressed. Note that you also need the kernel-versionnumber-debuginfo
package, which provides the debug symbols.

At startup, gdb-kdump also reads some useful macros for the Kdump image,
originally provided in /usr/src/linux/Documentation/kdump. The following
macros then become available: bttnobp, btt, btpid, trapinfo, and dmesg.
See the help topic of each command in GDB for details.
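
To illustrate how "the latest dump image" could be located, here is a
minimal shell sketch. It is not the code of gdb-kdump; it assumes the
<savedir>/<date-string>/vmcore layout described for the default dumper,
and the function name latest_vmcore is hypothetical. The use of "uname -r"
to fill in $kernel is likewise an assumption.

```shell
#!/bin/sh
# Sketch: find the newest dump under a save directory, assuming the layout
# <savedir>/<date-string>/vmcore. Not the actual gdb-kdump implementation.

# latest_vmcore SAVEDIR
# Prints the path of the newest vmcore; returns 1 if none is found.
latest_vmcore() {
    savedir=$1
    latest=""
    # Date strings such as 2006-02-21-13:20 sort chronologically, and the
    # shell expands globs in sorted order, so the last match is the newest.
    for f in "$savedir"/*/vmcore; do
        [ -f "$f" ] && latest=$f
    done
    [ -n "$latest" ] || return 1
    printf '%s\n' "$latest"
}

# Print the GDB command line that would be used, without running GDB.
# /var/log/dump is the documented default for KDUMP_SAVEDIR.
savedir=${KDUMP_SAVEDIR:-/var/log/dump}
if core=$(latest_vmcore "$savedir"); then
    echo "gdb /boot/vmlinux-$(uname -r) $core"
else
    echo "no vmcore found under $savedir"
fi
```

The sketch only prints the command, so it can be tried on a system that
has no dumps or no uncompressed vmlinux yet.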