Kdump README for SLES 10

Prerequisites
=============

Be sure that you have installed the kexec-tools rpm. For x86, x86-64
and ppc64, install kernel-kdump.rpm, too. The version of the
kernel-kdump rpm must match the version of the running system kernel.

Overview
========

Kdump uses kexec to quickly boot to a recovery kernel whenever a dump
of the system kernel's memory needs to be taken (for example, when the
system panics). The system memory image is preserved across the reboot
and is accessible to the debug kernel. You can use common Linux
commands, such as cp and scp, to copy the memory image to a dump file
on the local host, or across the network to a remote system.

Kdump and kexec are currently supported on the x86, x86_64, and PPC64
architectures.

The system kernel reserves a small section of memory for the capture
kernel at boot time of the system kernel. This ensures that ongoing
Direct Memory Access (DMA) from the system kernel does not corrupt the
capture kernel. The "kexec -p" command loads the capture kernel into
this reserved memory area.

On x86 machines, the first 640 KB of physical memory is needed to boot,
irrespective of where the kernel loads. Therefore, kexec preserves this
region immediately before rebooting into the recovery kernel.

All of the necessary information about the system kernel's core image
is encoded in the ELF format, and stored in a reserved area of memory
before a crash. The physical address of the start of the ELF header is
passed to the recovery kernel through the "elfcorehdr=" boot parameter.

In the capture kernel, you can access the memory image from the system
kernel in two ways:

1) Through a /dev/oldmem device interface.
   A capture utility can read the device file and write out the memory
   in raw format. This is a raw dump of memory. Analysis and capture
   tools must be intelligent enough to determine where to look for the
   right information.

2) Through /proc/vmcore.
   This exports the memory dump as an ELF format file that can be
   written out using any file copy command such as cp or scp. Further,
   you can use analysis tools such as the GNU Debugger (GDB) or Crash
   to debug the dump file. This method ensures that the dump pages are
   ordered correctly.

Setup of Kdump on SLES 10
=========================

Be sure the prerequisite RPMs are installed.

To enable a crash dump, you need to add an option to the boot loader to
specify the size and offset of the recovery kernel memory area. An
example of this boot loader option is "crashkernel=64M@16M". Here, 64M
is the size of the memory reserved for the Kdump recovery kernel, and
16M is the start address of the reserved area. On ia64, the start
offset is calculated by the kernel, so the @xxx offset is ignored. You
can add this option either with the YaST boot loader module, or by
manually editing the boot loader configuration file.

The recommended values by architecture for the "crashkernel" option
are:

    i386:    crashkernel=64M@16M
    x86_64:  crashkernel=64M@16M
    ia64:    crashkernel=512M (on small machines use 256M)
    PPC64:   crashkernel=128M@32M

After setting the boot loader option, activate the Kdump init script,
which is not activated by default. To do this, use the YaST System
Services (Runlevel) module. Alternatively, enable the service on the
command line with the following command: "/sbin/chkconfig kdump on".

***Warning***
You must activate the kdump service permanently via YaST or chkconfig
as described above. Starting the kdump service temporarily (for
example, with "rckdump start") is not sufficient, because the system is
rebooted through kexec into another state, and the temporary activation
is abandoned at the kdump boot stage.

After enabling the Kdump init script, reboot the system so that the
Kdump kernel image is loaded properly.

Test your Kdump setup by issuing the following commands as the root
user:

***Warning***
This procedure will crash your system.
Shut down all applications and ensure that no users are logged on
before performing this test.

    # sync
    # echo u > /proc/sysrq-trigger
      (remount file systems read-only to avoid recovery after reboot)
    # echo c > /proc/sysrq-trigger

After the system recovers, verify that a vmcore file was generated in
the save dump directory. By default the vmcore file is located in
/var/log/dump/.

When a crash occurs, the kernel crash handler starts the second
recovery kernel that the Kdump init script loaded earlier, and reboots
the system using the reserved memory up to the $KDUMP_RUNLEVEL
runlevel. During the boot of the recovery kernel, the Kdump init script
loads again, but this time it dumps the core image for later analysis.

When a crash happens in a graphical environment, you will likely have
no GUI in the second kernel boot. If you used a VGA console, you might
still have visual output from the secondary kernel.

The default behavior of the Kdump script is to save the old vmcore
image, and then reboot the system immediately. You can adjust the
behavior of the Kdump script through sysconfig variables described
later in this document.

The Default Dumper
==================

By default, the Kdump script saves the vmcore file to a unique
sub-directory consisting of $KDUMP_SAVEDIR and the date string, such as
/var/log/dump/2006-02-21-13:20/vmcore.

Before copying the vmcore file, the default dumper does some system
checks. First, it checks the number of old dump directories and removes
them if there are more than $KDUMP_KEEP_OLD_DUMPS. Then, the dumper
checks the free disk space in the partition of the dump directory. If
the free space is less than the sum of the memory size and the value
given in $KDUMP_FREE_DISK_SIZE, then the dumper will not create a dump.

$KDUMP_RUNLEVEL specifies the runlevel of the Kdump (recovery) kernel
boot. When $KDUMP_IMMEDIATE_REBOOT is set to yes, then the init script
automatically reboots after saving the vmcore.
By default, the dumper uses KDUMP_RUNLEVEL=1 and
KDUMP_IMMEDIATE_REBOOT=yes, in order to reduce the possible risk of
disk corruption in the recovery kernel environment.

If you want Kdump to run more complex jobs than set by the default
dumper configuration, set the name of the appropriate command or script
to be run via $KDUMP_TRANSFER, and change $KDUMP_RUNLEVEL and
$KDUMP_IMMEDIATE_REBOOT. For example, setting
KDUMP_TRANSFER="scp /proc/vmcore remote:/dump" and KDUMP_RUNLEVEL=3
will make Kdump act like a netdump.

You can set KDUMP_IMMEDIATE_REBOOT=no to prevent the immediate reboot.
This could be useful to check the system over the network, for example.

Note that the available memory size for the recovery kernel is limited.
Setting KDUMP_RUNLEVEL=5 (graphical login) is not recommended.

Initrd-based Dump Saving
========================

The problem with the procedure mentioned above is that your root file
system (or whatever partition your KDUMP_SAVEDIR is in) may be
corrupted. In that case, the script may not be able to mount the device
and therefore cannot save the dump to disk.

To handle this, you can set KDUMP_DUMPDEV to point to an unused
partition that is large enough to hold the dump, that is, larger than
the system's main memory. Before mounting the root file system, the
init script writes the dump to that device. After rebooting, the normal
boot script saves the dump from that device to KDUMP_SAVEDIR. Because
the data is already saved to disk, you can safely turn off the computer
and/or repair the file system with a suitable tool (booting from a CD
to do so is no problem).

After changing that value, you must re-run mkinitrd for the kdump
kernel, or for all kernels.

Tuning parameters
=================

You can adjust the basic behavior of the Kdump script by editing the
/etc/sysconfig/kdump file. Edit the script values with the YaST
runlevel System Services editor, or manually edit the
/etc/sysconfig/kdump file, and then restart the kdump service.
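
As an illustration of the netdump-like setup described in the section
on the default dumper, the following is a minimal sketch of the
relevant lines in /etc/sysconfig/kdump. The values are taken from that
example; the host name "remote" and the target path /dump are
placeholders, and KDUMP_IMMEDIATE_REBOOT=no is only an assumption for a
system that you want to inspect over the network afterwards.

```shell
# Sketch of an /etc/sysconfig/kdump fragment (shell variable syntax).
# "remote:/dump" is a placeholder target; adjust it for your environment.

# Command run in the recovery kernel to transfer the dump image.
KDUMP_TRANSFER="scp /proc/vmcore remote:/dump"

# Runlevel 3 enables network support in the recovery environment.
KDUMP_RUNLEVEL="3"

# Assumption: stay in the recovery kernel after the transfer so that the
# system can be checked over the network. The default is "yes".
KDUMP_IMMEDIATE_REBOOT="no"
```

The individual variables are described in the option lists that follow.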

Generic options
---------------

- KDUMP_KERNELVER
  This is the kernel version string for the Kdump kernel; an example is
  "2.6.16-5-kdump". The init script will use a kernel named
  /boot/vmlinux-$KDUMP_KERNELVER. The kdump configuration file is
  located in the /etc/sysconfig directory. If you do not specify a
  version, then the init script will try to find a Kdump kernel with
  the same version number as the running kernel. Using the string
  "kdump" will default to the most recently installed Kdump kernel
  (suitable for x86, x86-64 and ppc64). For ia64, keep this string
  empty to point to the same kernel as the running one.

- KDUMP_COMMANDLINE
  This sets the command string to be passed to the Kdump kernel. This
  will usually match the contents of the grub kernel line. An example
  is KDUMP_COMMANDLINE="ro root=LABEL=/". If you do not give a command
  line, then the default will be taken from /proc/cmdline.

- KDUMP_COMMANDLINE_APPEND
  Set this variable if you only want to _append_ values to the default
  command line string. The string is also appended if KDUMP_COMMANDLINE
  is set.

- KEXEC_OPTIONS
  You can use this to pass additional arguments to kexec. For i386 and
  x86-64, you likely need to pass "--args-linux" here.

- KDUMP_RUNLEVEL
  This is the runlevel that the Kdump kernel boots to. The default is
  "1". To enable network support in the Kdump recovery environment, set
  this to "3".

- KDUMP_IMMEDIATE_REBOOT
  This option specifies whether to reboot immediately after saving the
  core in the Kdump kernel. This option is ignored when KDUMP_DUMPDEV
  is set to a non-empty string. The default is "yes".

- KDUMP_TRANSFER
  This is an option to execute a script or command to process or
  transfer the dump image. It can read the dump image either through
  /proc/vmcore or /dev/oldmem. An empty string will use the default
  dumper.

Options for the Default Dumper
------------------------------

- KDUMP_SAVEDIR
  This option specifies the path to the directory where the dumps are
  saved. The default is "/var/log/dump".
  See also KDUMP_DUMPDEV if you want to save the dump to a raw device
  first, which helps if your root file system is corrupted.

- KDUMP_DUMPDEV
  Specifies the dump device that is used for saving the dump in the
  kdump kernel. Specifying a dump device is optional. If none is given,
  the dump is written to KDUMP_SAVEDIR when booting from the kdump
  kernel.

  If KDUMP_DUMPDEV points to a device file, the dump is written to that
  device when booting from the kdump kernel. The advantage is that you
  don't have to mount the root file system (which may be corrupted!)
  just to write the dump. On the first normal boot which is able to
  successfully mount the root file system, the dump is saved to
  KDUMP_SAVEDIR as usual.

  Important: The KDUMP_DUMPDEV is overwritten by kdump, so don't use it
  for saving any data. Also don't use the currently used swap
  partition.

- KDUMP_KEEP_OLD_DUMPS
  This option specifies how many previous dumps are kept. If the number
  of saved dump files exceeds this number, the dumper removes older
  dumps. You can prevent automatic removal by setting this to "0"
  (zero). The default value is "5".

- KDUMP_FREE_DISK_SIZE
  This specifies the minimum free disk space in megabytes of the dump
  partition. If the free disk space is less than the sum of this value
  and the memory size, then the default dumper will not save the vmcore
  file in order to prevent disk corruption. Setting this option to "0"
  (zero) forces the dumper to dump without checking the size. The
  default value is "64".

- KDUMP_VERBOSE
  Determines if kdump uses verbose output. This value is a bitmask:

    1: kdump command line is written to system log when executing
       /etc/init.d/kdump
    2: progress is written to stdout while dumping
    4: kdump command line is written to standard output when executing
       /etc/init.d/kdump

Machine-specific Notes
======================

- IA64

  o On some Hewlett Packard platforms you need 'machvec=dig' in
    KDUMP_COMMANDLINE_APPEND. For example: HP rx3600.
  o On SGI SN2 machines, kdump doesn't work when the VGA console is
    active. To disable the VGA console, execute the following commands
    in the EFI shell:

      Shell> set NoVGA 1
      Shell> reset

Dump Triggering Methods
=======================

This section describes the various ways, other than a Kernel Panic, in
which Kdump can be triggered. These methods enable the user to invoke
Kdump in cases where the system is experiencing a hard hang.

1) AltSysRq C

On i386 and x86_64 machines, Kdump can be triggered with the
combination of the 'Alt', 'SysRq' and 'C' keyboard keys. This method
works only on directly attached consoles, and not on remote consoles.
In cases where the machine is in a hung state with interrupts disabled,
AltSysRq C cannot be used. If any kind of terminal access is still
possible, the same result may be achieved from the shell command line
like so:

    # echo c > /proc/sysrq-trigger

On PowerPC machines, AltSysRq C can also be used to initiate Kdump if a
directly attached console is available. In addition, Kdump can be
triggered via the Hardware Management Console (HMC) using the 'Ctrl',
'O' and 'C' keyboard keys.

In order to use the SysRq method for dump triggering,
/proc/sys/kernel/sysrq needs to be enabled, which can be done as
follows:

    # echo 1 > /proc/sys/kernel/sysrq

2) Kernel OOPs

To generate a dump every time the Kernel OOPses, set the 'Panic On
OOPs' option as follows:

    # echo 1 > /proc/sys/kernel/panic_on_oops

3) NMI (Non-Maskable Interrupt) button

In cases where the system is in a hung state, and is not accepting
keyboard interrupts, using the NMI button for triggering Kdump can be
very useful. An NMI button is present on most of the newer x86 and
x86_64 machines. Please refer to the user guides/manuals to locate the
button, though on most occasions it is not very well documented. In
most cases it is hidden behind a small hole on the front or back panel
of the machine.
You could use a toothpick or some other non-conducting probe to press
the button. For example, on the IBM X series 366 machine, the NMI
button is located behind a small hole on the bottom center of the rear
panel.

To enable this method of dump triggering using the NMI button, you need
to set the 'unknown_nmi_panic' option as follows:

    # echo 1 > /proc/sys/kernel/unknown_nmi_panic

When enabling unknown_nmi_panic, be careful not to enable the NMI
watchdog feature as well; otherwise the system will panic.

4) NMI watchdog

The NMI watchdog is a feature available in the x86 and x86_64 kernels
which uses NMI to monitor whether a CPU has locked up. On i386
machines, the NMI watchdog can be enabled by passing nmi_watchdog=1 on
the kernel command line. On x86_64 machines, it is enabled by default.

To verify whether your system has been configured with the NMI
watchdog, look at the NMI entry in /proc/interrupts. If the count is
greater than zero, then the NMI watchdog has been configured; otherwise
it has not. Please refer to Documentation/nmi_watchdog.txt in the
kernel source for a more detailed description.

Once this feature has been enabled in the kernel, any lockup causes an
OOPs message to be generated, followed by Kdump being triggered. This
also requires 'Panic On OOPs' to be enabled as explained in method 2
above.

Please refrain from simultaneously enabling 'nmi_watchdog' and setting
/proc/sys/kernel/unknown_nmi_panic, as this would result in a Kernel
Panic from legitimate NMIs generated by the nmi_watchdog.

5) PowerPC specific methods:

On IBM PowerPC machines, the following methods to issue a soft reset
can be used to trigger Kdump.

On SLES10 systems, XMON (debugger) is turned off by default. To enable
XMON, boot the kernel with the 'xmon=on' option. With XMON enabled,
issuing a soft reset drops the user to the XMON prompt, where typing
'X' triggers Kdump. If XMON is not enabled, then a soft reset directly
triggers Kdump.
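
The effect of a soft reset therefore depends on whether XMON was
enabled at boot. The following shell sketch checks a kernel command
line for the 'xmon=on' option mentioned above and prints which
behaviour to expect. The function name soft_reset_behaviour is
hypothetical and not part of any kdump tool; it only looks for the
literal 'xmon=on' token and assumes that options are separated by
single spaces.

```shell
# soft_reset_behaviour CMDLINE
# Hypothetical helper: prints what a soft reset is expected to do, based
# only on whether the literal option "xmon=on" appears in CMDLINE.
soft_reset_behaviour() {
    # Pad with spaces so that only a whole option matches, not a
    # substring of another option.
    case " $1 " in
        *" xmon=on "*)
            echo "soft reset drops to the XMON prompt; type X to trigger Kdump"
            ;;
        *)
            echo "soft reset triggers Kdump directly"
            ;;
    esac
}
```

On a running system, the function could be called as
soft_reset_behaviour "$(cat /proc/cmdline)". This only reads the
command line and does not trigger a dump.
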

5.1) HMC

The Hardware Management Console (HMC), available on Power4 and Power5
machines, allows partitions to be reset remotely. This is especially
useful in hang situations where the system is not accepting any
keyboard input. Once you have the HMC configured, the following steps
enable you to trigger Kdump via a soft reset:

On Power4

  Using GUI
    * In the right pane, right click on the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline
    # reset_partition -m -p -t soft

On Power5

  Using GUI
    * In the right pane, right click on the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline
    # chsysstate -m -n -o dumprestart -r lpar

5.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under
"Blade Tasks" in the Blade Management Console. Select the corresponding
blade for which you want to initiate the dump and then click "Restart
blade with NMI". This will issue a soft reset.

5.3) Control Panel function for a standalone Power5 machine

A standalone machine is one which does not have any LPARs configured
and also does not have an HMC available. In such cases the Control
Panel, usually located on the front panel of the machine (please refer
to the user guide of the specific model for details), can be used for
dump triggering in case the system has a hard hang. The control panel
provides many functions for system management purposes; Function 22 is
meant for invoking a partition dump. This function is available only in
the Manual operating mode.

To check if the system is operating in manual mode:
  * Select function 1 on the panel
  * Press enter
  * Read the Operating mode from the panel display
  * If it is not 'M', then use function 2 to set it (see below)

To set manual mode:
  * Select function 2 on the panel
  * Press enter
  * The current OS IPL type is displayed with a pointer
  * Press enter to move to the Operating mode
  * Use increment, decrement buttons to change the mode to M
  * Press enter

To trigger the dump:
  * Select function 22 on the panel
  * Press enter
  * Select function 22 on the panel
  * Press enter

Invoking function 22 twice will issue a soft reset to the machine.

Dump Analysis
=============

Dump analysis can be performed using GDB or the Crash utility. The
Crash utility is included in the crash RPM package.

You must install a debug-info kernel matching the version of the system
kernel (of the system where the dump was collected) on the system where
the analysis is to be performed. The debug-info kernel provides symbol
and type information that Crash and GDB use. You can find kernel debug
information RPMs on the SUSE support Web site. Alternatively, you can
build a debug-info kernel from source by specifying the
CONFIG_DEBUG_INFO kernel parameter.

Even if you install kernel-debuginfo, you need to uncompress the kernel
image first. How to do this depends on the architecture on which your
system is running. If you don't know the architecture, run "uname -i"
to find out.

On i586, i686, x86_64, s390 and s390x, you have to unpack the kernel
image:

    $ gunzip /boot/vmlinux-.gz

On IA64, the default kernel image is already a gzip'ed vmlinux image.
Run

    $ zcat /boot/vmlinuz- > /boot/vmlinux-

On PPC and PPC64, you don't have to do anything, because the boot
loader already loads the vmlinux image.

The symbol information in the debug-info kernel may differ from the
running kernel. Therefore, when running crash against a vmcore you
should specify both the System.map file and the debug-info kernel.

For example, to run crash against a vmcore use the following command
line:

    $ crash /boot/System.map-version /boot/vmlinux-version vmcore

Where:
    /boot/System.map-  -- The map file matching the kernel being
                          analyzed.
    /boot/vmlinux-     -- The matching kernel.
    vmcore             -- The crash dump.

GDB Helper Script
=================

The GDB-kdump script is provided to simplify use of GDB on dump images.
The usage is "gdb-kdump [vmcore]". The argument is the vmcore dump
image to analyze. If you do not give an argument, then the latest dump
image will be taken.

The script starts GDB with the vmlinux of the currently running kernel.
The script assumes that the vmlinux file is at /boot/vmlinux-$kernel.
If the script finds only a gzip-compressed file, the file is
automatically uncompressed. Note that you will need to supply
kernel-versionnumber-debuginfo, with debug symbols.

GDB-kdump also reads some useful macros for the Kdump image, originally
provided in /usr/src/linux/Documentation/kdump, at startup. The
following macros then become available: bttnobp, btt, btpid, trapinfo,
and dmesg. See the help topic of each command in GDB for details.
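
To illustrate what "the latest dump image" means with the default
directory layout, the following shell sketch picks the newest vmcore
below a dump directory. It is not the gdb-kdump implementation; the
function name latest_vmcore is hypothetical, and it assumes that dumps
live in date-named sub-directories such as
/var/log/dump/2006-02-21-13:20/, so that sorting the names also sorts
them by time.

```shell
# latest_vmcore [DIR]
# Hypothetical helper: prints the path of the vmcore in the newest
# date-named sub-directory of DIR (default: /var/log/dump).
# Assumes directory names of the form YYYY-MM-DD-HH:MM, which sort
# chronologically when sorted as plain strings.
latest_vmcore() {
    dir=${1:-/var/log/dump}
    latest=
    # The shell expands the pattern in sorted order, so the last match
    # belongs to the newest directory.
    for core in "$dir"/*/vmcore; do
        # Skip the unexpanded pattern if no dump exists.
        [ -f "$core" ] || continue
        latest=$core
    done
    # Print the result; return non-zero if no dump was found.
    [ -n "$latest" ] && echo "$latest"
}
```

A possible use is to pass the result explicitly, for example
gdb-kdump "$(latest_vmcore)", which should be equivalent to calling
gdb-kdump without an argument if the assumptions above hold.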