Knowledge Fact: How to use kdump for Linux Kernel Crash Analysis

Kdump is an utility used to capture the system core dump in the event of system crashes.
These captured core dumps can be used later to analyze the exact cause of the system failure and implement the necessary fix to prevent the crashes in future.
Kdump reserves a small portion of the memory for the secondary kernel called crashkernel.
This secondary or crash kernel is used the capture the core dump image whenever the system crashes.

1. Install Kdump Tools

First, install the kdump, which is part of kexec-tools package.

# yum install kexec-tools

2. Set crashkernel in grub.conf

Once the package is installed, edit /boot/grub/grub.conf file and set the amount of memory to be reserved for the kdump crash kernel.
You can edit the /boot/grub/grub.conf for the value crashkernel and set it to either auto or user-specified value. It is recommended to use minimum of 128M for a machine with 2G memory or higher.

In the following example, look for the line that start with “kernel”, where it is set to “crashkernel=auto”.

# vi /boot/grub/grub.conf
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux (2.6.32-419.el6.x86_64)
  root (hd0,0)
  kernel /vmlinuz-2.6.32-419.el6.x86_64 ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap SYSFONT=latarcyrheb-sun16 crashkernel=auto rd_LVM_LV=VolGroup/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
  initrd /initramfs-2.6.32-419.el6.x86_64.img

3. Configure Dump Location

Once the kernel crashes, the core dump can be captured to local filesystem or remote filesystem(NFS) based on the settings defined in /etc/kdump.conf (in SLES operating system the path is /etc/sysconfig/kdump).
This file is automatically created when the kexec-tools package is installed.
All the entries in this file will be commented out by default. You can uncomment the ones that are needed for your best options.

# vi /etc/kdump.conf
#raw /dev/sda5
#ext4 /dev/sda3
#ext4 LABEL=/boot
#ext4 UUID=03138356-5e61-4ab3-b58e-27507ac41937
#net my.server.com:/export/tmp
#net user@my.server.com
path /var/crash
core_collector makedumpfile -c --message-level 1 -d 31
#core_collector scp
#core_collector cp --sparse=always
#extra_bins /bin/cp
#link_delay 60
#kdump_post /var/crash/scripts/kdump-post.sh
#extra_bins /usr/bin/lftp
#disk_timeout 30
#extra_modules gfs2
#options modulename options
#default shell
#debug_mem_level 0
#force_rebuild 1
#sshkey /root/.ssh/kdump_id_rsa

In the above file:

To write the dump to a raw device, you can uncomment “raw /dev/sda5″ and change it to point to correct dump location.
If you want to change the path of the dump location, uncomment and change “path /var/crash” to point to the new location.
For NFS, you can uncomment “#net my.server.com:/export/tmp” and point to the current NFS server location.

4. Configure Core Collector

The next step is to configure the core collector in Kdump configuration file. It is important to compress the data captured and filter all the unnecessary information from the captured core file.
To enable the core collector, uncomment the following line that starts with core_collector.

core_collector makedumpfile -c --message-level 1 -d 31

makedumpfile specified in the core_collector actually makes a small DUMPFILE by compressing the data.
makedumpfile provides two DUMPFILE formats (the ELF format and the kdump-compressed format).
By default, makedumpfile makes a DUMPFILE in the kdump-compressed format.
The kdump-compressed format can be read only with the crash utility, and it can be smaller than the ELF format because of the compression support.
The ELF format is readable with GDB and the crash utility.
-c is to compresses dump data by each page
-d is the number of pages that are unnecessary and can be ignored.

If you uncomment the line #default shell then the shell is invoked if the kdump fails to collect the core. Then the administrator can manually take the core dump using makedumpfile commands.

5. Restart kdump Services

Once kdump is configured, restart the kdump services,

# service kdump restart
Stopping kdump:   [  OK  ]
Starting kdump:   [  OK  ]

# service kdump status
Kdump is operational

If you have any issues in starting the services, then kdump module or crashkernel parameter has not been setup properly. So, verify /proc/cmdline and make sure it reflects to include the crashkernel value.

6. Manually Trigger the Core Dump

You can manually trigger the core dump using the following commands:

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

The server will reboot itself and the crash dump will be generated.

7. View the Core Files

Once the server is rebooted, you will see the core file is generated under /var/crash based on location defined in /var/crash.
You will see vmcore and vmcore-dmseg.txt file:

# ls -lR /var/crash
drwxr-xr-x. 2 root root 4096 Mar 26 11:06 127.0.0.1-2014-03-26-11:06:43

/var/crash/127.0.0.1-2014-03-26-11:06:43:
-rw-------. 1 root root 33595159 Mar 26 11:06 vmcore
-rw-r--r--. 1 root root    79498 Mar 26 11:06 vmcore-dmesg.txt

8. Kdump analysis using crash

Crash utility is used to analyze the core file captured by kdump.
It can also be used to analyze the core files created by other dump utilities like netdump, diskdump, xendump.
You need to ensure the “kernel-debuginfo” package is present and it is at the same level as the kernel.
Launch the crash tool as shown below. After you this command, you will get a cash prompt, where you can execute crash commands:

# crash /var/crash/127.0.0.1-2014-03-26-12\:24\:39/vmcore /usr/lib/debug/lib/modules/`uname –r`/vmlinux

crash>

9. View the Process when System Crashed

Execute ps command at the crash prompt, which will display all the running process when the system crashed.

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      0      0   0  ffffffff81a8d020  RU   0.0       0      0  [swapper]
      1      0   0  ffff88013e7db500  IN   0.0   19356   1544  init
      2      0   0  ffff88013e7daaa0  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff88013e7da040  IN   0.0       0      0  [migration/0]
      4      2   0  ffff88013e7e9540  IN   0.0       0      0  [ksoftirqd/0]
      7      2   0  ffff88013dc19500  IN   0.0       0      0  [events/0]

10. View Swap space when System Crashed

Execute swap command at the crash prompt, which will display the swap space usage when the system crashed.

crash> swap
FILENAME           TYPE         SIZE      USED   PCT  PRIORITY
/dm-1            PARTITION    2064376k       0k   0%     -1

11. View IPCS when System Crashed

Execute ipcs command at the crash prompt, which will display the shared memory usage when the system crashed.

crash> ipcs
SHMID_KERNEL     KEY      SHMID      UID   PERMS BYTES      NATTCH STATUS
(none allocated)

SEM_ARRAY        KEY      SEMID      UID   PERMS NSEMS
ffff8801394c0990 00000000 0          0     600   1
ffff880138f09bd0 00000000 65537      0     600   1

MSG_QUEUE        KEY      MSQID      UID   PERMS USED-BYTES   MESSAGES
(none allocated)

12. View IRQ when System Crashed

Execute irq command at the crash prompt, which will display the IRQ stats when the system crashed.

crash> irq -s
           CPU0
  0:        149  IO-APIC-edge     timer
  1:        453  IO-APIC-edge     i8042
  7:          0  IO-APIC-edge     parport0
  8:          0  IO-APIC-edge     rtc0
  9:          0  IO-APIC-fasteoi  acpi
 12:        111  IO-APIC-edge     i8042
 14:        108  IO-APIC-edge     ata_piix
 .
 .

vtop – This command translates a user or kernel virtual address to its physical address.
foreach – This command displays data for multiple tasks in the system
waitq – This command displays all the tasks queued on a wait queue.

13. View the Virtual Memory when System Crashed

Execute vm command at the crash prompt, which will display the virtual memory usage when the system crashed.

crash> vm
PID: 5210   TASK: ffff8801396f6aa0  CPU: 0   COMMAND: "bash"
       MM                 PGD          RSS    TOTAL_VM
ffff88013975d880  ffff88013a0c5000  1808k   108340k
      VMA           START       END     FLAGS FILE
ffff88013a0c4ed0     400000     4d4000 8001875 /bin/bash
ffff88013cd63210 3804800000 3804820000 8000875 /lib64/ld-2.12.so
ffff880138cf8ed0 3804c00000 3804c02000 8000075 /lib64/libdl-2.12.so

14. View the Open Files when System Crashed

Execute files command at the crash prompt, which will display the open files when the system crashed.

crash> files
PID: 5210   TASK: ffff8801396f6aa0  CPU: 0   COMMAND: "bash"
ROOT: /    CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff88013cf76d40 ffff88013a836480 ffff880139b70d48 CHR  /tty1
  1 ffff88013c4a5d80 ffff88013c90a440 ffff880135992308 REG  /proc/sysrq-trigger
255 ffff88013cf76d40 ffff88013a836480 ffff880139b70d48 CHR  /tty1
..

15. View System Information when System Crashed

Execute sys command at the crash prompt, which will display system information when the system crashed.

crash> sys
      KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.5.1.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2014-03-26-12:24:39/vmcore  [PARTIAL DUMP]
        CPUS: 1
        DATE: Wed Mar 26 12:24:36 2014
      UPTIME: 00:01:32
LOAD AVERAGE: 0.17, 0.09, 0.03
       TASKS: 159
    NODENAME: elserver1.abc.com
     RELEASE: 2.6.32-431.5.1.el6.x86_64
     VERSION: #1 SMP Fri Jan 10 14:46:43 EST 2014
     MACHINE: x86_64  (2132 Mhz)
      MEMORY: 4 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)

Knowledge Fact

Wednesday, August 13, 2014

How to use kdump for Linux Kernel Crash Analysis