Mini Kernel Dump

Highly Reliable Crash Dump Capture for Linux

Last updated March 30, 2006



What's new

2006.3.30
mkdump 3.0 released.
  • now a mini kernel is loaded to the standard address.
  • the dump output part of a mini kernel can be a kernel module.
  • initrd is supported.
  • RHEL3/4 is available without a patch.
2005.12.28
mkdump 2.0 update 1 released. The main enhancement is RHEL3 support.
2005.9.7
mkdutils released. (It is in the version 1.0 release. It can be used with 2.0 too.)
2005.8.31
mkdump release 2.0 for Linux kernel 2.6.12 and Debian 2.6.8-16
2005.8.26
Update of mkdump 1.0 released (for version 1.0 users). It includes some critical bug fixes.
2005.7.15
mkdump release 1.90 for Linux kernel 2.6.12; i386 only
2005.5.26
The dump format conversion tools released
2005.5.2
CVS opened
2005.3.31
Version 1.0 released
  • fixed a bug that occurred when ACPI is on (x86_64)
  • now includes a kernel 2.4 version of mkexec (for 2.4.27) (beta)
2004.12.28
Beta-3 released
  • We carefully checked the code path from the occurrence of a crash to the start of the mini kernel and eliminated the possibility of deadlock/hang conditions.
2004.11.19
Beta-2 released
  • x86_64 support
  • non PAE support (i386)
2004.10.14
Mini Kernel Dump Beta-1 released

Documentation



Overview


Introduction

As Linux is increasingly being employed in enterprise-class systems, the importance of fault analysis likewise increases. As it stands now, when a fault occurs on a production system, the debugging process is extremely difficult. Furthermore, there are many cases in which these faults cannot be easily reproduced. In such cases, analysis via a crash dump becomes the last line of defense. Hence, the ability to accurately capture dumps at the moment a system crashes is an absolutely vital prerequisite for fault analysis.

Problems with existing Crash Dumps

Standard Linux does not include the capability of creating crash dumps. Software such as LKCD[1], netdump[2], and diskdump[3] have been offered as solutions for those who want to add crash dump generation capabilities to Linux. However, from the standpoint of accurately capturing dumps, these crash dump utilities all have two major problems.

First, there is a problem with resources (notably, resources being locked). All current crash dump systems use the existing drivers, which rely on services in the main kernel, to write to the dump device. If the operation that caused the crash was holding a lock on resources needed to output the dump, the dump operation will deadlock. Likewise, on multiprocessor systems, operations running on other CPUs are forced to stop when the crash occurs, so resources needed for copying the dump may be left locked, and the dump operation again deadlocks. Even when no lock-up results, insufficient system resources may cause the dump operation to fail.
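The locking problem described above can be illustrated with a small user-space sketch. This is only an analogy, not kernel code: `driver_lock`, `crashing_operation`, and `in_kernel_dump` are hypothetical names, and the timeout exists only so that the example reports the deadlock instead of hanging.

```python
import threading

# Hypothetical stand-in for a lock that protects the dump device driver.
driver_lock = threading.Lock()

def crashing_operation():
    """Take the driver lock and then 'crash' without releasing it."""
    driver_lock.acquire()
    # The crash happens here, so release() is never reached.

def in_kernel_dump(timeout_s=0.1):
    """Dump path that reuses the same driver, and therefore the same lock.

    A real dump path would block forever; the timeout only lets this
    sketch report the deadlock instead of hanging.
    """
    got_lock = driver_lock.acquire(timeout=timeout_s)
    if not got_lock:
        return "deadlock: driver lock is still held by the crashed context"
    try:
        return "dump written"
    finally:
        driver_lock.release()

crashing_operation()
result = in_kernel_dump()
print(result)
```

A dump path that runs in a separate kernel with its own driver instances never touches `driver_lock`, which is the motivation for the approach described later in this paper.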

The second problem is the reliability of the kernel's control tables. A kernel crash means that some kind of inconsistency has occurred within the kernel and that there is a strong possibility a control table has been damaged. As the existing crash dump solutions use standard kernel functions to output the dump, there is a very real possibility that a damaged control table will be referenced. A crash dump function should take the smallest workable route and be designed on the premise that the kernel can no longer be trusted. The problem with current Linux crash dumps is that they are built on the premise that the kernel can be trusted and will in fact be operating normally (this is particularly true of LKCD).

As the Linux kernel becomes more complex (and this is true even if one leaves out the SCSI subsystem), it is very difficult to fully grasp the operation of the entire system. It is also commonly thought that setting up a controllable dump route within the Linux kernel is very difficult. Thus, we believe that it is necessary to develop a method capable of capturing crash dumps independently of the existing kernel.

Mini Kernel Dump

We have been able to greatly increase reliability and accuracy in capturing crash dumps through a method we refer to as a "mini kernel dump." The basic idea behind the mini kernel dump is to initiate a separate kernel at the moment of the crash and then have this mini kernel obtain the crash dump. The operation of the mini kernel crash dump works in the following manner:
  1. A very small kernel (mini kernel), designed only to carry out a crash dump operation, is prepared in advance.
  2. Memory which the mini kernel will use is specifically defined and the mini kernel is then loaded.
  3. The mini kernel is initiated in the event of a crash. This does not make use of the BIOS and does not clear the memory.
  4. The mini kernel copies all memory into the dump device and then reboots.
Step 2 should be done as soon as possible after the system is booted. About 4 MB of memory is adequate for the mini kernel to operate. Consuming a mere 4 MB (about 0.1% of a 4 GB system) should not be a problem for multi-GB systems. Furthermore, as the mini kernel operates independently of the resources used by the crashed kernel, it does not face the problems mentioned above.
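The four steps above can be pictured with a toy simulation. This is a minimal sketch under loud assumptions: physical memory is modeled as a `bytearray`, the sizes are toy values rather than the real 4 MB, and `load_mini_kernel`, `crash_and_dump`, and the `RESERVED_*` constants are hypothetical names that do not come from mkdump.

```python
import io

PAGE = 4096
MEM_SIZE = 64 * PAGE          # toy "physical memory", not a realistic size
RESERVED_START = 16 * PAGE    # hypothetical location of the reserved area
RESERVED_SIZE = 4 * PAGE      # stands in for the roughly 4 MB reservation

memory = bytearray(MEM_SIZE)

def load_mini_kernel(image: bytes) -> None:
    """Steps 1-2: place the prepared image into the reserved area."""
    if len(image) > RESERVED_SIZE:
        raise ValueError("mini kernel image does not fit the reserved area")
    memory[RESERVED_START:RESERVED_START + len(image)] = image

def crash_and_dump(dump_device) -> int:
    """Steps 3-4: copy memory page by page, without clearing it first."""
    written = 0
    for offset in range(0, MEM_SIZE, PAGE):
        written += dump_device.write(memory[offset:offset + PAGE])
    return written

load_mini_kernel(b"MINIKERNEL")
memory[0:5] = b"state"        # pretend state left by the main kernel
dump = io.BytesIO()           # stands in for the dump device
written = crash_and_dump(dump)
print(written == MEM_SIZE)
```

The point of the sketch is that the copy loop depends only on the reserved image and the dump device, not on any state owned by the crashed kernel.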

Mkexec

Mini kernel operations employ the same scheme as kexec[4]. In fact, the mini kernel's code actually uses modified kexec code. The major differences between the mini kernel's kexec (mkexec) and the standard kexec are as follows: It is also possible to have multiple memory spaces allocated for the mini kernel dump. However, as 4 MB is sufficient for carrying out the dump operation, allocating a single space is the simplest method. Mkexec is also designed to work as a kernel module. However, a very small patch is necessary to accomplish this. If kexec is supported in the standard kernel, the necessary mkexec kernel patch then acts only as a hook, which calls the crash dump operation at the time of a crash. It is also possible for mkexec and kexec to coexist. With mkexec, after the mini kernel has been loaded, the allocated memory space is set to read-only. This also helps, however slightly, to reduce the possibility of damaging the mini kernel when a kernel crash occurs.

Mini Kernel

You can use either a special build of the mini kernel or the same binary image that you booted as the standard (parent) kernel. Still, it is recommended to prepare a special stripped-down build just for crash dumping purposes. In order to build the mini kernel you must apply the "minik" patch consisting of: Next, do the configuration and make the kernel. Compile only the minimum required drivers statically.

In order to accurately capture a dump, it is important that extraneous operations are avoided to the utmost degree possible. It is particularly important not to mount the file system, as there is the possibility that it has been damaged at the time of the crash.

In order to increase flexibility for the mini kernel's tasks, the same scheme as initrd will be employed. The root file system is prepared in advance and, along with the mini kernel, is simultaneously stored in memory. The root file system is mounted from a RAM disk. However, one should be aware that the most important point of the mini kernel dump is to accurately capture dumps, and keep this point in mind when extending its features.

There is no modification of device drivers with the mini kernel. Accordingly, all devices supported by the standard kernel can be used. This is a major strong point when compared with diskdump, which requires modification of the HBA driver. The disk unit is the only dump device which mini kernel dump uses. Of course, it is also possible to have the dump device be a network, but, again, when one sets a priority on accurately capturing dumps, a network dump is not considered to be preferable.

Dump Format

When the system crashes, the context is saved before the mini kernel boots up. Basically, the mini kernel copies all memory as it is to the dump device. As the control tables cannot be trusted when a crash occurs, it is best to copy the memory as it is. Analysis of the crash dump is done using "lcrash" or "crash". We have also prepared a tool which allows the copied crash dump to be converted to the LKCD or netdump format. This is a very simple conversion. Finally, we are also working on improving these fault analysis tools.
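To give a feel for why converting a raw memory image is simple, the sketch below wraps a raw dump in a small header. The header layout, the `TOYDUMP` magic, and the `convert_raw_dump` function are all made up for illustration; this is not the LKCD or netdump format, and it is not the project's conversion tool.

```python
import io
import struct

PAGE_SIZE = 4096
# Hypothetical header: 8-byte magic, page size, page count (little-endian).
HEADER = struct.Struct("<8sII")
MAGIC = b"TOYDUMP\0"

def convert_raw_dump(raw, out) -> int:
    """Wrap a raw memory image in a header and return the page count."""
    data = raw.read()
    if len(data) % PAGE_SIZE != 0:
        raise ValueError("raw dump is not a whole number of pages")
    pages = len(data) // PAGE_SIZE
    out.write(HEADER.pack(MAGIC, PAGE_SIZE, pages))
    out.write(data)  # the memory contents themselves are copied unchanged
    return pages

raw_dump = io.BytesIO(bytes(3 * PAGE_SIZE))   # stand-in for a raw dump
converted = io.BytesIO()
page_count = convert_raw_dump(raw_dump, converted)
magic, page_size, pages = HEADER.unpack_from(converted.getvalue())
print(page_count, page_size, pages)
```

Because the raw dump is a plain copy of memory, a converter of this kind only has to add the metadata a given analysis tool expects; the page data needs no interpretation.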

Current State of our Project

We are currently building the mini kernel dump for IA-32 and are also in the process of verifying its quality. We are also comparing the accuracy of the crash dumps captured via mini kernel dump with that of other existing crash dump software. Furthermore, we plan to make mini kernel dump work on x86-64 and IA-64.

Conclusion

In this paper I introduced the mini kernel dump, a method we have developed to facilitate the accurate capture of crash dumps. As mini kernel dump operates separately from the resources of the crashed kernel, the level of accuracy in capturing crash dumps is greatly increased. As there is very little need for modification of the existing kernel, the hurdles to be cleared to get mini kernel dump working are very low. Furthermore, the fact that the mini kernel does not require modifications to the device drivers is also one of its strong points.

References:

[1] LKCD
http://lkcd.sourceforge.net/
[2] netdump
http://www.redhat.com/support/wpapers/redhat/netdump/
[3] diskdump
http://sourceforge.net/projects/lkdump/
[4] kexec
http://developer.osdl.org/rddunlap/kexec/

Project(download):

http://sourceforge.net/projects/mkdump/

Portions Copyright 2004 NTT DATA CORPORATION.
Portions Copyright 2004 VA Linux Systems Japan K.K.
Itsuro Oda (a.k.a. oda @ valinux japan)