Mini Kernel Dump

First, there is the problem of resources, in particular resources that are left locked at the time of the crash. All current crash dump systems write the dump through existing drivers, which in turn rely on services of the main kernel. If the operation that caused the crash hung while holding locks on resources needed to write the dump, the dump operation will deadlock. Likewise, on multiprocessor systems, operations running on other CPUs are forcibly stopped when the crash occurs, so resources needed to copy the dump may be left locked, again putting the dump operation into deadlock. Even when no deadlock occurs, a shortage of system resources can still cause the dump operation to fail.
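The deadlock scenario can be modeled in a few lines of Python. This is only an illustrative sketch: `driver_lock` is a hypothetical lock standing in for whatever the dump device's driver must take, and the timeout exists solely so the sketch can report the deadlock instead of hanging, as a real kernel spinning on a held lock would.

```python
import threading

# Hypothetical lock protecting the dump device's normal driver path.
driver_lock = threading.Lock()

def crashing_operation():
    # The operation takes the driver lock and then "crashes",
    # so the lock is never released.
    driver_lock.acquire()
    raise RuntimeError("kernel crash while holding driver_lock")

def dump_via_existing_driver(timeout=0.1):
    # A dump routine that reuses the normal driver must take the same lock.
    # The timeout only lets the sketch detect the deadlock and return.
    if not driver_lock.acquire(timeout=timeout):
        return "deadlock: driver_lock is held by the crashed context"
    try:
        return "dump written"
    finally:
        driver_lock.release()

try:
    crashing_operation()
except RuntimeError:
    pass

result = dump_via_existing_driver()
print(result)
```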

The second problem is the reliability of control tables. A kernel crash means that some kind of inconsistency has arisen within the kernel, and there is a strong possibility that a control table has been damaged. Because existing crash dump solutions use standard kernel functions to output the dump, there is a very real risk that a damaged control table will be referenced. A crash dump function should inherently be designed to use the smallest and most dependable path possible, starting from the premise that the kernel can no longer be trusted. The problem with current Linux crash dumps is that they are built on the opposite premise: that the kernel can be trusted and is still operating normally (this is particularly true of LKCD [1]).
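The difference between trusting a control table and trusting nothing can be sketched as follows. The memory layout here (a three-node linked table in a 64-byte buffer) is entirely hypothetical; it only shows that a dump which follows damaged pointers may never terminate, while a byte-for-byte copy of memory is unaffected by the damage.

```python
import struct

END = 0xFFFFFFFF  # "next" value that terminates the table

def build_memory() -> bytearray:
    """64 bytes of 'memory' holding a 3-node linked control table."""
    mem = bytearray(64)
    # Each node: 4-byte payload, 4-byte offset of the next node.
    struct.pack_into("<II", mem, 0, 111, 16)
    struct.pack_into("<II", mem, 16, 222, 32)
    struct.pack_into("<II", mem, 32, 333, END)
    return mem

def dump_by_walking_table(mem, max_steps=100):
    """Trusts the control table: follows next pointers to find what to dump."""
    out, off, steps = [], 0, 0
    while off != END:
        # A real kernel would loop forever or fault here; the step limit and
        # bounds check only let the sketch report the failure.
        if steps >= max_steps or off + 8 > len(mem):
            raise RuntimeError("control table corrupt: walk did not terminate")
        payload, off = struct.unpack_from("<II", mem, off)
        out.append(payload)
        steps += 1
    return out

def dump_raw(mem) -> bytes:
    """Trusts nothing: copies memory byte for byte."""
    return bytes(mem)

intact = dump_by_walking_table(build_memory())   # [111, 222, 333]

mem = build_memory()
struct.pack_into("<I", mem, 36, 0)  # damage: last node now points back to node 0
try:
    dump_by_walking_table(mem)
    table_walk_ok = True
except RuntimeError:
    table_walk_ok = False
raw = dump_raw(mem)                 # succeeds regardless of the damage
```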

As the Linux kernel grows more complex, it becomes very difficult to fully grasp how the entire system operates, and this remains true even if the SCSI subsystem is left out of consideration. It is also commonly thought that establishing a controllable dump path inside the Linux kernel is very difficult. We therefore believe it is necessary to develop a method that can capture crash dumps independently of the existing kernel.

Mini Kernel Dump

We have been able to greatly increase the reliability and accuracy of crash dump capture with a method we call "mini kernel dump." The basic idea is to boot a separate kernel at the moment of the crash and have this mini kernel capture the crash dump. Mini kernel dump works as follows:

1. A very small kernel (the mini kernel), designed solely to carry out the crash dump, is prepared in advance.
2. Memory for the mini kernel's exclusive use is reserved, and the mini kernel is loaded into it.
3. When a crash occurs, the mini kernel is started. It is started directly, without going through the BIOS, so memory is not cleared.
4. The mini kernel copies all of memory to the dump device and then reboots the system.
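The four steps above can be modeled end to end as a small simulation. Everything here is a scaled-down, hypothetical stand-in: "RAM" is a `bytearray`, the mini kernel image is a placeholder byte string, the ~4 MB reservation is shrunk to 4 KB, and the dump device is an in-memory buffer. The point is only that, because nothing clears memory between the crash and the copy, the dump equals the pre-crash memory exactly.

```python
import io

RAM_SIZE = 64 * 1024
RESERVED_SIZE = 4 * 1024   # scaled-down stand-in for the ~4 MB reservation

class Machine:
    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        # Reserve the top of memory for the mini kernel.
        self.reserved_start = RAM_SIZE - RESERVED_SIZE

    def load_mini_kernel(self, image: bytes) -> None:
        # Step 2: place the prepared image (step 1) in the reserved region.
        assert len(image) <= RESERVED_SIZE, "mini kernel does not fit"
        start = self.reserved_start
        self.ram[start:start + len(image)] = image

    def run_normally(self) -> None:
        # The standard kernel only uses memory outside the reservation.
        for i in range(self.reserved_start):
            self.ram[i] = i % 251

    def crash(self, dump_device) -> None:
        # Step 3: jump to the mini kernel without a BIOS reset, so RAM is untouched.
        # Step 4: copy all memory to the dump device, then reboot
        # (modeled here as starting over with cleared memory).
        dump_device.write(bytes(self.ram))
        self.ram = bytearray(RAM_SIZE)

m = Machine()
m.load_mini_kernel(b"MINIKERNEL")
m.run_normally()
before_crash = bytes(m.ram)
dump = io.BytesIO()
m.crash(dump)
```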

The second step should be performed as soon as possible after the system boots. About 4 MB of memory is sufficient for the mini kernel to operate, and reserving a mere 4 MB should pose no problem on multi-GB systems. Furthermore, because the mini kernel runs independently of the resources used by the crashed kernel, it does not face the problems described above.
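To put the 4 MB figure in perspective, here is the fraction of memory reserved on a machine with 4 GB of RAM (an illustrative size, not one from the text), using binary units:

```latex
\frac{4\ \text{MiB}}{4\ \text{GiB}} = \frac{2^{22}}{2^{32}} = 2^{-10} \approx 0.098\%
```

The fraction halves for every doubling of installed memory.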

Mkexec

Mini kernel dump uses the same scheme as kexec [4]; in fact, the mini kernel loader is built from modified kexec code. The major differences between the mini kernel's kexec (mkexec) and the standard kexec are as follows:

Whereas kexec loads the kernel piecemeal in single-page units, mkexec loads it contiguously.
Mkexec reserves the memory space the mini kernel needs in order to operate.
Kexec copies the kernel to its final location before booting it; mkexec performs no such copy, because the kernel is already loaded where it will run.
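The contrast between the two loading strategies can be sketched as a toy model. The data structures are hypothetical simplifications: kexec's real page list is reduced to a Python list of `(destination_offset, page_bytes)` pairs, and "RAM" is a `bytearray`. Both paths end with the same image at the same address; the difference is that the kexec-style path must copy every staged page at boot time, while the mkexec-style path copies nothing.

```python
# Simplified model: kexec's real page list is reduced to a Python list of
# (destination_offset, page_bytes) pairs.
PAGE_SIZE = 4096

def kexec_style_load(image: bytes):
    """Stage the image one page at a time in separately allocated pages."""
    return [(off, image[off:off + PAGE_SIZE])
            for off in range(0, len(image), PAGE_SIZE)]

def kexec_style_boot(pages, ram: bytearray, dest: int) -> int:
    """At boot time, copy each staged page to its final location."""
    for off, page in pages:
        ram[dest + off:dest + off + len(page)] = page
    return len(pages)  # number of pages copied

def mkexec_style_load(image: bytes, ram: bytearray, reserved_start: int) -> None:
    """Load the image contiguously, directly into the reserved region."""
    ram[reserved_start:reserved_start + len(image)] = image

def mkexec_style_boot() -> int:
    """The image already sits where it will run, so nothing is copied."""
    return 0

image = bytes(range(256)) * 40            # 10240 bytes, i.e. 3 pages
ram_kexec, ram_mkexec = bytearray(65536), bytearray(65536)
copied_kexec = kexec_style_boot(kexec_style_load(image), ram_kexec, dest=32768)
mkexec_style_load(image, ram_mkexec, reserved_start=32768)
copied_mkexec = mkexec_style_boot()
```

Avoiding the boot-time copy presumably matters because, at crash time, the less that has to be moved around in a machine whose state is already suspect, the better.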

It is also possible to allocate multiple memory areas for the mini kernel dump. However, since 4 MB is sufficient for the dump operation, allocating a single area is the simplest approach. Mkexec is also designed to work as a kernel module, although a very small kernel patch is needed to make this possible. If kexec is supported in the standard kernel, the mkexec patch serves only as a hook that invokes the crash dump operation when a crash occurs. Mkexec and kexec can also coexist. After mkexec has loaded the mini kernel, it sets the allocated memory area to read-only. This helps, if only slightly, to reduce the risk of the mini kernel being damaged when a kernel crash occurs.
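The two mechanisms described above, a crash hook and write protection of the loaded image, can be sketched as follows. The names `register_crash_hook`, `panic`, and `start_mini_kernel` are hypothetical and are not mkexec's real interface; a read-only `memoryview` stands in for read-only page protection.

```python
# Hypothetical hook list that the small kernel patch would provide.
crash_hooks = []

def register_crash_hook(fn):
    """What a loaded module would do to get called at crash time."""
    crash_hooks.append(fn)

def panic(reason: str):
    """Stand-in for the kernel's panic path: run every registered hook."""
    return [hook(reason) for hook in crash_hooks]

reserved = bytearray(4096)
image = b"MINIKERNEL"
reserved[:len(image)] = image
# Read-only view of the loaded image. Writes through this view fail, but
# the underlying buffer is still writable by other means, which mirrors
# why page protection only "slightly" reduces the risk of damage.
mini_kernel = memoryview(reserved).toreadonly()

def start_mini_kernel(reason: str) -> str:
    return f"booting mini kernel ({bytes(mini_kernel[:10]).decode()}) after: {reason}"

register_crash_hook(start_mini_kernel)
```

Usage: calling `panic("oops")` runs the hook, while an attempt such as `mini_kernel[0] = 0` raises `TypeError`.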

Mini Kernel

You can use either a special build of the mini kernel or the same binary image that was booted as the standard (parent) kernel. Even so, it is recommended to prepare a special stripped-down build just for crash dumping. To build the mini kernel, you must apply the "minik" patch, which consists of the following:

Image Relocability Patch
This patch is needed because the mini kernel's loaded address differs from that of the standard kernel.
Dump Capture Patch
This patch ensures that nothing runs other than the initialization that dump capture needs and the dump capture operation itself. After initialization, the dump is captured and the system is simply rebooted. The root file system is not mounted.
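Why a load address different from the link address requires a patch can be illustrated with a generic relocation fixup. This is one common technique, not necessarily what the image relocatability patch does: the hypothetical image below is linked to run at `LINK_BASE`, holds absolute 32-bit addresses at offsets listed in a relocation table, and each such address is adjusted by the difference between the load and link addresses. Both base addresses are illustrative assumptions.

```python
import struct

LINK_BASE = 0x00100000   # address the image was linked for (assumed)

def make_image():
    img = bytearray(32)
    # Two absolute pointers into the image itself (to data at +0x10 and +0x18).
    struct.pack_into("<I", img, 0, LINK_BASE + 0x10)
    struct.pack_into("<I", img, 4, LINK_BASE + 0x18)
    relocs = [0, 4]      # offsets of fields that hold absolute addresses
    return img, relocs

def relocate(img: bytearray, relocs, load_base: int) -> None:
    """Shift every absolute address by (load_base - LINK_BASE)."""
    delta = load_base - LINK_BASE
    for off in relocs:
        (addr,) = struct.unpack_from("<I", img, off)
        struct.pack_into("<I", img, off, (addr + delta) & 0xFFFFFFFF)

img, relocs = make_image()
LOAD_BASE = 0x3FC00000   # e.g. a 4 MB region just below 1 GB (illustrative)
relocate(img, relocs, LOAD_BASE)
```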

Next, configure and build the kernel, compiling in statically only the minimum set of drivers required.
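As an illustration only, a minimal mini kernel configuration along these lines might include a fragment like the one below. The option names are standard 2.6-era kernel options, but this particular selection is an assumption, not a list taken from the minik patch; the driver for the dump disk's HBA depends on the hardware and must be added, built in.

```
# Illustrative .config fragment for a stripped-down mini kernel (assumed selection).
# No loadable modules: everything needed is built in.
# CONFIG_MODULES is not set
# RAM disk and initrd support for the in-memory root file system
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_INITRD=y
# Disk path to the dump device
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# ...plus the HBA driver for the dump disk, set to =y
```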

To capture a dump accurately, extraneous operations must be avoided as far as possible. It is particularly important not to mount the file system, since it may have been damaged at the time of the crash.

To give the mini kernel more flexibility in the tasks it performs, the same scheme as initrd will be employed: a root file system is prepared in advance, loaded into memory together with the mini kernel, and mounted from a RAM disk. When extending the mini kernel's features in this way, however, keep in mind that the most important goal of mini kernel dump is to capture dumps accurately.
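Placing the mini kernel and its initrd together in memory amounts to a simple layout calculation. The sketch below is hypothetical: it assumes both are stored back to back, page-aligned, inside a single reserved region, and that whatever is left over is the mini kernel's working memory. The sizes and base address are illustrative, not figures from the text.

```python
PAGE_SIZE = 4096
MB = 1024 * 1024

def align_up(x: int, a: int = PAGE_SIZE) -> int:
    """Round x up to the next multiple of a."""
    return (x + a - 1) // a * a

def layout(region_start: int, region_size: int, kernel_size: int, initrd_size: int):
    """Place kernel then initrd, page-aligned, inside one reserved region."""
    kernel_start = align_up(region_start)
    initrd_start = align_up(kernel_start + kernel_size)
    end = initrd_start + initrd_size
    if end > region_start + region_size:
        raise ValueError("mini kernel + initrd do not fit in the reserved region")
    return {"kernel_start": kernel_start,
            "initrd_start": initrd_start,
            "free_after": region_start + region_size - end}

plan = layout(region_start=0x3FC00000, region_size=4 * MB,
              kernel_size=1_200_000, initrd_size=1_500_000)
```

Here `plan["initrd_start"]` is the address that would have to be handed to the mini kernel so it can find its RAM disk.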

The mini kernel requires no modification of device drivers, so every device supported by the standard kernel can be used. This is a major advantage over diskdump [3], which requires modifications to the HBA driver. The only dump device mini kernel dump uses is a disk. A network dump device would also be possible, but when priority is placed on capturing dumps accurately, network dumping is not considered preferable.

Dump Format

When the system crashes, the context of the crashed kernel is saved before the mini kernel boots. The mini kernel then essentially copies all of memory, as is, to the dump device: since the control tables cannot be trusted after a crash, copying memory unmodified is the safest approach. Crash dumps are analyzed with "lcrash" or "crash". We have also prepared a tool that converts the copied crash dump to the LKCD [1] or netdump [2] format; the conversion is very simple. Finally, we are also working on improving these fault analysis tools.
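A raw dump of this kind can be sketched as a small header recording the saved context, followed by every byte of memory copied verbatim in fixed-size chunks. The header layout (magic string, memory size, eight register values) is invented for this sketch; it is not the mkdump, LKCD, or netdump format, and a real dump would also have to deal with things like holes in physical memory.

```python
import io
import struct

MAGIC = b"RAWDUMP\0"
HEADER = struct.Struct("<8sQ8Q")   # magic, memory size, 8 saved registers (hypothetical)
CHUNK = 64 * 1024

def write_dump(memory: bytes, regs, device) -> None:
    """Write the header, then copy memory without interpreting its contents."""
    device.write(HEADER.pack(MAGIC, len(memory), *regs))
    for off in range(0, len(memory), CHUNK):
        device.write(memory[off:off + CHUNK])

def read_dump(device):
    """Parse the header and return (saved registers, memory image)."""
    magic, size, *regs = HEADER.unpack(device.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a raw dump")
    return regs, device.read(size)

memory = bytes(i % 256 for i in range(200_000))
regs = [0x1000 + i for i in range(8)]
dev = io.BytesIO()
write_dump(memory, regs, dev)
dev.seek(0)
saved_regs, image = read_dump(dev)
```

Because the writer never looks inside memory, a converter to another format only has to reinterpret this file afterwards, on a healthy system.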

Current State of our Project

We are currently implementing mini kernel dump for IA-32 and are verifying its quality. We are also comparing the accuracy of crash dumps captured with mini kernel dump against those captured by other existing crash dump software. We further plan to port mini kernel dump to x86-64 and IA-64.

Conclusion

In this paper I have introduced mini kernel dump, a method we developed for capturing crash dumps accurately. Because mini kernel dump operates independently of the resources of the crashed kernel, the accuracy of crash dump capture is greatly increased. Because very little modification of the existing kernel is needed, the barrier to getting mini kernel dump working is very low. Moreover, the fact that the mini kernel requires no modifications to device drivers is another of its strengths.

References:

[1] LKCD
http://lkcd.sourceforge.net/
[2] netdump
http://www.redhat.com/support/wpapers/redhat/netdump/
[3] diskdump
http://sourceforge.net/projects/lkdump/
[4] kexec
http://developer.osdl.org/rddunlap/kexec/

Project (download):

http://sourceforge.net/projects/mkdump/

Itsuro Oda (a.k.a. oda @ valinux japan)