Overview
Introduction
As Linux is increasingly being employed by enterprise-class systems,
the importance of fault analysis likewise increases. As it stands now,
when a fault occurs on a production system, the debugging process is
extremely difficult. Furthermore, there are many cases in which these
faults cannot be easily reproduced. In response to these kinds of cases,
analysis via a crash dump becomes the last line of defense. Hence, the
ability to accurately capture dumps at the moment a system crashes is
an absolutely vital prerequisite for fault analysis.
Problems with existing Crash Dumps
Standard Linux does not include the capability of creating crash dumps.
Software such as LKCD[1], netdump[2], and diskdump[3] have been offered as
solutions for those who want to add crash dump generation capabilities to
Linux. However, from the standpoint of accurately capturing dumps, these
crash dump utilities all have two major problems.
First, there is a problem with resources, notably resources that remain
locked. All current crash dump systems write to the dump device through
existing drivers, which rely on services in the main kernel. If the
operation that caused the crash was holding locks on resources needed to
output the dump, the dump operation will deadlock. Likewise, on
multiprocessor systems, operations running on other CPUs are forced to stop
when the crash occurs, so resources needed for copying the dump may remain
locked, and the dump operation again deadlocks. Even if no lock-up results,
insufficient system resources may also cause the dump operation to fail.
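
The lock problem described above can be illustrated with a small user-space
sketch. It is not kernel code: the lock, the function names, and the timeout
are all hypothetical, and the timeout exists only so that the demonstration
terminates where a real in-kernel dump path would hang.

```python
import threading

# Hypothetical lock standing in for a resource the dump driver needs
# (for example, a lock protecting an I/O request queue).
queue_lock = threading.Lock()


def crashing_context():
    """Take the lock, then 'crash' without ever releasing it."""
    queue_lock.acquire()
    # The crash happens here; the lock stays held.


def dump_via_existing_driver(timeout_s=0.1):
    """Dump path that reuses the crashed kernel's driver and its lock."""
    got_lock = queue_lock.acquire(timeout=timeout_s)
    if not got_lock:
        return "deadlock"  # a real dump path would wait here forever
    try:
        return "dumped"
    finally:
        queue_lock.release()


def dump_via_independent_path():
    """Dump path with its own private resources; no shared lock is needed."""
    private_lock = threading.Lock()
    with private_lock:
        return "dumped"


crashing_context()
result_shared = dump_via_existing_driver()
result_independent = dump_via_independent_path()
print(result_shared, result_independent)
```

The shared path fails because it depends on state left behind by the crashed
context, while the independent path does not touch that state at all.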
The second problem is the reliability of the kernel's control tables.
A kernel crash means that some kind of inconsistency has occurred within
the kernel and that there is a strong possibility a control table has been
damaged. As the existing crash dump solutions use standard kernel
functions to output the dump, there is a very real possibility that
a damaged control table will be referenced. A crash dump function should
take the shortest and simplest workable path, and should be designed on the
premise that the kernel can no longer be trusted. The problem with current
Linux crash dumps is that they are built on the premise that the kernel can
be trusted and will in fact be operating normally (this is particularly
true of LKCD).
As the Linux kernel becomes more complex (and this holds even if one
sets aside the SCSI subsystem), it becomes very difficult to fully grasp the
operation of the entire system. It is also commonly thought that setting up a
controllable dump route within the Linux kernel is very difficult. Thus, we
believe that it is necessary to develop a method capable of capturing crash
dumps independently of the existing kernel.
Mini Kernel Dump
We have been able to greatly increase reliability and accuracy in capturing
crash dumps through a method we refer to as a "mini kernel dump." The basic
idea behind the mini kernel dump is to initiate a separate kernel at the
moment of the crash and then have this mini kernel obtain the crash dump.
The operation of the mini kernel crash dump works in the following manner:
- A very small kernel (mini kernel), designed only to carry out a crash
dump operation, is prepared in advance.
- Memory which the mini kernel will use is specifically defined and the mini
kernel is then loaded.
- The mini kernel is initiated in the event of a crash. This does not make
use of the BIOS and does not clear the memory.
- The mini kernel copies all memory into the dump device and then reboots.
The second step (defining the memory and loading the mini kernel) should be
performed as soon as possible after the system is booted.
About 4 MB of space is adequate for the mini kernel to operate. Obviously,
the consumption of a mere 4 MB of space should not be a problem for multi-GB
systems. Furthermore, as the mini kernel operates independently of the
resources used by the crashed kernel, we do not face the problems mentioned above.
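
The four steps above can be sketched as a toy model. This is only an
illustration under stated assumptions: the class, its method names, and the
use of a bytearray as "physical memory" are hypothetical and do not
correspond to the actual implementation. Only the 4 MB figure comes from the
text.

```python
PAGE_SIZE = 4096
MINI_KERNEL_BYTES = 4 * 1024 * 1024  # "about 4 MB" from the text


class SimulatedMachine:
    """Toy model of a machine; all names here are hypothetical."""

    def __init__(self, total_bytes):
        self.memory = bytearray(total_bytes)
        self.reserved = None            # (start, end) of the mini kernel region
        self.mini_kernel_loaded = False

    def reserve_and_load(self, start, image):
        """Steps 1-2: define the mini kernel's memory and load the image."""
        if start % PAGE_SIZE != 0:
            raise ValueError("reserved region must be page aligned")
        if len(image) > MINI_KERNEL_BYTES:
            raise ValueError("image does not fit in the reserved region")
        end = start + MINI_KERNEL_BYTES
        if end > len(self.memory):
            raise ValueError("reserved region exceeds physical memory")
        self.memory[start:start + len(image)] = image
        self.reserved = (start, end)
        self.mini_kernel_loaded = True

    def crash_and_dump(self):
        """Steps 3-4: start the mini kernel and copy all memory as it is."""
        if not self.mini_kernel_loaded:
            raise RuntimeError("no mini kernel loaded before the crash")
        # Memory is not cleared on the way in, so the copy reflects the
        # state at the moment of the crash.
        return bytes(self.memory)


machine = SimulatedMachine(16 * 1024 * 1024)           # 16 MB toy system
machine.reserve_and_load(8 * 1024 * 1024, b"MINIK" * 100)
machine.memory[0x1000:0x1005] = b"OOPS!"               # state left by the crash
dump = machine.crash_and_dump()

# Share of a 4 GB system taken by the 4 MB reservation.
overhead = MINI_KERNEL_BYTES / (4 * 1024 ** 3)
print(len(dump), dump[0x1000:0x1005], overhead)
```

For a 4 GB system the reservation amounts to 1/1024 of memory, or roughly
0.1%. The model also shows why loading must happen before the crash: once
the kernel has failed, it can no longer be relied on to load anything.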
Mkexec
Mini kernel operations employ the same scheme as kexec[4]. In fact, mini
kernel's code actually uses modified kexec code. The major differences between
the mini kernel's kexec (mkexec) and the standard kexec are as follows:
- Whereas kexec loads the kernel piecemeal, one page at a time, mkexec
loads it contiguously.
- Mkexec allocates memory space needed for the mini-kernel to operate.
- Kexec boots up the kernel after having copied it to its original location.
Mkexec doesn't copy.
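
The differences listed above can be pictured with a simplified model. This
is not kexec or mkexec code: the function names, the flat bytearray standing
in for physical memory, and the free-page list are all assumptions made for
illustration.

```python
PAGE_SIZE = 4096


def split_pages(image):
    """Cut an image into page-sized chunks (the last one may be short)."""
    return [image[i:i + PAGE_SIZE] for i in range(0, len(image), PAGE_SIZE)]


def kexec_style_load(memory, image, free_pages):
    """Place each page of the image in whichever free page is offered."""
    chunks = split_pages(image)
    if len(free_pages) < len(chunks):
        raise ValueError("not enough free pages")
    placed = []
    for chunk, pfn in zip(chunks, free_pages):
        offset = pfn * PAGE_SIZE
        memory[offset:offset + len(chunk)] = chunk
        placed.append(pfn)
    return placed


def kexec_style_boot(memory, placed, image_len, dest):
    """Copy the scattered pages to their final location, then 'jump'."""
    for i, pfn in enumerate(placed):
        n = min(PAGE_SIZE, image_len - i * PAGE_SIZE)
        src = pfn * PAGE_SIZE
        dst = dest + i * PAGE_SIZE
        memory[dst:dst + n] = memory[src:src + n]
    return dest  # entry address


def mkexec_style_load(memory, image, reserved_start):
    """Load the image contiguously into a region reserved in advance."""
    memory[reserved_start:reserved_start + len(image)] = image
    return reserved_start  # entry address; nothing is copied at boot


image = bytes([1]) * PAGE_SIZE + bytes([2]) * PAGE_SIZE + bytes([3]) * 100

# kexec-like: scattered load, then copy to the final location (page 0 here).
mem_a = bytearray(16 * PAGE_SIZE)
placed = kexec_style_load(mem_a, image, free_pages=[9, 12, 5])
entry_a = kexec_style_boot(mem_a, placed, len(image), dest=0)

# mkexec-like: contiguous load into a reserved region, no copy at boot.
mem_b = bytearray(16 * PAGE_SIZE)
mem_b[0:4] = b"OLD!"                       # contents of the crashed kernel
entry_b = mkexec_style_load(mem_b, image, reserved_start=8 * PAGE_SIZE)
print(placed, entry_a, entry_b, bytes(mem_b[0:4]))
```

In this model the copy step overwrites whatever was at the destination,
while the contiguous load leaves memory outside the reserved region
untouched. That is one plausible reading of why a dump-oriented loader would
avoid the copy, since the old memory contents are exactly what the dump
needs to capture.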
It is also possible to have multiple memory spaces allocated for the mini
kernel dump. However, as 4 MB is sufficient for carrying out the dump operation,
allocating a single space is the simplest method. Mkexec is also designed to
work as a kernel module. However, a very small patch is necessary to accomplish
this. If kexec is supported in the standard kernel, the necessary mkexec kernel
patch then acts only as a hook, which calls on the crash dump operation at the
time of a crash. It is also possible for mkexec and kexec to coexist with one
another. With mkexec, after the mini kernel has been loaded, allocated memory
space is set to read-only. This, however slightly, also helps to reduce the
possibility of damaging the mini kernel when a kernel crash occurs.
Mini Kernel
You can use either a special build of the mini kernel or the same binary
image that you booted as the standard (parent) kernel. Still, it is
recommended to prepare a special stripped-down build just for crash dumping
purposes. In order to build the mini kernel you must apply the "minik" patch,
which consists of:
- Image Relocability Patch
This patch is needed because the mini kernel's loaded address differs from
that of the standard kernel.
- Dump Capture Patch
This patch assures that no other operations outside of the dump capture
operational structure and the dump capture operation itself occur. After
initialization, the dump capture operation is carried out and the system is
simply rebooted. There is no mounting of the root file system.
Next, configure and build the kernel. Include only the minimum required
drivers, and compile them statically.
In order to accurately capture a dump, it is important that extraneous
operations are avoided to the utmost degree possible. It is particularly important
not to mount the file system, as there is the possibility that it has been
damaged at the time of the crash.
In order to increase flexibility for the mini kernel's tasks, the same scheme
as initrd will be employed. The root file system is prepared in advance and,
along with the mini kernel, is simultaneously stored in memory. The root
file system is mounted from a RAM disk. However, one should be aware that
the most important point of the mini kernel dump is to accurately capture
dumps, and keep this point in mind when extending its features.
There is no modification of device drivers with the mini kernel. Accordingly,
all devices supported by the standard kernel can be used. This is a major
strong point when compared with diskdump, which requires modification of the
HBA driver. The disk unit is the only dump device which mini kernel dump uses.
Of course, it is also possible to have the dump device be a network, but,
again, when one sets a priority on accurately capturing dumps, a network dump
is not considered to be preferable.
Dump Format
When the system crashes, the context is saved before the mini kernel boots up.
Basically, the mini kernel copies all memory as it is to the dump device.
As the control tables cannot be trusted when a crash occurs, it is best to
copy the memory as it is. Analysis of the crash dump is done using "lcrash"
or "crash". We also have prepared a tool which allows for the copied crash
dump to be converted to the LKCD or netdump format. This is a very simple
conversion. Finally, we are also working on improving these fault analysis tools.
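
The "copy memory as it is" approach can be sketched as follows. This is a
minimal illustration, assuming an in-memory buffer as the dump device; the
function names and chunk size are hypothetical, and no LKCD or netdump
header layout is modelled here.

```python
import io

PAGE_SIZE = 4096


def write_raw_dump(memory, device, chunk_pages=256):
    """Copy memory to the dump device in fixed-size chunks, uninterpreted."""
    chunk = PAGE_SIZE * chunk_pages
    view = memoryview(memory)
    written = 0
    for offset in range(0, len(memory), chunk):
        written += device.write(view[offset:offset + chunk])
    return written


def read_physical(device, phys_addr, length):
    """In a raw dump, a physical address is simply an offset into the dump."""
    device.seek(phys_addr)
    return device.read(length)


memory = bytearray(2 * 1024 * 1024)        # 2 MB of simulated memory
memory[0x5000:0x5008] = b"PANICMSG"        # data left behind by the crash

dump_device = io.BytesIO()                 # stand-in for the dump disk
written = write_raw_dump(memory, dump_device)
recovered = read_physical(dump_device, 0x5000, 8)
print(written, recovered)
```

Because the copy loop never consults any kernel data structure, it cannot be
misled by a damaged control table. Interpretation is left to the analysis
tools that run later on a healthy system.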
Current State of our Project
We are currently building the mini kernel dump for IA-32 and are also in
the process of verifying its quality. We are also comparing
the accuracy of the crash dumps captured via mini kernel dump with those of
other existing crash dump software. Furthermore, we plan to add support for
x86-64 and IA-64.
Conclusion
In this paper we introduced the mini kernel dump, a method we have developed
to facilitate the accurate capture of crash dumps. As mini kernel dump
operates separately from the resources of the crashed kernel, the level of
accuracy in capturing crash dumps is greatly increased.
As there is very little need for modification of the existing kernel,
the hurdles to be cleared to get mini kernel dump working are very low.
Furthermore, the fact that the mini kernel does not require modifications to
the device drivers is also one of its strong points.
Reference:
[1] LKCD