Current Linux Kernel debugging techniques

A Linux machine freezes a few hours after booting and running software (including custom drivers). I'm looking for a method to debug such a problem. There has been significant progress in Linux kernel debugging techniques recently, hasn't there?

I'd appreciate it if you could share your experience on the topic.

Answers

If you can reproduce the problem inside a VM, there is indeed a fairly new (AFAIK) technique which might be useful: debugging the virtual machine from the host machine it runs on.

See for example this: Debugging Linux Kernel in VMWare with Windows host

VMware Workstation 7 also enables a powerful technique that lets you record system execution deterministically and then replay it as desired, even backwards. So as soon as the system crashes you can step backwards and see what was happening at that point (and even try changing something to see if it still crashes). IIRC I read somewhere that you can't use record/replay and debug the kernel through VMware/gdb at the same time.

Obviously, you need a VMM for this. I don't know which VMMs other than VMware's VMM family support this, and I don't know if any free VMware versions support it. Likely not; one can't really expect a commercial company to give away everything for free. The trial version is 30 days.

If your custom drivers are for hardware inside the machine, then I suppose this probably won't work.

SystemTap seems to be to Linux what DTrace is to Solaris. However, I find it rather hostile to use. Still, you may want to give it a try. NB: compile the kernel with debug info and spend some time with the kernel instrumentation hooks.

This is why so many are still using printk() after empirically narrowing a bug down to a specific module.

I'm not recommending it, just pointing out that it exists. I may not be smart enough to appreciate some underlying beauty. I just write drivers for odd devices.

There are many and varied techniques depending on the sort of problems you want to debug. In your case the first question is "is the system really frozen?". You can enable the magic sysrq key and examine the system state at freeze and go from there.
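For what it's worth, here is a rough sketch of how you could make sure magic SysRq is switched on before the freeze happens. It assumes the kernel was built with CONFIG_MAGIC_SYSRQ and that the program runs as root. The helper names (write_text, sysrq_enable, sysrq_trigger) are made up for this example; the two procfs paths mentioned below are the standard ones.

```c
#include <stdio.h>

/* Write `text` to `path`. Returns 0 on success, -1 on any failure. */
static int write_text(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = (fputs(text, f) != EOF);

    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(f) == EOF)
        ok = 0;

    return ok ? 0 : -1;
}

/*
 * Enable SysRq functions by writing a mask to the sysctl file,
 * normally /proc/sys/kernel/sysrq. A mask of 1 enables all functions.
 */
int sysrq_enable(const char *path, int mask)
{
    char buf[32];

    snprintf(buf, sizeof buf, "%d\n", mask);
    return write_text(path, buf);
}

/*
 * Fire a SysRq command from userspace by writing its key to the
 * trigger file, normally /proc/sysrq-trigger. For example 'w' dumps
 * blocked tasks and 't' dumps all tasks to the kernel log.
 */
int sysrq_trigger(const char *path, char command)
{
    char buf[2] = { command, '\0' };

    return write_text(path, buf);
}
```

You would call sysrq_enable("/proc/sys/kernel/sysrq", 1) at boot. Once the machine looks frozen, userspace may no longer run, so at that point you press Alt+SysRq+w (or t) on the keyboard or send a break on the serial console instead. sysrq_trigger("/proc/sysrq-trigger", 'w') only helps while some process is still being scheduled.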

Probably the most directly powerful method is to enable the kernel debugger and connect to it via a serial cable.

One option is to use Kprobes. A quick search on Google will turn up all the information you need, and it isn't particularly hard to use. I believe Kprobes was created by IBM as a solution for kernel debugging. It is essentially an elaborate form of printk(), but it lets you attach handlers to any "breakpoints" you insert. It may be what you are looking for. All you need to do is write a module that registers the "breakpoints" and their handlers, then insmod it into the kernel.
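To give an idea of what such a module looks like, here is a minimal sketch modelled on the kprobe example shipped in the kernel's samples directory. It is a kernel module, so it has to be built against your kernel headers with the usual kbuild Makefile, not as an ordinary program. The default symbol name "kernel_clone" is only an assumption (older kernels use a different name for the fork path); in practice you would pass a function from the suspect driver via the symbol= parameter.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

#define MAX_SYMBOL_LEN 64

/* Assumed default target; override with: insmod probe.ko symbol=<function> */
static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
module_param_string(symbol, symbol, sizeof(symbol), 0444);

static struct kprobe kp;

/* Runs just before the probed instruction executes. Must not sleep. */
static int handler_pre(struct kprobe *p, struct pt_regs *regs)
{
    pr_info("kprobe hit: %s in task %s (pid %d)\n",
            p->symbol_name, current->comm, current->pid);
    return 0;
}

static int __init probe_init(void)
{
    int ret;

    kp.symbol_name = symbol;
    kp.pre_handler = handler_pre;

    ret = register_kprobe(&kp);
    if (ret < 0) {
        pr_err("register_kprobe failed for %s: %d\n", symbol, ret);
        return ret;
    }

    pr_info("kprobe planted on %s\n", symbol);
    return 0;
}

static void __exit probe_exit(void)
{
    unregister_kprobe(&kp);
    pr_info("kprobe on %s removed\n", symbol);
}

module_init(probe_init);
module_exit(probe_exit);
MODULE_LICENSE("GPL");
```

After insmod, every hit shows up in the kernel log (dmesg or the serial console), so the last messages before the freeze tell you which code path was entered most recently. Keep the handler short, since it runs in the context of whatever code hit the probe.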

Hope that can be a useful option...

The way I debug this kind of bug is to run my OS inside VirtualBox with a kernel compiled with kgdb built in. I then set up a serial console in VirtualBox so that I can attach gdb to the guest kernel over that serial console. Any time the OS hangs, I can press Ctrl-C in gdb, much like using the magic sysrq key, to stop the kernel and inspect its state at that point in time.

Kernel stack traces alone usually make it too difficult to pinpoint the culprit process, so I think the best approach is still the generic "top" command together with the application logs, to see what caused the hang. Reading the logs will of course require a reboot.




