Tools to evaluate callgrind's call profiles?

Somehow related to this question, which tool would you recommend to evaluate the profiling data created with callgrind?

It does not have to have a graphical interface, but it should prepare the results in a concise, clear and easy-to-interpret way. I know about e.g. kcachegrind, but that program is missing some features, such as exporting the displayed tables or simply copying lines from the display.

Answers

Years ago I wrote a profiler to run under DOS.

If you are using KCacheGrind, here's what I would have it do. It might not be too difficult to write, or you can just do it by hand.

KCacheGrind has a toolbar button "Force Dump", with which you can trigger a dump manually at a random time. The capture of stack traces at random or pseudo-random times, in the interval when you are waiting for the program, is the heart of the technique.

Not many samples are needed - 20 is usually more than enough. If a bottleneck costs a large fraction of the time, like more than 50%, 5 samples may be quite enough.
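To see why so few samples suffice: if a bottleneck costs fraction p of the run time, the chance that all n independent samples miss it is (1 - p)^n. A quick back-of-the-envelope check (the function name is mine, not from any tool):

```python
def p_seen(p, n):
    """Probability that at least one of n independent stack samples
    lands in a bottleneck costing fraction p of the time."""
    return 1 - (1 - p) ** n

print(p_seen(0.5, 5))   # 5 samples almost surely catch a 50% bottleneck
print(p_seen(0.2, 20))  # 20 samples are very likely to catch a 20% one
```

So even a handful of samples will expose any bottleneck big enough to be worth fixing; precision comes later, if at all.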

The processing of the samples is very simple. Each stack trace consists of a series of lines of code (actually addresses), where all but the last are function/method calls.

  • Collect a list of all the lines of code that appear on the samples, and eliminate duplicates.

  • For each line of code, count what fraction of samples it appears on. For example, if you take 20 samples, and the line of code appears on 3 of them, even if it appears more than once in some sample (due to recursion) the count is 3/20 or 15%. That is a direct measure of the cost of each statement.

  • Display the most costly 100 or so lines of code. Your bottlenecks are in that list.
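The three steps above can be sketched in a few lines. This is a hypothetical illustration (the sample format and names are invented, not a KCacheGrind feature):

```python
from collections import Counter

def line_costs(samples):
    """Each sample is a stack trace: a list of code locations such as
    "file.c:123". A line's cost is the fraction of samples it appears
    on; duplicates within one sample (recursion) are counted once."""
    counts = Counter()
    for stack in samples:
        for line in set(stack):  # de-duplicate within a single sample
            counts[line] += 1
    n = len(samples)
    return {line: c / n for line, c in counts.items()}

# 20 hypothetical samples: "main.c:10" appears on 3 of them, once
# twice in the same trace (recursion) -- it still counts as 3/20.
samples = (
    [["main.c:10", "util.c:55", "main.c:10"]] * 2
    + [["main.c:10", "io.c:7"]]
    + [["other.c:1"]] * 17
)
costs = line_costs(samples)
# Most costly lines first -- the bottlenecks are at the top.
print(sorted(costs.items(), key=lambda kv: -kv[1]))
```

Note that the `set(stack)` step is what makes recursion irrelevant, a point expanded on below.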

What I typically do with this information is choose a line with high cost, then manually take stack samples until it appears (or look at the ones I've already got), and ask myself: "Why is it executing that line of code, not just in a local sense, but in a global sense?" Another way to put it: "What, in a global sense, was the program trying to accomplish at the time slice when that sample was taken?" I ask this because the answer tells me whether it was really necessary to spend what that line is costing.

I don't want to be critical of all the great work people do developing profilers, but sadly there is a lot of firmly entrenched myth on the subject, including:

  • that precise measuring, with lots of samples, is important. Rather the emphasis should be on finding the bottlenecks. Precise measurement is not a prerequisite for that. For typical bottlenecks, costing between 10% and 90%, the measurement can be quite coarse.

  • that functions matter more than lines of code. If you find a costly function, you still have to search within it for the lines that are the bottleneck. That information is right there, in the stack traces - no need to hunt for it.

  • that you need to distinguish CPU time from wall-clock time. If you're waiting for it, it's wall-clock time (wrist-watch time?). If you have a bottleneck consisting of extraneous I/O, for example, do you want to ignore it because it's not CPU time?

  • that the distinction between exclusive time and inclusive time is useful. That only makes sense if you're timing functions and you want some clue about whether the time is spent in the function itself or in its callees. If you look at lines of code, the only thing that matters is inclusive time. Another way to put it: every instruction is a call instruction, even if it only calls microcode.

  • that recursion matters. It is irrelevant, because it doesn't affect the fraction of samples a line appears on, and that fraction is what the line is responsible for.

  • that the invocation count of a line or function matters. Whether it's fast and called too many times, or slow and called once, the cost is the percent of time it uses, and that's what the stack samples estimate.

  • that the performance of sampling matters. I don't mind taking a stack sample and looking at it for several minutes before continuing, as long as that doesn't make the bottlenecks move.

Here's a more complete explanation.

There are some CLI tools for working with callgrind data:

  • callgrind_annotate

  • cg_annotate, a cachegrind tool that can also show some information from callgrind.out files



