How can I test the performance of a C function?
Are there some good ways to know how a function performs in C? I would like, for example, to compare my own function to a library function.

Answers

You need high-resolution timers.

On Linux, gettimeofday() is a decent choice; it gives you microsecond resolution. On Windows, QueryPerformanceCounter() is typical. Make sure you run your function many times to get stable readings.

Quick sample, for Linux:

#include <stdio.h>
#include <sys/time.h>

struct timeval t0, t1;
unsigned int i;

gettimeofday(&t0, NULL);
for(i = 0; i < 100000; i++)
  function_to_measure();
gettimeofday(&t1, NULL);
printf("Did %u calls in %.2g seconds\n",
       i, t1.tv_sec - t0.tv_sec + 1E-6 * (t1.tv_usec - t0.tv_usec));

You would of course adjust the count (100,000) to match the performance of the function. It's best if the function really takes a while to run; otherwise the loop and/or the function-call overhead might dominate.
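For Windows, a rough counterpart using QueryPerformanceCounter() might look like this (a sketch; function_to_measure is the same placeholder as above):

#include <stdio.h>
#include <windows.h>

LARGE_INTEGER freq, t0, t1;
unsigned int i;

QueryPerformanceFrequency(&freq);   /* counter ticks per second */
QueryPerformanceCounter(&t0);
for(i = 0; i < 100000; i++)
  function_to_measure();
QueryPerformanceCounter(&t1);
printf("Did %u calls in %.2g seconds\n",
       i, (double)(t1.QuadPart - t0.QuadPart) / (double)freq.QuadPart);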

The open-source Callgrind profiler (for Linux) is a really awesome way to measure performance. Coupled with KCacheGrind, you get really great visualizations of where your time is spent.

Callgrind is part of Valgrind.

  • Art

Hello, I will give you an example and explain it:

#include <stdio.h>
#include <time.h>

int main(void)
{

    clock_t start_clk = clock();

    /*
        put any code here
    */

    printf("Processor time used by program: %lg sec.
", 
    (clock() - start_clk) / (long double) CLOCKS_PER_SEC);

    return 0;
}

output (with nothing between the two clock() calls, the measured time is essentially zero): Processor time used by program: 0 sec.

time.h:

declares clock_t, which is an arithmetic time value (you can do math on it, as I do in the example). Basically, put the code you want to measure where the comment is.

CLOCKS_PER_SEC is a macro declared in time.h; use it as the denominator to convert the value into seconds.

It is essential to cast it to long double for two reasons:

  1. We don't know what type clock_t actually is, but we want to print it (which conversion specifier would you put in printf?).
  2. long double is a very precise type which can represent really small values.

Run it (them) several million times (each) and measure the time it takes.
The one that completes faster is the better performer.
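For example, a minimal sketch of that approach using clock(), comparing a hypothetical my_strlen() against the library strlen() (the iteration count and buffer size are arbitrary):

#include <stdio.h>
#include <string.h>
#include <time.h>

/* hypothetical competitor to the library function */
static size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

int main(void)
{
    char text[256];
    volatile size_t sink = 0;   /* keeps the calls from being optimized away */
    long i;
    clock_t start;

    memset(text, 'x', sizeof text - 1);
    text[sizeof text - 1] = '\0';

    start = clock();
    for (i = 0; i < 10000000L; i++)
        sink += strlen(text);
    printf("strlen:    %Lg sec\n", (clock() - start) / (long double)CLOCKS_PER_SEC);

    start = clock();
    for (i = 0; i < 10000000L; i++)
        sink += my_strlen(text);
    printf("my_strlen: %Lg sec\n", (clock() - start) / (long double)CLOCKS_PER_SEC);

    (void)sink;
    return 0;
}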

gprof can help :)

Here's the result of gprof when I ran a program of mine for 10 seconds (function names changed):

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 60.29      8.68     8.68 115471546     0.00     0.00  workalot
 39.22     14.32     5.64       46   122.70   311.32  work_b
  0.49     14.39     0.07                             inlined
  0.07     14.40     0.01       46     0.22     0.22  work_c
  0.00     14.40     0.00      460     0.00     0.00  find_minimum
  0.00     14.40     0.00      460     0.00     0.00  feedback
  0.00     14.40     0.00       46     0.00     0.00  work_a

Fred, I notice that you said in a comment that you're on OS X. The best way to get very accurate timings of small-scale functions on OS X is with the mach_absolute_time( ) function. You can use it as follows:

#include <mach/mach_time.h>
#include <stdint.h>

int loopCount;
int iterations = 1000000;   /* number of calls to average over; adjust to taste */

uint64_t startTime = mach_absolute_time( );
for (loopCount = 0; loopCount < iterations; ++loopCount) {
    functionBeingTimed( );
}
uint64_t endTime = mach_absolute_time( );
double averageTime = (double)(endTime-startTime) / iterations;

This gives you the average timing across iterations calls to the function. This can be affected somewhat by effects outside of your process on the system. Thus, you may instead want to take the fastest time:

#include <mach/mach_time.h>
#include <stdint.h>

int loopCount;
int iterations = 1000000;   /* number of timed calls; adjust to taste */

double bestTime = __builtin_inf();
for (loopCount = 0; loopCount < iterations; ++loopCount) {
    uint64_t startTime = mach_absolute_time( );
    functionBeingTimed( );
    uint64_t endTime = mach_absolute_time( );
    bestTime = __builtin_fmin(bestTime, (double)(endTime-startTime));
}

This can have its own problems, especially if the function being timed is very very fast. You need to think about what you are really trying to measure and pick an approach that is scientifically justified (good experimental design is hard). I often use a hybrid between these two approaches as a first attempt at measuring a novel task (a minimum of averages over many calls).
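A sketch of that hybrid approach (the batch counts below are arbitrary placeholders; each batch is averaged, then the minimum across batches is kept):

#include <mach/mach_time.h>
#include <stdint.h>

int outerIterations = 1000;     /* number of batches; adjust to taste */
int innerIterations = 1000;     /* calls per batch; adjust to taste */

double bestAverage = __builtin_inf();
int outer, inner;
for (outer = 0; outer < outerIterations; ++outer) {
    uint64_t startTime = mach_absolute_time( );
    for (inner = 0; inner < innerIterations; ++inner) {
        functionBeingTimed( );
    }
    uint64_t endTime = mach_absolute_time( );
    /* average over the batch, then keep the least-disturbed batch */
    bestAverage = __builtin_fmin(bestAverage,
                                 (double)(endTime - startTime) / innerIterations);
}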

Note also that in the code samples above, the timings are in "mach time units". If you just want to compare algorithms, this is usually fine. For some other purposes, you may want to convert them to nanoseconds or cycles. To do this, you can use the following functions:

#include <mach/mach_time.h>
#include <sys/sysctl.h>
#include <stdint.h>

double ticksToNanoseconds(double ticks) {
    static double nanosecondsPerTick = 0.0;
    // The first time the function is called
    // ask the system how to convert mach
    // time units to nanoseconds
    if (0.0 == nanosecondsPerTick) {
        mach_timebase_info_data_t timebase;
        // to be completely pedantic, check the return code of this call:
        mach_timebase_info(&timebase);
        nanosecondsPerTick = (double)timebase.numer / timebase.denom;
    }
    return ticks * nanosecondsPerTick;
}

double nanosecondsToCycles(double nanoseconds) {
    static double cyclesPerNanosecond = 0.0;
    // The first time the function is called
    // ask the system what the CPU frequency is
    if (0.0 == cyclesPerNanosecond) {
        uint64_t freq;
        size_t freqSize = sizeof(freq);
        // Again, check the return code for correctness =)
        sysctlbyname("hw.cpufrequency", &freq, &freqSize, NULL, 0L );
        cyclesPerNanosecond = (double)freq * 1e-9;
    }
    return nanoseconds * cyclesPerNanosecond;
}

Be aware that the conversion to nanoseconds will always be sound, but the conversion to cycles can go awry in various ways, because modern CPUs do not run at one fixed speed. Nonetheless, it generally works pretty well.
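Putting the pieces together, a sketch that reports the average per-call time in nanoseconds and (approximately) cycles, using the helper functions above (functionBeingTimed and the iteration count are placeholders as before):

#include <mach/mach_time.h>
#include <stdint.h>
#include <stdio.h>

int loopCount;
int iterations = 1000000;   /* adjust to taste */

uint64_t startTime = mach_absolute_time( );
for (loopCount = 0; loopCount < iterations; ++loopCount) {
    functionBeingTimed( );
}
uint64_t endTime = mach_absolute_time( );

/* convert the per-call average from mach time units */
double averageTicks = (double)(endTime - startTime) / iterations;
double averageNanos = ticksToNanoseconds(averageTicks);
printf("average: %.1f ns (about %.1f cycles) per call\n",
       averageNanos, nanosecondsToCycles(averageNanos));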

All of these other answers are using some variant of gettimeofday() for timing. This is pretty crude, since you usually need to run the code under test many times to get reproducible results. Putting it in a tight loop changes the state of both the code and data caches, so the results may not be indicative of real performance.

A much better alternative is to actually use the CPU cycle counter. On x86, you can do this with the rdtsc instruction. This is from x264:

static inline uint32_t read_time(void)
{
    uint32_t a = 0;
#if defined(__GNUC__) && (defined(ARCH_X86) || defined(ARCH_X86_64))
    asm volatile( "rdtsc" :"=a"(a) ::"edx" );
#elif defined(ARCH_PPC)
    asm volatile( "mftb %0" : "=r" (a) );
#elif defined(ARCH_ARM)     // ARMv7 only
    asm volatile( "mrc p15, 0, %0, c9, c13, 0" : "=r"(a) );
#endif
    return a;
}
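A minimal sketch of using read_time() around the code under test (function_to_measure is a placeholder; a warm-up call is included so the measurement does not just time cold caches):

#include <stdint.h>
#include <stdio.h>

uint32_t t0, t1;

function_to_measure();            /* warm up caches and branch predictors */

t0 = read_time();
function_to_measure();
t1 = read_time();

printf("%u ticks\n", t1 - t0);    /* note: the 32-bit counter wraps around */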

For more on profiling using various hardware counters, see PAPI. For some purposes, simulators (like Callgrind) and interrupt-based profilers (like OProfile) are useful.

Store off the system time before you enter the function. Store off the system time after you return from the function. Take the difference and compare the two implementations.

Check out HighResTimer for a high-performance timer.

You'll probably find storing the time before/after isn't accurate enough and will probably result in 0 unless you have a longer-running function.

Check out RDTSC, but it is better to do it like below (there is a rough code sketch of these steps after the list).

0 - Call the system's Sleep or Yield function so that when it returns, you have a fresh timeslice

1 - RDTSC

2 - Call your function

3 - RDTSC
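A rough sketch of those steps on Linux with GCC or Clang (sched_yield() gives up the rest of the current timeslice, __rdtsc() is the compiler intrinsic from x86intrin.h, and your_function is a placeholder for the code under test):

#include <sched.h>        /* sched_yield */
#include <stdio.h>
#include <x86intrin.h>    /* __rdtsc (GCC/Clang, x86 only) */

void your_function(void); /* placeholder for the code under test */

int main(void)
{
    unsigned long long t0, t1;

    sched_yield();         /* step 0: start with a fresh timeslice */
    t0 = __rdtsc();        /* step 1 */
    your_function();       /* step 2 */
    t1 = __rdtsc();        /* step 3 */

    printf("elapsed: %llu TSC ticks\n", t1 - t0);
    return 0;
}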

If your function is a long-running one, you have to use some sort of profiling tool like gprof (it is very easy to use) or Intel's VTune (which I have not used for a long time). After seeing Art's answer, I changed my mind from gprof to Callgrind. I used only the Memcheck tool of Valgrind in the past and it was a magnificent tool. I have not used Callgrind before, but I am sure it is better than gprof...

  • Store timestamp before entering function

  • Store timestamp after exiting function

  • Compare timestamps

Make sure to use a significant sample, as the timer resolution might skew your results. This is especially true for short-duration functions. Use high-resolution timers (microsecond resolution is available on most platforms).
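For instance, on POSIX systems clock_gettime() with CLOCK_MONOTONIC gives nanosecond-resolution timestamps (a sketch; function_under_test and the iteration count are placeholders):

#include <stdio.h>
#include <time.h>

void function_under_test(void);  /* placeholder for the code under test */

int main(void)
{
    struct timespec t0, t1;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < 1000000; i++)
        function_under_test();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec)
                   + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.6f seconds total, %.1f ns per call\n",
           elapsed, elapsed * 1e9 / 1000000);
    return 0;
}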

As the simplest and most portable approach, you can use the standard function time(), which returns the current number of seconds since the Epoch.


#include <stdio.h>
#include <time.h>

time_t starttime, endtime;
int i;

starttime = time(NULL);
for (i = 0; i < 1000000; i++)
{
    testfunc();
}
endtime = time(NULL);

printf("Time in seconds is %d\n", (int)(endtime-starttime));

Adjust the number of iterations to your needs. If one function call needs 5 seconds, then you need a laaarge cup of coffee for 1,000,000 iterations... When the difference is less than 1 second, even for a large iteration count, you should 1) ask yourself whether it matters and, if yes, 2) check whether your favorite compiler already has built-in profiling functions.




