What is the performance cost of having a virtual method in a C++ class?

Having at least one virtual method in a C++ class (or any of its parent classes) means that the class will have a virtual table, and every instance will have a virtual pointer.

So the memory cost is quite clear. The most important part is the memory cost of the instances (especially if the instances are small, for example if they are just meant to contain an integer: in this case, having a virtual pointer in every instance might double their size). As for the memory space used up by the virtual tables, I guess it is usually negligible compared to the space taken by the actual method code.
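To make that per-instance cost concrete, here is a minimal sketch (not part of the original question) that simply compares the sizes of two otherwise identical structs; on a typical 64-bit implementation the virtual one grows from 4 to 16 bytes because of the added vptr plus alignment padding, but the exact figures are implementation-dependent.

#include <iostream>

struct Plain       { int value; };                              // just the integer
struct WithVirtual { int value; virtual ~WithVirtual() {} };    // integer + vptr (+ padding)

int main()
{
    std::cout << "sizeof(Plain)       = " << sizeof(Plain)       << '\n'
              << "sizeof(WithVirtual) = " << sizeof(WithVirtual) << '\n';
}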

This brings me to my question: is there a measurable performance cost (i.e. a speed impact) to declaring a method virtual? There will be a lookup in the virtual table at runtime upon every call, so if calls to this method are very frequent and the method itself is very short, there might be a noticeable performance hit? I guess it depends on the platform, but has anyone run some benchmarks?

The reason I am asking is that I came across a bug caused by a programmer forgetting to declare a method virtual, and it is not the first time I have seen this kind of mistake. And I thought: why do we add the virtual keyword when needed, instead of removing it only when we are certain it is not needed? If the performance cost is low, I would recommend the following to my team: by default, make every method in every class virtual, including the destructor, and only remove it when needed. Does that sound crazy to you?

Best answer

I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call takes 7 nanoseconds longer than a direct (non-virtual) function call.

So, it is not really worth worrying about the cost unless the function is something like a trivial Get()/Set() accessor, where anything other than inlining is somewhat wasteful. A 7 ns overhead on a function that inlines to 0.5 ns is severe; a 7 ns overhead on a function that takes 500 ms to execute is meaningless.

The big cost of virtual functions is not really the lookup of the function pointer in the vtable (that is usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble, because the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer has been computed. So the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.

Edit: Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in the cache, the whole program comes to a halt while the instructions are fetched from main memory. That is always a significant stall: on Xenon, about 650 cycles (by my tests).

However, this is not a problem specific to virtual functions, because even a direct function call will cause a miss if it jumps to instructions that are not in the cache. What matters is whether the function has run recently (making it more likely to be in the cache), and whether your architecture can predict static (non-virtual) branches and fetch those instructions into the cache ahead of time. My PPC cannot, but perhaps Intel's most recent hardware can.

My timings control for the influence of instruction cache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they leave out that cost.

Other answers

There is definitely measurable overhead when calling a virtual function - the call must use the vtable to resolve the address of the function for that type of object. The extra instructions are the least of your worries. Not only do vtables prevent many potential compiler optimizations (since the type is polymorphic, the compiler cannot resolve the call target statically), they can also thrash your I-cache.

Of course whether these penalties are significant or not depends on your application, how often those code paths are executed, and your inheritance patterns.

In my opinion though, having everything as virtual by default is a blanket solution to a problem you could solve in other ways.

Perhaps you could look at how classes are designed/documented/written. Generally the header for a class should make quite clear which functions can be overridden by derived classes and how they are called. Having programmers write this documentation is helpful in ensuring they are marked correctly as virtual.

I would also say that declaring every function as virtual could lead to more bugs than just forgetting to mark something as virtual. If all functions are virtual, everything can be replaced by subclasses - public, protected, private - everything becomes fair game. By accident or intention, subclasses could then change the behavior of functions in ways that cause problems when those functions are used by the base implementation.

It depends. :) (Had you expected anything else?)

Once a class gets a virtual function, it can no longer be a POD datatype (it may not have been one before either, in which case this won't make a difference), and that makes a whole range of optimizations impossible.

std::copy() on plain POD types can resort to a simple memcpy routine, but non-POD types have to be handled more carefully.
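As a rough illustration (using C++11 type traits, which postdate this answer), adding a single virtual function is enough to disqualify a type from the trivially-copyable category that such memcpy-style fast paths rely on:

#include <iostream>
#include <type_traits>

struct Pod        { int x; };                       // trivially copyable
struct HasVirtual { int x; virtual void f() {} };   // vptr makes it non-trivially copyable

int main()
{
    // Standard library implementations typically lower std::copy to memmove
    // only for trivially copyable element types.
    std::cout << std::boolalpha
              << std::is_trivially_copyable<Pod>::value        << '\n'   // true
              << std::is_trivially_copyable<HasVirtual>::value << '\n';  // false
}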

Construction becomes a lot slower because the vtable pointer has to be initialized. In the worst case, the difference in performance between POD and non-POD datatypes can be significant.

In the worst case, you may see 5x slower execution. (That number is taken from a university project I did recently to reimplement a few standard library classes: our container took roughly 5x as long to construct as soon as the data type it stored got a vtable.)

Of course, in most cases you're unlikely to see any measurable performance difference; this is simply to point out that in some border cases it can be costly.

However, performance shouldn't be your primary consideration here. Making everything virtual is not a perfect solution for other reasons.

Allowing everything to be overridden in derived classes makes it much harder to maintain class invariants. How does a class guarantee that it stays in a consistent state when any one of its methods could be redefined at any time?

Making everything virtual may eliminate a few potential bugs, but it also introduces new ones.
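One middle ground often suggested for the invariant problem (a sketch of the non-virtual interface idiom, not something proposed in the answer above): keep the public entry point non-virtual so the base class enforces its invariants in one place, and expose a private virtual hook for customization.

#include <cassert>
#include <iostream>

class Widget
{
public:
    // Public and non-virtual: the base class owns the pre/post-conditions.
    void draw(int width) const
    {
        assert(width > 0);   // invariant checked exactly once, here
        doDraw(width);       // customization point
    }
    virtual ~Widget() {}

private:
    // Derived classes override this hook, not draw() itself.
    virtual void doDraw(int width) const
    {
        std::cout << "plain widget, width " << width << '\n';
    }
};

class FancyWidget : public Widget
{
private:
    virtual void doDraw(int width) const
    {
        std::cout << "fancy widget, width " << width << '\n';
    }
};

int main()
{
    FancyWidget w;
    Widget* base = &w;
    base->draw(42);   // prints the FancyWidget variant, invariants still enforced
}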

If you need the functionality of virtual dispatch, you have to pay the price. The advantage of C++ is that you can use a very efficient implementation of virtual dispatch provided by the compiler, rather than a possibly inefficient version you implement yourself.

However, lumbering yourself with the overhead when you don't need it is possibly going a bit too far. And most classes are not designed to be inherited from - creating a good base class requires more than making its functions virtual.

Virtual dispatch is an order of magnitude slower than some alternatives - not due to indirection so much as the prevention of inlining. Below, I illustrate that by contrasting virtual dispatch with an implementation embedding a "type(-identifying) number" in the objects and using a switch statement to select the type-specific code. This avoids function call overhead completely - just doing a local jump. There is a potential cost to maintainability, recompilation dependencies etc through the forced localisation (in the switch) of the type-specific functionality.


IMPLEMENTATION

#include <iostream>
#include <vector>
#include <time.h>   // clock_gettime / CLOCK_MONOTONIC (POSIX)

// virtual dispatch model...

struct Base
{
    virtual int f() const { return 1; }
};

struct Derived : Base
{
    virtual int f() const { return 2; }
};

// alternative: member variable encodes runtime type...

struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};

struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};

struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};

struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};

int main()
{
  for (int j = 0; j < 3; ++j)
  {
    typedef std::vector<Base*> V;
    V v;

    for (int i = 0; i < 1000; ++i)
        v.push_back(i % 2 ? new Base : (Base*)new Derived);

    int total = 0;

    Timer tv;

    for (int i = 0; i < 100000; ++i)
        for (V::const_iterator i = v.begin(); i != v.end(); ++i)
            total += (*i)->f();

    double tve = tv.elapsed();

    std::cout << "virtual dispatch: " << total <<     << tve <<  
 ;

    // ----------------------------

    typedef std::vector<Type*> W;
    W w;

    for (int i = 0; i < 1000; ++i)
        w.push_back(i % 2 ? (Type*)new A : (Type*)new B);

    total = 0;

    Timer tw;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
        {
            if ((*i)->type_ == 1)
                total += ((A*)(*i))->f();
            else
                total += ((B*)(*i))->f();
        }

    double twe = tw.elapsed();

    std::cout << "switched: " << total <<     << twe <<  
 ;

    // ----------------------------

    total = 0;

    Timer tw2;

    for (int i = 0; i < 100000; ++i)
        for (W::const_iterator i = w.begin(); i != w.end(); ++i)
            total += (*i)->type_;

    double tw2e = tw2.elapsed();

    std::cout << "overheads: " << total <<     << tw2e <<  
 ;
  }
}

PERFORMANCE RESULTS

On my Linux system:

~/dev  g++ -O2 -o vdt vdt.cc -lrt
~/dev  ./vdt                     
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726

This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.


COMMENTS RE VIRTUAL DISPATCH

It must be said, though, that virtual function call overheads are rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much - so do the profiling before reaching for awkward alternatives. The main issue with virtual functions is that they perform an out-of-line function call, though they also delocalise the code executed, which changes the cache utilisation patterns (for better or (more often) worse).
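As a sketch of the "get a whole lot of things at once" suggestion (the names here are invented purely for illustration), one virtual call can fill a plain struct instead of paying the indirect-call cost per field:

#include <iostream>

struct ParticleState { float x, y, z, mass; };   // plain data, filled in one call

class Particle
{
public:
    virtual ~Particle() {}
    // One virtual call instead of four virtual getX()/getY()/getZ()/getMass() calls.
    virtual void getState(ParticleState& out) const = 0;
};

class PointParticle : public Particle
{
public:
    virtual void getState(ParticleState& out) const
    {
        out.x = 1.0f; out.y = 2.0f; out.z = 3.0f; out.mass = 0.5f;
    }
};

int main()
{
    PointParticle p;
    Particle* base = &p;
    ParticleState s;
    base->getState(s);                        // single indirect call
    std::cout << s.x + s.y + s.z + s.mass << '\n';
}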

The extra cost is virtually nothing in most scenarios (pardon the pun). ejac has already posted sensible relative measures.

The biggest thing you give up is possible optimizations due to inlining. They can be especially good if the function is called with constant parameters. This rarely makes a real difference, but in a few cases, this can be huge.
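A small sketch of the inlining point (actual code generation depends on the compiler and flags): in the non-virtual case a call with a constant argument can typically be inlined and folded down to a constant, while the call through the base reference generally keeps the indirect call.

struct ScalerBase
{
    virtual ~ScalerBase() {}
    virtual int scale(int x) const { return x * 2; }
};

struct Doubler
{
    int scale(int x) const { return x * 2; }   // non-virtual: trivially inlinable
};

int viaVirtual(const ScalerBase& s) { return s.scale(21); }  // indirect call usually remains
int viaDirect(const Doubler& d)     { return d.scale(21); }  // usually folds to "return 42"

int main()
{
    Doubler d;
    ScalerBase s;
    return viaDirect(d) - viaVirtual(s);   // 0
}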


Regarding optimizations:
It is important to know and consider the relative cost of the constructs of your language. Big O notation is only half of the story - how does your application scale? The other half is the constant factor in front of it.

As a rule of thumb, I wouldn't go out of my way to avoid virtual functions unless there are clear and specific indications that they are a bottleneck. A clean design always comes first - but it is only one stakeholder, and it should not unduly hurt the others.


Contrived example: an empty virtual destructor on an array of one million small elements may plow through at least 4 MB of data, thrashing your cache. If that destructor can be inlined away, the data won't be touched.
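A sketch of that contrived example, assuming std::vector and a small 4-byte payload: with a trivial destructor, destroying the container is essentially one deallocation, while a virtual (even empty) destructor forces a per-element destructor loop unless the compiler can prove it away.

#include <vector>

struct Small        { int payload; };                              // trivially destructible
struct SmallVirtual { int payload; virtual ~SmallVirtual() {} };   // non-trivial destructor

int main()
{
    {
        std::vector<Small> a(1000000);
        // Destruction here is a no-op per element; only the buffer is freed.
    }
    {
        std::vector<SmallVirtual> b(1000000);
        // Destruction here runs ~1M (empty) destructors, touching every element
        // unless the compiler can optimize them away.
    }
}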

When writing library code, such considerations are far from premature. You never know how many loops will be put around your function.

While everyone else is correct about the performance of virtual methods and such, I think the real problem is whether the team knows about the definition of the virtual keyword in C++.

Consider this code, what is the output?

#include <stdio.h>

class A
{
public:
    void Foo()
    {
        printf("A::Foo()
");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()
");
    }
};

int main(int argc, char** argv)
{    
    A* a = new A();
    a->Foo();

    B* b = new B();
    b->Foo();

    A* a2 = new B();
    a2->Foo();

    return 0;
}

Nothing surprising here:

A::Foo()
B::Foo()
A::Foo()

As nothing is virtual. If the virtual keyword is added to the front of Foo in both A and B classes, we get this for the output:

A::Foo()
B::Foo()
B::Foo()

Pretty much what everyone expects.

Now, you mentioned that there are bugs because someone forgot to add a virtual keyword. So consider this code (where the virtual keyword is added to A, but not B class). What is the output then?

#include <stdio.h>

class A
{
public:
    virtual void Foo()
    {
        printf("A::Foo()
");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()
");
    }
};

int main(int argc, char** argv)
{    
    A* a = new A();
    a->Foo();

    B* b = new B();
    b->Foo();

    A* a2 = new B();
    a2->Foo();

    return 0;
}

Answer: the same as if the virtual keyword had been added to B. The reason is that the signature of B::Foo matches A::Foo() exactly, and because A's Foo is virtual, so is B's.

Now consider the case where B's Foo is virtual and A's is not. What is the output then? In this case, the output is

A::Foo()
B::Foo()
A::Foo()

The virtual keyword works downwards in the hierarchy, not upwards. It never makes the base class's methods virtual. The first place a virtual method is encountered in the hierarchy is where the polymorphism begins. There isn't a way for later classes to make earlier classes' methods virtual.

Don't forget that virtual methods mean that this class is giving future classes the ability to override/change some of its behaviors.

So if you have a rule to remove the virtual keyword, it may not have the intended effect.

The virtual keyword in C++ is a powerful concept. You should make sure each member of the team really knows this concept so that it can be used as designed.
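As an aside not in the original answer: with the C++11 override specifier (which did not exist when this was written), the "forgot to make it virtual" class of bug becomes a compile-time error, because override is rejected unless the function really overrides a virtual base function.

#include <stdio.h>

class A
{
public:
    virtual void Foo()
    {
        printf("A::Foo()\n");
    }
};

class B : public A
{
public:
    // If A::Foo were not virtual (or the signatures differed), this
    // declaration would fail to compile instead of silently hiding A::Foo.
    void Foo() override
    {
        printf("B::Foo()\n");
    }
};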

Depending on your platform, the overhead of a virtual call can be very undesirable. By declaring every function virtual you're essentially calling them all through a function pointer. At the very least this is an extra dereference, but on some PPC platforms it will use microcoded or otherwise slow instructions to accomplish this.

I'd recommend against your suggestion for this reason, but if it helps you prevent bugs then it may be worth the trade-off. I can't help but think that there must be some middle ground that is worth finding, though.

It takes just a couple of extra assembly instructions to call a virtual method.

But I don't think you worry that fun(int a, int b) has a couple of extra push instructions compared to fun(). So don't worry about virtuals either, until you are in a special situation and see that they really lead to problems.

P.S. If you have a virtual method, make sure you have a virtual destructor. This way you'll avoid possible problems.
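A minimal sketch of the problem that P.S. refers to: deleting a derived object through a base pointer whose destructor is not virtual is undefined behaviour (in practice the derived destructor usually never runs); a virtual base destructor makes the delete dispatch correctly.

#include <cstdio>

struct BadBase              { ~BadBase()              { std::printf("~BadBase\n"); } };
struct BadDerived : BadBase { ~BadDerived()           { std::printf("~BadDerived\n"); } };

struct GoodBase               { virtual ~GoodBase()    { std::printf("~GoodBase\n"); } };
struct GoodDerived : GoodBase { virtual ~GoodDerived() { std::printf("~GoodDerived\n"); } };

int main()
{
    BadBase* b = new BadDerived;
    delete b;                       // undefined behaviour: ~BadDerived typically never runs

    GoodBase* g = new GoodDerived;
    delete g;                       // prints ~GoodDerived, then ~GoodBase
}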


In response to xtofl's and Tom's comments: I did small tests with 3 functions:

  1. Virtual
  2. Normal
  3. Normal with 3 int parameters

My test was a simple iteration:

for(int it = 0; it < 100000000; it ++) {
    test.Method();
}

And here the results:

  1. 3.913 sec
  2. 3.873 sec
  3. 3.970 sec

It was compiled with VC++ in debug mode. I did only 5 tests per method and computed the mean value (so the results may be pretty inaccurate)... Anyway, the values are almost equal over 100 million calls, and the method with 3 extra push/pop instructions was slower.

The main point is: if you don't like the push/pop analogy, think of an extra if/else in your code. Do you think about the CPU pipeline when you add an extra if/else? ;-) Also, you never know what CPU the code will run on... A typical compiler can generate code that is more optimal for one CPU and less optimal for another (e.g. the Intel C++ Compiler).




