Thursday, November 04, 2010

Return your objects by value (at least sometimes)

Starting where we left a year and a half back (and I swear I had no time for the blog in this intervening period), we'll look at temporaries again, but in a slightly different light. We saw in that article that it's a great idea to eliminate temporaries of non-POD types (or more correctly, types with non-trivial copy semantics) as far as possible. One big advantage of eliminating temporaries is the elimination of redundant copying from a temporary to a named instance which is how a lot of temporaries inevitably end up. Common ways in which temporaries get created are when objects are returned by value from a function call, and also when a function (say foo) is invoked in-place in the argument list of the invocation of another function (say bar) - and thus the return value of foo() is passed to bar(). Some of these cases are illustrated in the following code:

Fud foo(int i) // Fud is a copiable class
{
return Fud(i); // Temporary created
}
void bar(Fud);
...
bar(foo(1)); // Temporary passed to bar
Fud f = foo(2);
...

Now suppose we were to rewrite the above code to eliminate temporaries. There are a few different ways, but let's try a fairly simple approach:

Fud foo(int i) // Fud is a copiable class
{
return Fud(i); // Temporary created
}
void bar(const Fud&);
...
bar(foo(1)); // const-reference to temporary, no copying
Fud f = foo(2);
...

In the above code, we have eliminated the pass by value of a Fud object to bar(...) but foo(...) still returns a temporary by value. Here is an enhancement:


void foo(int i, Fud *&f) // Fud is a copiable class
{
f = new Fud(i);
}
void bar(const Fud&);
...
Fud *pfud = NULL;
foo(2, pfud);
bar(*pfud); // const-reference to temporary, no copying
...

This time there don't seem to be any temporaries and consequently no redundant copying. But the code is no longer simple. At the least, you cannot do something like bar(foo(...)) any more - nesting calls is a natural algebraic operation used to compose functions but the elimination of references takes that ability away from us. Strictly speaking, with Boost shared_ptr, we could eliminate this limitation (how? left as an exercise to the reader, an author's exclusive prerogative). But it still means that we have to incur the cost of dynamic memory allocation. Again, dynamic memory allocation is not necessarily evil - sometimes, it is even more welcome than allocating a large stack based object. But the overall readability of code has suffered as well.
What do we do? Well, it depends.
A good idea is to start by passing temporaries back by value. Now don't cross your eyes - what you just read is exactly what I said and there is a little something that the C++ standard allows (without mandating) and most standard compilers implement as an optimization, which makes this possible. It is called Copy Elision and in a special form, Return Value Optimization or RVO for short.

Copy Elision


Copy elision simply refers to elimination of redundant copying, if at all possible. For example, consider the following code:

Fud obj = Fud(1);

What's happening there? A Fud instance called obj is initialized from a temporary Fud object created through Fud(1). Normally, we would expect that Fud's constructor will be invoked to create a Fud object with an initialization parameter value of 1. Next, the instance called obj will be created and initialized through a call to its copy-constructor which will copy the state of the temporary object to obj. However, it is not hard to see that the only purpose of the temporary object is to help initialize the obj instance. This code could have been written in a much simpler way as:

Fud obj(1);

There would have been no calls necessary to the copy constructor of obj, nor would there be any temporary instance to destruct. The behaviour of the code would have been exactly the same (unless of course the copy constructor or destructor had side-effects - always a bad idea). Copy elision is the inbuilt optimization in the compiler which causes code like this:

Fud obj = Fud(1);

to generate the equivalent of code like this:

Fud obj(1);

and thus prevent needless copying. An interesting special case is Return Value Optimization (RVO). It is best illustrated with the following example:

#include <vector>
#include <string>
#include <iostream>

using std::vector;
using std::string;
using std::cout;
using std::endl;

struct TestClass
{
TestClass(int i) : i_(i)
{}

TestClass(const TestClass& tc) : i_(tc.i_)
{
cout << "Copied." << endl;
}

private:
int i_;
};

vector<TestClass> make_vec()
{
vector<TestClass> vec;
vec.reserve(20);

cout << "Starting populating vector." << endl;
for (int i = 0; i < 10; i++) {
vec.push_back(TestClass(i));
}
cout << "Vector populated." << endl;

return vec;
}

int main()
{
vector<TestClass> vec = make_vec();
}

Look at line 40 above. The function make_vec() returns a vector which is copied to the vector vec - what looks like copy initialization. Since the contained type of the vector is TestClass, you'd expect the TestClass instances to be copied, first time when they are enqueued in the vector inside make_vec (line 31) and again when the returned temporary vector from make_vec is used for copy-initialization of vec at line 40. That would be conformant behaviour, but that's often not the behaviour you get. Running on gcc 4.3.2 on OpenSuSE gave me this output:

Starting populating vector.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Vector populated.

This clearly shows that there is no copying during copy-initialization of vec - in other words there is no copy-initialization. This is an example of Copy Elision - a special case known as RVO. Internally, the compiler might generate code that arranges for a reference to vec on line 40 to be passed to the function make_vec, and obviate the local vector inside make_vec so that all push_back operations happen on this reference instead. A slight change to the above code can completely disable Copy Elision / RVO, as illustrated below:

return vec;
}

int main()
{
vector<TestClass> vec;
vec = make_vec();
}

I got this:

Starting populating vector.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Vector populated.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.
Copied.

Quite clearly, the copy-initialization takes place and has not been optimized away.

The big advantage of RVO is that the syntax of function calls can be made to match the semantics of the mapping that the function represents. The return value rather than an out-parameter is a return value. This also allows for nested function calls, a natural algebraic expression of functional composition.

Therefore, depending on how the return value of function is going to be used - it might be a good idea to return an object by value.

1 comment:

Chris said...

If copying the object is heavy, it can be avoided in some situations with smart pointers.

For the foo example:
boost::shared_ptr foo(int i)
{
boost::shared_ptr p = new Fud(i);
return p;
}
...
boost::shared_ptr pfud = foo(2);
...