Wednesday, August 13, 2008

Aliasing pointers: using "weakness" to strengthen your code

In one of my earlier articles introducing programming with Boost libraries, I had demonstrated some simple techniques using boost::shared_ptr to share dynamically allocated object instances across scopes, and manage the life cycle of such objects effectively. This was an important step in the direction of eliminating memory leaks. But this was not a complete answer to all forms of memory leaks. This article helps you distinguish between "owning" and "aliasing" semantics for pointers and introduces another class from Boost's smart pointer stable - the boost::weak_ptr. Although not a smart pointer itself, the weak_ptr adds important capability that is vital in handling memory issues and thread safety of objects.

Imagine that you are building a linked list, and you decide to store the "head" node as a shared_ptr<Node> and encapsulate inside this node, the life cycle management of all the nodes in the entire list. That way, when the "head" node goes out of scope, all the other nodes are automatically deleted. How can you do it? Well, the head pointer stores a shared_ptr<Node> pointing to the first node, the first node stores a shared_ptr<Node> pointing to the second node, and so on.

Here is the code, and it works.


#include <iostream>
#include <boost/shared_ptr.hpp>

using std::cout;
using std::endl;
using boost::shared_ptr;

struct Node {
int n_;
shared_ptr<Node> next;

Node(int n = 0) : n_(n) {
}

~Node() {
cout << "~Node() invoked for " << n_ << endl;
}
};

shared_ptr<Node> getList() {
shared_ptr<Node> head( new Node(0) );
shared_ptr<Node> next1( new Node(1) );
shared_ptr<Node> next2( new Node(2) );

head->next = next1;
next1->next = next2;

return head;
}

int main() {
shared_ptr<Node> head = getList();
if ( head )
cout << "Got head" << endl;

return 0;
}

This works remarkably well, and by now your penchant for using shared_ptr and RAII is seeing you write elegant code and wish away memory leaks. And then you are asked to enhance your linked list to a doubly linked list. With a deft, touch you make the necessary changes - and you are up and running again ...


#include <iostream>
#include <boost/shared_ptr.hpp>

using std::cout;
using std::endl;
using boost::shared_ptr;

struct Node {
int n_;
shared_ptr<Node> next;
shared_ptr<Node> prev;

Node(int n = 0) : n_(n) {
}

~Node() {
cout << "~Node() invoked for " << n_ << endl;
}
};

shared_ptr<Node> getList() {
shared_ptr<Node> head( new Node(0) );
shared_ptr<Node> next1( new Node(1) );
shared_ptr<Node> next2( new Node(2) );

head->next = next1;
next1->next = next2;
next1->prev = head;
next2->prev = next1;

return head;
}

int main() {
shared_ptr<Node> head = getList();
if ( head )
cout << "Got head" << endl;

return 0;
}

... except that you have a memory leak. Believe it or not, what I promised a couple of articles ago was a fib - you can leak memory without even so much as nibbling on a pointer, and in spite of having wrapped all your heap memory in shared_ptr. Run it and you'll not see any of the messages from the destructor of Node. Come to think of it and you'll know the reason.

In the function getList, initially, the nodes head, next1 and next2 each have a reference count of 1 (at line 22, 23 and 24 respectively). Next at line 26, next1 has its ref-count bumped up to 2. In the following line 27, next2 has its ref-count bumped up to 2. So far so good - the real fun starts now. In the next line 28, head has its ref-count bumped to 2, and finally next1 has its ref-count bumped to 3. When this function returns, each of the Smart pointers have their reference counts decreased by 1. But head is copied as a return value - so the decrement nullifies, and when we reach main function at line 35, its reference count remains 2. At this point, the reference count of next1 and next2 are 2 and 1 respectively - but these objects are no longer in scope - completely encapsulated by head. At the end of the main function's scope, the ref-count of head will drop back a notch to 1 - it will not get to 0 - necessary for its internal pointer to get "destructed" and trigger the destruction of its subsequent nodes.

What caused the problem? We only introduced three lines of code - lines 11, 28 and 29 - so the problem would have to be here in these three lines.

Why did all hell break lose? Because, we used the shared_ptr indiscriminately. The shared_ptr implements a "shared ownership" idiom. In this case, head "owns" the next1 node - meaning that before head is completely "destructed", it would have ensured complete "destruction" of next1. But next1 does not own head - it only has an alias to head, in the form of the member prev. But by making it into a shared_ptr<Node>, we have unwittingly made next1 the owner of head. So we hit the chicken-and-egg problem. To reiterate what we just said about ownership semantics - before head is completely "destructed", it should ensure complete "destruction" of next1, and before next1 is completely "destructed", it should ensure complete "destruction" of head. Of course that's an impossible scenario. So we need a different construct for handling the semantics of an aliasing pointer. Here is the code example again with some changes that work reasonably well.


#include <iostream>
#include <boost/shared_ptr.hpp>

using std::cout;
using std::endl;
using boost::shared_ptr;

struct Node {
int n_;
shared_ptr<Node> next;
shared_ptr<Node> *prev;

Node(int n = 0) : n_(n) {
}

~Node() {
cout << "~Node() invoked for " << n_ << endl;
}
};

shared_ptr<Node> getList() {
shared_ptr<Node> head( new Node(0) );
shared_ptr<Node> next1( new Node(1) );
shared_ptr<Node> next2( new Node(2) );

head->next = next1;
next1->next = next2;
next1->prev = &head;
next2->prev = &next1;

return head;
}

int main() {
shared_ptr<Node> head = getList();
if ( head )
cout << "Got head" << endl;

return 0;
}

At this point of time, you decide to remove the first node of the list head. As part of the process, you must do something like this:


int main() {
shared_ptr<Node> new_head;
{
shared_ptr<Node> head = getList();
new_head = head->next;
new_head->prev = 0;
}

return 0;
}

The statement at line 6, is crucial because if prev is not set to 0 at this point, an accidental dereferencing of new_head->prev at a later point in the code will bomb. At the end of the scope, at line 7, the head goes out of scope and is "destructed". What if this automatically set the prev member pointing to it to be NULL. Then we could afford to forget to write new_head->prev = 0, without facing any dire consquences.

The boost::weak_ptr is tailor-made for the purpose shown here - it models a "non-owning alias" which is automatically set to a null state when the object it points to is destructed. For this, weak_ptr always works in conjunction with a shared_ptr, and always points to an object via the shared_ptr that owns it. The following example should make all this clear:


#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>

using std::cout;
using std::endl;
using boost::shared_ptr;
using boost::weak_ptr;

struct Node {
int n_;
shared_ptr<Node> next;
weak_ptr<Node> prev;

Node(int n = 0) : n_(n) {
}

~Node() {
cout << "~Node() invoked for " << n_ << endl;
}
};

shared_ptr<Node> getList() {
shared_ptr<Node> head( new Node(0) );
shared_ptr<Node> next1( new Node(1) );
shared_ptr<Node> next2( new Node(2) );

head->next = next1;
next1->next = next2;
next1->prev = head;
next2->prev = next1;

return head;
}

int main() {
shared_ptr<Node> head = getList();
if ( head )
cout << "Got head" << endl;

return 0;
}

The weak_ptr maintains a reference to a shared_ptr owning the object. But it does not bump the reference count of a shared_ptr - therefore any number of weak_ptr's can point to a shared_ptr, without affecting its life cycle in anyway. If the shared_ptr goes out of scope and the underlying object is destructed - the weak_ptr's are automatically "updated" about this change through the reference to the shared_ptr that it maintains.

In an informal way, weak_ptr's are to shared_ptr's, what soft links are to hard links on a Unix file system.

You can check the state of the underlying object through the weak_ptr, using the weak_ptr::expired() member function. You can also construct an owning reference - a shared_ptr which contributes to the reference count of the original shared_ptr - through the weak_ptr::lock() member function. This can indirectly be used to check the same condition as weak_ptr::expired() does. To understand that - look at the final example here:


#include <iostream>
#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>

using std::cout;
using std::endl;
using boost::shared_ptr;
using boost::weak_ptr;

struct Node {
int n_;
shared_ptr<Node> next;
weak_ptr<Node> prev;

Node(int n = 0) : n_(n) {
}

~Node() {
cout << "~Node() invoked for " << n_ << endl;
}
};

shared_ptr<Node> getList() {
shared_ptr<Node> head( new Node(0) );
shared_ptr<Node> next1( new Node(1) );
shared_ptr<Node> next2( new Node(2) );

head->next = next1;
next1->next = next2;
next1->prev = head;
next2->prev = next1;

return head;
}

int main() {
shared_ptr<Node> new_head;
weak_ptr<Node> alias;
{
shared_ptr<Node> head = getList();
new_head = head->next;
alias = head;
}

assert(alias.expired());

alias = new_head;
shared_ptr<Node> new_head2 = alias.lock();
assert(new_head2.use_count() == 2); // because new_head and new_head2 share ownership

return 0;
}

The lock method is particularly crucial when sharing an object across threads. Some threads might only have an aliasing reference to the shared object in the form of a weak_ptr. This can be used to great effect. At any time when such a thread needs to share ownership of this object - it can call lock on the weak_ptr. It will get a non-null shared_ptr only if the object is still valid, and otherwise it will get a null shared_ptr.

Notice that the weak_ptr provides no ways of dereferencing the underlying pointer through it. So there are no overloaded -> or * operators for example. If you need to use these operators - use the lock method to try and get a shared_ptr and then apply these methods on the shared_ptr. In fact - a weak_ptr is not a smart pointer - I call it a "smarter" pointer.

Read more!

Sunday, August 03, 2008

Expression Templates Demystified: Part 2

In this article, we would see how the use of templates can simplify and generalize the Expression Functors code, and improve performance by eliminating virtual functions.

In the example from the previous segment of this article, the Expression abstract base class represented the family of all types that could be used to represent different kinds of expressions. Two specific classes, Variable and Constant, represented the leafs of an expression tree - numeric literals and single variable names. These could be combined to form other expressions - generally represented by the ComplexExpression class. All these classes implemented the interface defined by the Expression abstract base class.

We now plan to switch from dynamic polymorphism to template driven static polymorphism. For example, given classes with a common interface but without a common base class, like the ones below:

Show line numbers
 struct Foo {
void bar() {
// implementation
}
};

struct Foo2 {
void bar() {
// implementation
}
};

we can wrap them using a class template that has a similar interface as these:

Show line numbers
 template<class F>
struct AllFoo {
AllFoo(F& f) : f_(f) {}

void bar() {
f_.bar();
}
};

This enforces that AllFoo can only be instantiated with such classes which have the bar() method in their public interface. Thus we write the following alternate definitions:

Show line numbers
 template<class E>
struct Expr {
Expr(E& e) : e_(e) {
}

double operator() (double d) {
return e_(d);
}

E e_;
};

Here Expr is a class that can encapsulate all expression objects, like the ones of type Constant or Variable below.

Show line numbers
 struct Constant {
Constant(double d) : d_(d) { }
Constant(int d) : d_(d) { }
double operator() (double) {
return d_;
}

double d_;
};


struct Variable {
double operator() (double d) {
return d;
}
};

Finally, a class to represent non-terminal, complex expressions:

Show line numbers
 template<class E1, class E2, class Op>
struct ComplexExpr {
ComplexExpr(Expr<E1> l, Expr<E2> r) : l_(l), r_(r) {
}

double operator() (double d) {
return Op::apply(l_(d), r_(d));
}

Expr<E1> l_;
Expr<E2> r_;
};

We have just carried over the definition of the ComplexExpression class from the previous article and made it into a template class. The operator classes like Add, Subtract, Multiply and Divide should continue to work unchanged. Guess what, we just have to define the operator overloads (for +, -, * and /) and we will be done with defining our framework of algebraic expression.

Now, we would want to write expressions such as:

Variable x;
cout << (x+3)(2); // prints 5
cout << ((x*x+3)*(x+3))(2); // prints 35

In the above, x is a Variable, 3 should generate a Constant, x + 3 should result in a ComplexExpr<Variable, Constant>. Further, x*x is a ComplexExpr<Variable, Variable> and (x*x+3)*(x+3) is a ComplexExpr<ComplexExpr< ComplexExpr<Variable, Variable>, Constant>, ComplexExpr<Variable, Constant> >. One can trace the template parameter types as sub-trees of the class template ComplexExpression.

Clearly, to write an expression like x+3, we need an overload of operator+ between a Variable and an integer. To write an expression like (x*x+3)*(x+3) - we need an operator* between two ComplexExpressions. To write something like, x+Constant(3), we need an overload of operator+ between Variable and Constant. Now, just as we defined the basic operators on Expression in the previous article, we could do the same on the Expr class template here, instead of overloading each operator for different combinations of types. For example:

Show line numbers
 template<class E1, class E2>
Expr<ComplexExpr<Expr<E1>, Expr<E2>, Add> > operator+ (E1 e1, E2 e2) {
typedef ComplexExpr<Expr<E1>, Expr<E2>, Add> ExprType;
return Expr<ExprType>( ExprType(Expr<E1>(e1), Expr<E2>(e2)) );
}

The other operators are not very difficult to write, following the above code. However, turns out that while these do take care of operators between complex expressions as well as Variables and Constants, they cannot handle real number or integer literals. To handle real number literals, we define the following overload pairs for each operator.

Show line numbers
 template<class E1>
Expr<ComplexExpr<Expr<E1>, Expr<Constant>, Multiply> > operator* (E1 e1, double d) {
typedef ComplexExpr<Expr<E1>, Expr<Constant>, Multiply> ExprType;
return Expr<ExprType>( ExprType(Expr<E1>(e1), Expr<Constant>(Constant(d))) );
}

template<class E1>
Expr<ComplexExpr<Expr<Constant>, Expr<E1>, Multiply> > operator* (double d, E1 e1) {
typedef ComplexExpr<Expr<Constant>, Expr<E1>, Multiply> ExprType;
return Expr<ExprType>( ExprType(Expr<Constant>(Constant(d)), Expr<E1>(e1)) );
}

These take care of all cases - except for a small glitch which we can pepare to live with. We can write such expressions as:

Variable x;
cout << (x+3.0)(2); // prints 5
cout << ((x*x+3.0)*(x+3.0))(2); // prints 35


but not ones like:

Variable x;
cout << (x+3)(2); // prints 5
cout << ((x*x+3)*(x+3))(2); // prints 35


If you have to be able to do this, add additional overloads like:

Show line numbers
 template<class E1>
Expr<ComplexExpr<Expr<E1>, Expr<Constant>, Multiply> > operator* (E1 e1, int d) {
typedef ComplexExpr<Expr<E1>, Expr<Constant>, Multiply> ExprType;
return Expr<ExprType>( ExprType(Expr<E1>(e1), Expr<Constant>(Constant(d))) );
}

template<class E1>
Expr<ComplexExpr<Expr<Constant>, Expr<E1>, Multiply> > operator* (int d, E1 e1) {
typedef ComplexExpr<Expr<Constant>, Expr<E1>, Multiply> ExprType;
return Expr<ExprType>( ExprType(Expr<Constant>(Constant(d)), Expr<E1>(e1)) );
}
As it should be already apparent - there are no objects allocated on free store, no reference counted proxy wrappers, and a fair bit of compile-time type computation and call dispatching - this results in significant savings in the runtime costs of the program, apart from the obvious cut in the number of lines of code. The basic philosophy of the template based program has not changed from the non-template version - it uses functional composition involving the function call operator, and operator overloading to support naturally combining simple expressions into arbitrarily complex expressions. However, the use of templates allows a fair bit of this work to be done at compile time. This is all that is there to Expression templates. You can put together the expression framework using code given here and use the following function to test your code.

Show line numbers
 int main()
{
Variable x;

cout << ((2.0*x*x + 3.0*x + 3.0)*(2.0*x*x + 3.0*x + 3.0))(2) << endl;
cout << integrate((2.0*x*x + 3.0*x + 3.0), 0, 1) << endl;
cout << integrate((2.0*x*x + 3.0*x + 3.0)*(2.0*x*x + 3.0*x + 3.0), 0, 1) << endl;
cout << integrate((x/(1.0+x)), 0, 1) << endl;
cout << integrate(x*x, 0, 7) << endl;

return 0;
}


If you have any trouble understanding and compiling the code, drop me a message.

References



  1. Todd Veldhuizen: Expression Templates
    http://ubiety.uwaterloo.ca/~tveldhui/papers/Expression-Templates/exprtmpl.html
  2. Expression Templates synopsis on More C++ Idioms
    http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Expression-template
  3. Angelika Langer: Expression Templates - Introduction
    http://www.angelikalanger.com/Articles/Cuj/ExpressionTemplates/ExpressionTemplates.htm

Read more!