Monday, February 02, 2009

Jargon Time: L-values, R-values and Temporaries

As promised, this article demystifies a few fairly basic pieces of C++ jargon. We start with the basics of C++ expressions: l-values, r-values and temporaries. The concepts are simple but important, and will be used in later columns of this (Jargon) series.

L-values and R-values


When we write code, we express action and intent through well-formed expressions that conform to a broad syntax. Some of this action involves moving data around, some of it involves carrying out a more complex operation - and most involve both. For example:

 double number = 0.0;
number = 2.0;
double square_root = ::sqrt(number);

The above code involves both moving data around, and carrying out some action. In the second line, the literal double 2.0 is assigned to the double variable number. In this context number is an l-value expression - because it allows modification of the value it holds, when used on the left hand side of an assignment expression.

The expression 2.0 on the other hand can only be used to assign values to expressions such as number - in other words it can only be used on the right hand side of assignment expressions, never on the left hand side. It can also be used in function return statements. Such expressions are called r-values.
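
One practical way to tell the two apart is that an l-value names a storage location, so its address can be taken, while a literal such as 2.0 has no address and cannot be assigned to. The following is only a small sketch of that idea; the function name lvalue_rvalue_demo is made up for this illustration, and the commented-out lines are the ones a compiler would reject.

```cpp
#include <cmath>

// Illustrative helper (not from the original snippet).
double lvalue_rvalue_demo() {
    double number = 0.0;
    number = 2.0;               // number: l-value on the left, 2.0: r-value on the right
    // 2.0 = number;            // would not compile: a literal cannot be assigned to

    double* address = &number;  // an l-value has an address we can take
    // double* bad = &2.0;      // would not compile: a literal has no address

    *address = 9.0;             // writing through the pointer changes number
    return std::sqrt(number);   // number is used as an r-value here
}
```

Calling lvalue_rvalue_demo() returns 3.0, since number holds 9.0 by the time its square root is taken.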

At this point we have three small observations to make:
1. An r-value of a built-in type can never be used on the left hand side of an assignment operation.
2. An l-value in very many cases can be used on the right hand side of assignment operations also. In this case, it simply degenerates to the value it contains. For example:

 double number = 0.0;
number = 2.0;
double anotherNumber = number;

In the above code snippet, on the third line, the expression number is used as an r-value and degenerates to the value contained in the l-value expression number.
Not everything that can appear to the left of an = sign is an expression, though. For example, in the above code snippet, double anotherNumber is a declaration, not an expression, so it has no value of its own and we cannot write code like:

double aThirdNumber = (double anotherNumber = 2.0);

Language rules do not allow this, because a declaration cannot be nested inside an expression. Once declared, however, the name anotherNumber is an ordinary l-value expression and can be used on either side of an assignment.
3. l-values can also be used in function return statements. However, whether the function call is then treated as an l-value or simply degenerates to an r-value depends on the return type of the function. If a function returns a non-const reference to an object, the function call itself is a modifiable l-value expression, because the reference refers to an existing object. If it returns a pointer, the call is an r-value, but dereferencing the returned pointer yields an l-value. If it returns by value, the call is an r-value. For example:

template<int size>
struct CheckedIntArray {
    int& operator[](int index) {
        if (index >= 0 && index < size) {
            return array_[index];
        }
        throw IndexOutOfBoundsException(); // some exception type, assumed to be defined elsewhere
    }
private:
    int array_[size];
};

In the above CheckedIntArray class, a call to operator[](int) can be used as an l-value expression because it returns a reference to an element in the underlying array. This enables us to write code like this:

CheckedIntArray<16> my_array;
my_array[0] = 15;


In the above, the expression my_array[0] = 15; is equivalent to my_array.operator[](0) = 15;.
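
For readers who want to try this out, here is a self-contained sketch of the same idea. It is not the exact class above: std::out_of_range stands in for the unspecified exception type, a constructor zeroes the array, and the two helper functions (store_and_read and out_of_range_throws) are names invented for this illustration.

```cpp
#include <stdexcept>

// Sketch of the checked array from the text, with std::out_of_range
// standing in for the article's unspecified exception type.
template <int Size>
class CheckedIntArray {
public:
    CheckedIntArray() : array_() {}  // value-initialize: all elements zero

    // Returning int& is what makes the call expression an l-value.
    int& operator[](int index) {
        if (index >= 0 && index < Size) {
            return array_[index];
        }
        throw std::out_of_range("CheckedIntArray: index out of bounds");
    }

private:
    int array_[Size];
};

// Hypothetical helper: writes through operator[] and reads the values back.
int store_and_read(int value) {
    CheckedIntArray<16> my_array;
    my_array[0] = value;                 // assignment through the returned reference
    my_array.operator[](1) = value + 1;  // the same mechanism, spelled out
    return my_array[0] + my_array[1];
}

// Hypothetical helper: reports whether an out-of-range index throws.
bool out_of_range_throws() {
    CheckedIntArray<16> my_array;
    try {
        my_array[16] = 1;  // one past the last valid index
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With this sketch, store_and_read(15) returns 31 and out_of_range_throws() returns true. Had operator[] returned a plain int instead of int&, neither assignment in store_and_read would compile.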

Many complex expressions are r-values, as are a few simple ones:

int i = 0;
i++; // r-value expression: yields a copy of the old value (++i, by contrast, is an l-value in C++)
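
A detail worth checking for yourself in C++: the prefix form ++i yields an l-value (it refers to i itself), while the postfix form i++ yields an r-value (a copy of the old value). The sketch below verifies this at compile time. It assumes a C++11 compiler for decltype and static_assert, and increment_demo is a name invented for the example.

```cpp
#include <type_traits>

// Illustrative helper (not from the article).
int increment_demo() {
    int i = 0;

    // decltype((expr)) is T& for an l-value and plain T for an r-value.
    // The operands of decltype are not evaluated, so i is still 0 here.
    static_assert(std::is_lvalue_reference<decltype((++i))>::value,
                  "prefix increment yields an l-value");
    static_assert(!std::is_reference<decltype((i++))>::value,
                  "postfix increment yields an r-value");

    int& alias = ++i;   // i becomes 1; alias refers to i itself
    alias = 10;         // writes through to i
    int old = i++;      // old is 10 (a copy), i becomes 11
    // int& bad = i++;  // would not compile: a non-const reference cannot bind to an r-value

    return i + old;     // 11 + 10
}
```

Calling increment_demo() returns 21.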

Arrays are l-values, but non-modifiable ones: an array name cannot appear on the left hand side of an assignment, although its individual elements can. Of course this does not apply to a pointer being used with array syntax. Thus:

int arr[32] = {0};
int arr2[32] = {1}; // only arr2[0] is 1; the remaining elements are zero
arr[0] = 5; // arr[0] is a modifiable l-value
// the following is illegal
// arr = arr2; // arr is a non-modifiable l-value
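
Since a built-in array cannot be assigned as a whole, its contents have to be copied some other way. Two common options are sketched below: copying element by element with std::copy, or wrapping the array in a struct so that the compiler-generated assignment copies it. The names copy_and_sum, IntBlock and wrapped_assign are made up for this illustration.

```cpp
#include <algorithm>

// Copies arr2 into arr element by element, then sums arr.
int copy_and_sum() {
    int arr[32] = {0};
    int arr2[32] = {1};  // arr2[0] is 1, the other 31 elements are zero
    arr[0] = 5;          // element assignment is fine

    std::copy(arr2, arr2 + 32, arr);  // stands in for the illegal `arr = arr2;`

    int sum = 0;
    for (int k = 0; k < 32; ++k) {
        sum += arr[k];
    }
    return sum;
}

// A struct holding an array can be assigned as a whole.
struct IntBlock {
    int data[32];
};

int wrapped_assign() {
    IntBlock a = {{0}};
    IntBlock b = {{1}};
    a = b;             // legal: memberwise assignment copies the array
    return a.data[0];
}
```

Here copy_and_sum() returns 1, because the copy overwrites the 5 stored earlier, and wrapped_assign() returns 1 as well.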



Finally, let it be said that all expressions in C++ are either l-value or r-value expressions.

Temporaries


Related to the concept of r-values is the concept of temporaries. In fact temporaries are r-values (without all r-values being temporaries). Consider the following example:

 int m = 4;
int n = 5 + 8/m;

Here the value of the expression 5 + 8/m is a temporary. This is a relatively simple temporary - possibly one that would only exist in the registers of the CPU. However, it is possible, and quite common, to have temporaries on the stack. The important thing to understand is that temporaries are unnamed values, which are created in the context of an expression and whose lifetime normally ends once the full expression that created them has been evaluated. Consider the following statement:

string str = string("Hola amigos!");

The right hand side expression creates a temporary string object, and it is then copied to a local variable called str. Once the control of the executing program reaches past the semi-colon terminating this line of code, the temporary object is gone. Only str, containing a copy of it, exists.

There is one exception to this rule and it deals with references. In the last expression, if instead of a string variable on the left, we had a string reference, things would be a little different:

const string& str = string("Hola amigos!");

First of all, notice that we have had to add a const to the reference; a non-const reference cannot bind to a temporary. This is always the case, as you can see below:

 const int& r = 5;
const double& s = 2.0;

Since all temporaries are r-values, it is clear that a non-const reference cannot refer to them. The exception I referred to concerns the lifetime of the temporary when a const reference is bound to it. In this case, the temporary persists for as long as the reference is in scope, and not just until the end of the statement that created the reference.
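
The difference in lifetime can be made visible with a small tracing type. The sketch below is only an illustration: Tracer, without_reference and with_const_reference are invented names, and the log string exists purely to record the order of events.

```cpp
#include <string>

// Hypothetical helper type: records its construction, use and destruction.
struct Tracer {
    explicit Tracer(std::string& log) : log_(log) { log_ += "ctor;"; }
    ~Tracer() { log_ += "dtor;"; }
    void touch() const { log_ += "use;"; }
    std::string& log_;
};

// The temporary is destroyed at the end of the statement that creates it.
std::string without_reference() {
    std::string log;
    {
        Tracer(log).touch();  // temporary created, used and destroyed here
        log += "next;";
    }
    return log;
}

// Binding the temporary to a const reference extends its lifetime.
std::string with_const_reference() {
    std::string log;
    {
        const Tracer& ref = Tracer(log);  // temporary now lives as long as ref
        ref.touch();
        log += "next;";
    }                                     // ref goes out of scope: temporary destroyed
    return log;
}
```

without_reference() returns "ctor;use;dtor;next;" while with_const_reference() returns "ctor;use;next;dtor;". The destructor runs before the next statement in the first case and only at the end of the enclosing block in the second.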

Temporaries are often created for function return values, although most optimizing compilers would eliminate the creation of these temporaries if the return value of the function was not assigned to any specific object. In general, reducing the number of temporaries that are created by a program is a good strategy for optimization, and to some extent, the compiler already does it.

As a final example of how temporaries are generated, and where we can run into trouble with them if we are not careful, I present a piece of code I have seen written in several places (including products I have worked on).

using std::string;
using std::stringstream;
using std::cout;
using std::endl;

...

int x = 0;
double f = 1.6;
stringstream sout;
sout << "Some data values streamed: " << x << "|" << f;
const char *str = sout.str().c_str();
cout << str << endl; // this will likely print garbage

Can you spot the trouble with the above code? The member function std::string str() const of the std::stringstream class returns a temporary string. In the expression sout.str().c_str(), we obtain a const char* pointer to the internal character buffer of that temporary string and copy the pointer into the variable called str. As soon as this statement has executed, the temporary created by the call to sout.str() is destroyed, and we are left with a dangling pointer to its internal buffer. Using that pointer is undefined behaviour: the last line above may print garbage, or even crash the program.
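
One common way out, sketched here rather than prescribed, is to give the string a name so that it outlives every use of the pointer. The helper names format_values and safe_length are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Builds the message and returns it by value, so the caller owns the characters.
std::string format_values(int x, double f) {
    std::stringstream sout;
    sout << "Some data values streamed: " << x << "|" << f;
    return sout.str();
}

// Uses c_str() safely: the pointer never outlives the string it points into.
std::size_t safe_length(int x, double f) {
    const std::string text = format_values(x, f);  // a named copy, not a temporary
    const char* str = text.c_str();  // valid while text is alive and unmodified
    return std::strlen(str);
}
```

With the values from the snippet above, format_values(0, 1.6) returns "Some data values streamed: 0|1.6". If the pointer is needed for only a single call, passing sout.str().c_str() directly as a function argument is also safe, because the temporary string lives until the end of that full expression.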

In the next edition of the Jargon column, we'll look at namespace lookups and the Interface Principle. Keep watching for more jargon demystified.


Sunday, February 01, 2009

Parlez vous le C++?

Recently I was thinking back on my days of trying to learn C++ by reading the Usenet groups comp.lang.c++.moderated and comp.std.c++. If someone asked me for the best source to learn C++ from, I would always refer them to these two places (and perhaps the Boost mailing list, even though it is a little more Boost-focussed). Note that I said the best source to "learn" C++ - not merely "read" it (for which there are the books).

I'll explain what I mean by that (and as usual, I'll get a bit philosophical before cutting to the chase). I read C++ on my own, almost every written word of it that I read, alone - not a soul around even to discuss, not a teacher around to cast an impression. And yet no learning is complete unless an impression of that knowledge has been cast on us - this is true for every field of learning - even learning to memorize the English alphabet. The reason must be that this impression is cast on us through multiple senses - sight and sound, if not more. On the other hand, reading a tome is one dimensional - sight. It is here that the Usenet discussion boards and other mailing lists step in. While you still only read, you read direct discussions, arguments, brain-storms, doubts, misgivings, biases - it is a lot more real than reading a chapter on C++ polymorphism. It is what eggs you on to introspect - identify with the views expressed, or refute them; in other words, your C++ perspective is built here and that is what I meant when I said "learning".

Usenet in particular provided this motivation - I have seen posts by Bjarne Stroustrup, Scott Meyers, Andrei Alexandrescu, Herb Sutter, Jim Coplien, Steve Dewhurst, Andrew König, you name one, I'd have read his posts. The debates would go on for days together, even weeks, in lists that grew deep and wide with time. I was a mere reader - my occasional two-pence in the middle of debates would be politely answered but I had not the knowledge or understanding of the language to make a serious impact.
Half the time, I groped through the standard or Stroustrup's book trying to figure out what is a "temporary" or where are "incomplete types" allowed, what is the meaning of "SFINAE" or what is the "Liskov Substitution Principle". You could be excused for wondering "Is this a programming language we are reading or a Theory of Banach algebras?". I certainly did - but it was part of C++'s charm (oh! geek) - for a wannabe-mathematician-turned-nobody, this was a nice feel-good aberration in a language I wanted to "speak". But coming back to the point - this meant I had to have a basic vocabulary ready, and a familiarity with a C++ alphabet soup before I could start communicating effectively in real C++ terms. C++ is tough and exacting, it requires discipline and knowledge - and it was not defined by a Sun or a Microsoft. I plan to put up a set of articles that would help build the awareness and concepts around the C++ jargons. Happy reading!