Monday, February 02, 2009

Jargon Time: L-values, R-values and Temporaries

As promised, this article demystifies a few fairly basic pieces of C++ jargon. We start with the basics of C++ expressions: l-values, r-values and temporaries. The concepts are simple but important, and they will be used in later columns of this (Jargon) series.

L-values and R-values


When we write code, we express action and intent through well-formed expressions that conform to a broad syntax. Some of this action involves moving data around, some of it involves carrying out a more complex operation - and most involve both. For example:

double number = 0.0;
number = 2.0;
double square_root = ::sqrt(number);

The above code involves both moving data around and carrying out an action. In the second line, the double literal 2.0 is assigned to the double variable number. In this context number is an l-value expression: it refers to an object whose value can be modified, so it can appear on the left hand side of an assignment expression.

The expression 2.0, on the other hand, can only supply a value, for example to assign to an expression such as number. In other words, it can only be used on the right hand side of an assignment expression, never on the left hand side. It can also be used in function return statements. Such expressions are called r-values.
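If you have a C++11 or later compiler (newer than this article), you can ask the compiler itself to classify an expression. The sketch below relies on the rule that decltype applied to a parenthesized expression yields a reference type (T&) for an l-value and a plain type (T) for an ordinary r-value. The IS_LVALUE macro is a hypothetical helper introduced only for illustration; it is not a standard facility.

```cpp
#include <type_traits>

// Hypothetical helper (C++11 or later): decltype((expr)) is T& for an
// l-value expression and plain T for an ordinary r-value (prvalue).
// The expression is never evaluated, only classified.
#define IS_LVALUE(expr) std::is_lvalue_reference<decltype((expr))>::value

double number = 0.0;

// number names a modifiable object, so it is an l-value.
static_assert(IS_LVALUE(number), "number is an l-value");

// The literal 2.0 only supplies a value, so it is an r-value.
static_assert(!IS_LVALUE(2.0), "2.0 is an r-value");

// The result of arithmetic on number is an unnamed value: an r-value.
static_assert(!IS_LVALUE(number + 1.0), "number + 1.0 is an r-value");
```

Because everything is checked with static_assert, a misclassification shows up as a compile error rather than at run time.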

At this point we have three small observations to make:
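1. An r-value of built-in type can never be used on the left hand side of an assignment operation. (R-values of class type are an exception, because they can call member functions, including operator=: string("a") = "b" compiles.)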
2. An l-value can, in many cases, also be used on the right hand side of an assignment operation. In this case, it simply degenerates to the value it contains (the standard calls this the l-value-to-r-value conversion). For example:

double number = 0.0;
number = 2.0;
double anotherNumber = number;

In the above code snippet, on the third line, the expression number is used as an r-value: it degenerates to the value held by the l-value number.
Note, however, that not every piece of code that mentions a variable is an expression. In the above snippet, double anotherNumber is a declaration rather than an l-value expression, so it cannot be used as a value. We cannot write code like:

double aThirdNumber = (double anotherNumber = 2.0);

Language rules do not allow this, because a declaration cannot appear as an operand inside an expression.
3. L-values can also be used in function return statements. However, whether the function call is itself an l-value, or simply an r-value carrying a copy of the value, depends on the return type of the function. If a function returns a reference to an object, the function call is an l-value expression, and if that reference is non-const, the call can even appear on the left hand side of an assignment. A function that returns a pointer yields an r-value (the pointer itself), but dereferencing that pointer gives an l-value. For example:

#include <stdexcept>

template<int size>
struct CheckedIntArray {
    // Returns a reference to the element, so a call can be assigned to.
    int& operator[](int index) {
        if (index >= 0 && index < size) {
            return array_[index];
        }
        throw std::out_of_range("CheckedIntArray: index out of bounds");
    }
private:
    int array_[size];
};

In the above CheckedIntArray class, a call to operator[](int) can be used as an l-value expression because it returns a reference to an element of the underlying array. This enables us to write code like this:
CheckedIntArray<16> my_array;
my_array[0] = 15;


In the above, the expression my_array[0] = 15; is equivalent to my_array.operator[](0) = 15;.
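The rule in observation 3 can be checked with a small sketch. The functions by_reference, by_value and by_pointer and the global storage are hypothetical names made up for this illustration, and the static_assert checks assume C++11 or later (they use the same decltype((expr)) rule: T& means l-value, plain T means r-value).

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical functions, used only to show how the return type decides
// whether a call expression is an l-value.
int storage = 0;

int& by_reference() { return storage; }   // call is an l-value
int  by_value()     { return storage; }   // call is an r-value (a copy)
int* by_pointer()   { return &storage; }  // call is an r-value; *call is an l-value

static_assert(std::is_lvalue_reference<decltype((by_reference()))>::value,
              "returning int& makes the call an l-value");
static_assert(!std::is_reference<decltype((by_value()))>::value,
              "returning int makes the call an r-value");
static_assert(!std::is_reference<decltype((by_pointer()))>::value,
              "returning int* makes the call an r-value");
static_assert(std::is_lvalue_reference<decltype((*by_pointer()))>::value,
              "dereferencing the returned pointer gives an l-value");

void assign_through_calls() {
    storage = 0;
    by_reference() = 15;        // fine: assigns 15 to storage
    *by_pointer() += 1;         // fine: storage becomes 16
    assert(storage == 16);
    // by_value() = 15;         // error: cannot assign to an r-value of type int
    // by_pointer() = nullptr;  // error: the returned pointer is an r-value
}
```

This is the same mechanism that lets my_array[0] = 15 work: operator[] returns int&, so the call names an element rather than a copy of it.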

Many complex expressions are r-values, as are a few simple ones:

int i = 0;
i++; // r-value expression: yields a copy of i's old value
++i; // by contrast, an l-value expression in C++ (though an r-value in C)
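The difference between the two increment operators can be made concrete. This sketch assumes C++11 or later (for decltype and for the sequencing rules that make ++j = 10 well defined); increment_demo is a hypothetical name used only here.

```cpp
#include <cassert>
#include <type_traits>

// In C++, prefix ++i yields i itself (an l-value); postfix i++ yields a
// copy of the old value (an r-value). decltype does not evaluate i.
int i = 0;
static_assert(std::is_lvalue_reference<decltype((++i))>::value, "++i is an l-value");
static_assert(!std::is_reference<decltype((i++))>::value, "i++ is an r-value");

int increment_demo() {
    int j = 0;
    int old = j++;               // j++ yields a copy of the old value
    assert(old == 0 && j == 1);
    ++j = 10;                    // legal in C++: ++j refers to j itself
    // j++ = 10;                 // error: j++ is an r-value
    return j;                    // 10
}
```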

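Arrays are a special case. An array name is an l-value, but a non-modifiable one: an array cannot be assigned to as a whole, although its individual elements can. This restriction does not apply to a pointer used with array syntax, since the pointer itself is an ordinary modifiable l-value. Thus: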

int arr[32] = {0};
int arr2[32] = {1};
arr[0] = 5; // arr[0] is a modifiable l-value
// the following is illegal
// arr = arr2; // arr is a non-modifiable l-value: arrays cannot be assigned
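Since whole-array assignment is not allowed, copying has to be done element by element. The sketch below uses std::copy for that (one option among several, assuming C++11 for std::begin and std::end) and also shows that a pointer used with array syntax can be re-pointed. The function name array_demo is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

int array_demo() {
    int arr[32] = {0};
    int arr2[32] = {1};          // arr2[0] == 1, the remaining elements are 0
    arr[0] = 5;                  // an element is a modifiable l-value
    // arr = arr2;               // error: arrays are not assignable

    // Copy all 32 elements of arr2 into arr instead.
    std::copy(std::begin(arr2), std::end(arr2), std::begin(arr));
    assert(arr[0] == 1);

    int* p = arr;                // the array decays to a pointer to its first element
    p = arr2;                    // the pointer itself is a modifiable l-value
    p[1] = 7;                    // array syntax through the pointer writes arr2[1]
    return arr2[1];              // 7
}
```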



Finally, let it be said that all expressions in C++ are either l-value or r-value expressions.

Temporaries


Related to the concept of r-values is the concept of temporaries. In fact, temporaries are r-values, although not every r-value is a temporary. Consider the following example:

int m = 4;
int n = 5 + 8/m;

Here the expression 5 + 8/m produces an unnamed intermediate value (as does the sub-expression 8/m), which we can loosely think of as a temporary. This is a relatively simple temporary, possibly one that would only exist in the registers of the CPU. However, it is possible, and quite common, to have temporaries on the stack, for example objects of class type. The important thing to understand is that temporaries are unnamed values, which are created in the context of an expression and whose lifetime normally ends at the end of the full-expression that created them (roughly, the enclosing statement). Consider the following statement:

string str = string("Hola amigos!");

Conceptually, the right hand side expression creates a temporary string object, which is then copied into a local variable called str. (Compilers are allowed to elide this copy and construct str directly, and since C++17 they are required to.) Once control reaches the semicolon that ends this statement, any such temporary is gone, and only str remains.
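The end-of-statement lifetime of a temporary can be observed with a class that logs its own construction and destruction. Tracer, events and temporary_demo are hypothetical names for this sketch, which assumes C++11 or later.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of lifetime events, filled in by the hypothetical Tracer class.
std::vector<std::string> events;

struct Tracer {
    Tracer()  { events.push_back("construct"); }
    ~Tracer() { events.push_back("destroy"); }
    int value() const { return 42; }
};

int temporary_demo() {
    events.clear();
    int n = Tracer().value();           // temporary created, used, destroyed at the ';'
    assert(events.back() == "destroy"); // by the next statement it is already gone
    return n;                           // only the copied int survives
}
```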

There is one exception to this rule and it deals with references. In the last expression, if instead of a string variable on the left, we had a string reference, things would be a little different:

const string& str = string("Hola amigos!");

First of all, notice that we had to add const to the reference: a non-const (l-value) reference can never bind to a temporary. A const reference, on the other hand, can, as you can see below:

const int& r = 5;
const double& s = 2.0;
// int& t = 5; // error: a non-const reference cannot bind to a temporary

Since all temporaries are r-values, a non-const reference cannot refer to them. The exception I mentioned concerns the lifetime of a temporary that a const reference is bound to: the temporary persists for as long as the reference remains in scope, and not just until the end of the statement that created the reference.
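The lifetime extension can be observed with the same kind of logging class. Tracer, events and lifetime_extension_demo are hypothetical names; the sketch assumes C++11 or later, where binding a const reference to a temporary extends that temporary's lifetime without making a copy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Log of lifetime events, filled in by the hypothetical Tracer class.
std::vector<std::string> events;

struct Tracer {
    Tracer()  { events.push_back("construct"); }
    ~Tracer() { events.push_back("destroy"); }
};

void lifetime_extension_demo() {
    events.clear();
    {
        const Tracer& ref = Tracer(); // temporary bound to a const reference
        (void)ref;                    // silence unused-variable warnings
        assert(events.size() == 1);   // only "construct": the temporary is still alive
        events.push_back("end of scope");
    } // ref goes out of scope here, and only now is the temporary destroyed
}
```

After the call, the log reads construct, end of scope, destroy: the destructor runs when the reference leaves its scope, not at the end of the statement that created the temporary.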

Temporaries are often created to hold function return values, although optimizing compilers can frequently avoid creating them, for example by constructing the returned object directly in its final destination (the return value optimization). In general, reducing the number of temporaries a program creates is a good strategy for optimization, and to some extent the compiler already does this for you.

As a final example of how temporaries are generated, and where we can run into trouble with them if we are not careful, here is a piece of code I have seen written in several places (including products I have worked on):
#include <iostream>
#include <sstream>
#include <string>

using std::string;
using std::stringstream;
using std::cout;
using std::endl;

...

int x = 0;
double f = 1.6;
stringstream sout;
sout << "Some data values streamed: " << x << "|" << f;
const char *str = sout.str().c_str(); // BUG: points into a temporary destroyed at the ';'
cout << str << endl; // undefined behavior: may print garbage or crash

Can you spot the trouble with the above code? The member function std::string str() const of the std::stringstream class returns a string by value, that is, a temporary. In the expression sout.str().c_str(), c_str() returns a pointer to the internal character buffer of that temporary string, and we copy this pointer into the variable called str. As soon as the statement finishes executing, the temporary created by the call to sout.str() is destroyed, and str is left as a dangling pointer into its freed buffer. Using it is undefined behavior: the last line may print garbage, or even crash the program.
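A minimal sketch of two ways to avoid the dangling pointer, assuming the same stream contents as above: keep the string returned by str() in a named variable, or bind it to a const reference so that its lifetime is extended. In both cases c_str() is called on an object that is still alive. The function name format_values is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>

std::string format_values(int x, double f) {
    std::stringstream sout;
    sout << "Some data values streamed: " << x << "|" << f;

    // Fix 1: a named string outlives the statement that created it.
    const std::string text = sout.str();
    const char* str = text.c_str();      // valid while text is alive and unmodified

    // Fix 2: a const reference extends the temporary's lifetime to its own scope.
    const std::string& ref = sout.str();
    assert(ref == str);                  // both hold the same characters

    return std::string(str);             // copy out before text goes away
}
```

With x = 0 and f = 1.6, the function returns "Some data values streamed: 0|1.6". The first fix is usually the clearer of the two, since the lifetime of a named variable is obvious to the reader.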

In the next edition of the Jargon column, we'll look at namespace lookups and the Interface Principle. Keep watching for more jargon demystified.
