A Practical Introduction to C++11 Rvalue References

rvalue references are one of the most powerful new features in C++11. But they're also, in my experience from teaching classes at work and reviewing code, one of the most difficult to grok.

I've come to think this isn't so much because rvalue references are particularly complex (by C++'s standards, anyway). Rather, much of the difficulty seems to stem from the fact that much of their workings have no syntax attached. If you don't know the implicit rules, rvalue references just look like magic.

In this post, I'm going to try to motivate and explain rvalue references using practical examples. My hope is that this will demystify both how they work, and why they were designed this way.

Note to language lawyers: I'm intentionally playing fast and loose with some parts of the standard here. "Practical" is the key word in the title. :)

A Troubling Inefficiency

Imagine you're writing a simple stack class, based on a singly-linked list.

template
class Stack {
 private:
  struct Node {
    T elem;
    Node* next;
  };

  Node* first;

 public:
  void push_front(const T& t) {
    Node* n = new Node();
    n->elem = t;
    n->next = first;
    first = n;
  }

  // pop_front, constructors omitted.
};

Now suppose you use your class as follows.

Stack s;
s.push_front(get_string());

Let's break down what this does:

get_string() creates and returns a string. This involves malloc'ing a buffer on the heap and writing some data into it.
push_front() copies the given string into a newly-allocated Node (n->elem = t;). This involves malloc'ing a buffer for the new string's data and memcpy'ing the original string's data into the new buffer.
push_front() returns, and we destroy the original string, free'ing its buffer.

This is frustratingly inefficient. You and I know that the string returned by get_string() is going to go away right after the call to push_front. Why do we need to malloc a new buffer and free the old buffer, when we could simply steal the old buffer for our own purposes? This would let push_front avoid calling malloc, free, and memcpy entirely, for a big potential speedup.

Let's first look at how we'd solve this inefficiency without C++11.

Stack Hack

We want to let users of our class get this faster behavior, so before C++11, we might write a new member function.

void push_front_destructive(T& t) {
  Node* n = new Node();
  using std::swap;
  swap(n->elem, t);
  n->next = first;
  first = n;
}

(This business with using std::swap is an ADL trick; it lets us call a specialized version of swap if one exists for the type T, otherwise it calls std::swap.)

This is great, it runs in O(1) time, and steals the pointer just like we want.

Now we try to use our stack like we did originally.

Stack s;
s.push_front_destructive(get_string());  // ERROR

But this is a compile error! The problem is that C++ does not let you bind a temporary value (get_string()) to a non-const reference (t). One of the creators of C++ explains that this decision was made to prevent certain kinds of bugs. We might disagree, but whatever, it's the rule, and it gets in our way here.

We can still make this work, but it's a bit ugly:

Stack s;
string str = get_string();
s.push_front_destructive(str);

This is OK because now we're asking the the non-const reference arg in push_front_destructive to bind to the non-temporary str.

(You may have noticed that I haven't defined what a "temporary" actually is. The exact definition is...complicated. For our purposes, it's a sufficient model to say that something is a temporary if you can only use it once. So for example, get_string() is a temporary, because you can do get_string().c_str() or get_string().length() or whatever, but after you do that one thing, you can't go back to the original object and do a second thing. In contrast, if I have a string x, that's not a temporary because I can touch x multiple times.)

This is sort of the best we can do without C++11. It's not awful, but there are a few things that could be improved. For one thing, having to assign get_string into a named variable is ugly. In addition, we have to remember to call push_front_destructive rather than the regular push_front.

But from the perspective of C++'s design philosophy, the worst part may be that we still do unnecessary work. It's not as much as before, but we have to default-construct a T before swapping into it, and we have to do both sides of our swap, even though we only care about the final value of n->elem. Both of these steps can be expensive on arbitrary objects.

The Reference Formerly Known as &&

So here's where C++11 comes in to solve our problems. C++11 introduces a new reference type, called the rvalue reference. It is exactly like a non-const reference, except it's only allowed to refer to temporaries. (Recall that a regular non-const reference can refer only to non-temporaries.)

Again, to emphasize: An rvalue reference is basically the same as a regular (non-const) reference. The only difference is in the referent: A regular non-const reference always points to a non-temporary, while an rvalue reference always points to a temporary.

Using just rvalue references and nothing else new, we can add an overload of push_front that accepts only temporary objects. This overload can destructively modify its inputs at will, because (the assumption is) they're going away anyways.

void push_front(T&& t) {
  Node n = new Node();
  using std::swap;
  swap(n->elem, t);
  n->next = first;
  first = n;
}

(An rvalue reference is spelled &&. The double-ampersand is a single unit; don't try to read it as "reference-to-a-reference".)

The body of our new function is exactly the same as push_front_destructive, because rvalue references behave exactly like regular references.

With this new overload, we can now write code that uses our stack like the following.

Stack s;
s.push_front(get_string());  // Swaps
string str = ...;
s.push_front(str);           // Copies

We now have two overloads of push_front: A version which takes a const reference (which can bind to a temporary or a non-temporary) and doesn't modify its input, and a version which takes an rvalue reference (which can only bind to a temporary) and swaps its input. The first call to push_front passes a temporary, so it can bind to either overload. The compiler picks the rvalue ref one, because it's more specific. The second call to push_front passes a non-temporary, so it can only bind to our original function, which copies.

std::move

Consider again the most recent use of our stack class above. Suppose we wanted to change it so we invoke the new rvalue ref overload for the second call to push_front, because we don't care about the value of str after push_front completes. C++11 adds a mechanism for us to do this: std::move. It looks like this.

string str = ...;
s.push_front(std::move(str));  // Swaps

All std::move does is imbue its argument with temporaryness. I know, it's confusing, because "move" is a verb; it sounds like it should do something. But in fact it's just a cast, just like const_cast or reinterpret_cast. Like those casts, std::move just lets us use an object in a different way; on its own, it doesn't do anything.

Now that we have rvalue references and std::move, we've solved some of the problems with our original push_front_destructive. We no longer have to name our temporaries and remember to call push_front_destructive in order to get the faster behavior. This is great! But our new overload of push_front still isn't as efficient as it could be. We're still default-constructing a T, and we're still swap'ing, even though we only care about one side.

The solution? We'll apply this same technique of overloading functions that take const references, but this time to constructors.

A Movable Constructor

Consider std::string. It's always had a copy constructor, which takes its argument by const reference. Suppose we added a new constructor, which takes its argument by rvalue reference. It might look like this.

class string {
 public:
  string(const string& other);  // Copy constructor, exists pre C++11

  string(string&& other) {      // Move constructor, new in C++11
    length = other.length;
    capacity = other.capacity;
    data = other.data;
    other.data = nullptr;
  }

 private:
  size_t length;
  size_t capacity;
  const char* data;
};

This new constructor is called a move constructor. Our implementation takes a temporary string, steals its pointers, and copies its non-pointer members. Note that we have to set other.data = nullptr, otherwise when other is destroyed, it will free its data pointer, and our data pointer will then be stale. (We might also have to modify string's destructor to handle a null data pointer.)

Now when we create a string, we can play the same overloading game we did with push_front. If we pass a non-temporary, we call string's copy constructor (which malloc's a new buffer and so on), but if we pass a temporary, we call the move constructor.

string a(get_string());  // move constructor
string b(a);             // copy constructor
string c(std::move(b));  // move constructor

Note that if string didn't have a move constructor, the line string c(std::move(b)) would just call the copy constructor, because its const string& arg happily binds to temporaries. But everything in the C++ standard library has been updated to have move constructors where appropriate.

The addition of move constructors is a particularly cool change to the standard library. The line string a(get_string()); could have been written pre-C++11, and it would have resulted in an O(n) copy. But now, just by compiling our code with -std=c++11, it's an O(1) move instead! Getting asymptotic speedups in your code is is impressive for a backwards-compatible language change.

In what you're probably noticing is a pattern, this same trick can be applied to the assignment operator, operator=. Here's what that looks like:

class string {
 public:
  string& operator=(const string& other); // Copy assn operator, pre C++11

  string& operator=(string&& other) {     // Move assn operator, new in C++11
    length = other.length;
    capacity = other.capacity;
    delete data;  // OK even if data is null
    data = other.data;
    other.data = nullptr;
    return *this;
  }
};

string a, b;
a = get_string();  // Move assignment
a = b;             // Copy assignment
a = std::move(b);  // Move assignment

Just like the pre-C++11 compiler would try to auto-generate a copy constructor and copy assignment operator for you, the C++11 compiler will try to auto-generate a move constructor and move assignment operator for you. The rules for when you get one of these are complicated, but the main one is that if you define a custom copy constructor, the compiler won't auto-generate a move constructor or move assignment operator for you. You can use =default if you want to override this behavior, and you can use =delete if you want to explicitly prevent the compiler from auto-generating one of these functions.

In most general-purpose code, you're likely going to interact with rvalue references pretty much exclusively via move constructors and move operators. Non-constructor functions like push_front that take rvalue references are relatively rare, and mainly used in collections, wrapper classes, and for perfect forwarding in functions that wrap constructors, such as std::make_shared.

After all that explanation, I can finally show you how using move constructors solves the remaining problems in our push_front implementation.

Back to push_front

So let's come back to our push_front implementation. To remind you, we currently have

void push_front(T&& t) {
  Node* n = new Node();
  using std::swap;
  swap(n->elem, t);
  n->next = first;
  first = n;
}

Our goals are

replace the swap with a move (move is one sided, so is more efficient in general), and
avoid default-constructing a T inside the Node constructor (since, why bother).

(1) is easy, we can use the move-assignment operator, as follows.

void push_front(T&& t) {
  Node* n = new Node();
  n->elem = std::move(t);
  n->next = first;
  first = n;
}

The thing to note here is that we do need the std::move, if we want to call the move assignment operator, even though t is an rvalue reference. Recall that rvalue references are just like regular references; the only thing that's different is that we know the referent is a temporary. In particular, t itself is not a temporary!

A surprising consequence of this can be seen in the following code.

void foo(const string&) {}
void foo(string&& s) { foo(s); }

foo("bar");

This is not infinite recursion, because s is not a temporary! An rvalue reference merely points to a temporary; it is not a temporary itself. This is weird, I know, but it would be weirder if it were the other way around. (Exactly why is left as an exercise to the reader.)

Okay, we have (1). What about (2)? To avoid default-initializing Node::elem, we need to define and call a constructor on Node that moves t. It might look like this.

struct Node {
  Node(T t, Node* next) : elem(std::move(t)), next(next) {}
  T elem;
  Node* next;
};

void push_front(T&& t) {
  first = new Node(std::move(t), first);
}

This still isn't perfect, as we actually move-construct a T twice: Once inside push_front when we call the Node constructor, and again within the Node constructor. In general this isn't a big deal, and the compiler can likely optimize it away. But if you really cared, you could modify Node's constructor to take a T&&. You would still need the std::move!

(You couldn't avoid writing this constructor on Node by using aggregate initialization, as that always copies its elements. I have no idea why that choice was made.)

And that's it, we're done! Easy, right? :)

std::unique_ptr

Well, we're not quite done. No discussion of rvalue references would be complete without talking about std::unique_ptr. unique_ptr is a move-only class: You can't copy one (because then the pointer it wraps would not be "unique"). We say unique_ptr "owns" a pointer, because when the unique_ptr goes away, it deletes its pointer.

An idiom you'll probably encounter with unique_ptr (and other classes, even ones that are not move-only) is passing unique_ptr by value to indicate a transfer of ownership.

void foo(unique_ptr bar);

unique_ptr my_bar;
foo(std::move(my_bar));

Here we give foo our copy of my_bar. foo now has the only owning copy, and can do whatever it wants with it (e.g. store it in a global variable somewhere).

Your first question here might be, what's the value of my_bar after we call foo? The answer depends on what unique_ptr's move constructor does! Hypothetically, it could do basically anything. It could leave my_bar in a state such that you get undefined behavior if you call any functions on it. The only requirement is that we can safely run my_bar's destructor.

Some classes are more constrained in their behavior after being moved from. According to the spec, a unique_ptr is null after it's moved from. std::string is in a "valid, but unspecified state", meaning, you can safely call functions on the string, but who knows what you'll get.

Another common question I get is, if std::move isn't where we move my_bar, where the heck does the move happen? My model is as follows. Before we start running foo, we execute a function prelude, which sets up foo's arguments. In particular, for each parameter P in foo's declaration, we construct a new P inside the scope of our call to foo, using the argument passed to the call.

So in the case above, where foo takes a unique_ptr bar parameter by value and we do foo(std::move(my_bar)), it's as though foo executes unique_ptr bar(std::move(my_bar)) in its prelude. This is where the move occurs. (Note that if foo took its parameter by, say, rvalue reference, then the same call to foo would result in us doing unique_ptr&& bar(std::move(my_bar)) in the prelude, which is just binding a reference; it's not a copy or a move.)

Something similar happens when you return a value: In the function epilogue, we construct a new value in the calling scope using the value you're returning. There's a wrinkle with C++11, however, which is that when you return foo, if foo is a local variable or function argument, it is implicitly a temporary. You don't need to return std::move(foo) to get efficient move construction of the return value.

In fact, you don't want to return std::move(foo), because (hilariously) that prevents RVO. If your type is efficiently movable, this probably isn't a big deal. But if your type is not movable, then you end up with a full-blown copy! (RVO would have elided this copy, making it free.) clang 3.7's -Wpessimizing-move will warn you about this footgun. You do still need std::move if you're doing something like return std::move(foo.bar) – only arguments and local variables, not their members, are implicitly treated as temporaries in return statements.

unique_ptr idioms

unique_ptr is awesome because it lets you avoid a lot of bugs, and it lets you make any class efficiently movable. And in fact since most of your code isn't performance-sensitive, but presumably all of it is sensitive to memory leaks and use-after-free bugs, I'd say that unique_ptr is actually more important than all this avoiding-copies stuff we've been talking about. But unique_ptr is so convenient, I sometimes see people using it everywhere, even when a plain value type would do. I call this the "I can write Java in any programming language" antipattern.

For example, instead of writing

void foo(unique_ptr>> v);

why not just write

void foo(vector> v);

? vector is already efficiently movable, and in fact vector> is not copyable (because a vector is not copyable if its elements are not copyable). So the unique_ptr buys you nothing other than some extra typing and maybe a cache miss.

Similarly, instead of writing

void foo(const unique_ptr& bar);

why not write

void foo(const Bar* bar);

The former is annoying because someone can only call your function if the data is actually stored in a unique_ptr somewhere. But maybe it's stored on the stack, or maybe it's in a raw pointer for some reason. The former form is kind of mean to your callers.

One last antipattern: Instead of writing

void foo(unique_ptr&& ptr);

consider simply

void foo(unique_ptr ptr);

Because unique_ptr is move-only, these are almost the same from the perspective of a caller. (Why?) The main difference is that, if you std::move the argument to the second foo, you know that the value afterwards is null. Whereas with the first one, the value after foo completes could be anything. I prefer the second form because its behavior is more predictable.

You can extend this argument to types which are both movable and copyable; e.g. if you're writing a function which needs to take ownership of a string, prefer foo(string s) to foo(string&& s). Both let you avoid a copy if you're passing a temporary, but the former is a more flexible API, as it will happily make a copy of a non-temporary. Of course, if you really want to prevent copying, maybe taking the arg by rvalue ref is for you.

Parting words

If you remember nothing else, remember the following two rules. They are enough to recover most of the content of this post.

An rvalue reference is a reference
to a temporary.
std::move is a cast
that lets you treat its argument as a temporary.

The linebreaks here are significant: You can stop reading each of the rules after the first line. Rvalue references are just like normal non-const references (except they always point to temporaries, which regular non-const references never do). std::move is a cast, so does nothing by itself; it just lets you treat something as a temporary.

Once you know about move constructors, it can be easy to tie yourself into knots while trying to avoid all unnecessary copies. Some people even try to avoid unnecessary moves. I'd say, hakuna matata, try not to worry about it. Premature optimization is the root of all evil; just write clean code and profile it if you care. It's been my experience that the rvalue rules often make the code you'd naturally write pretty fast, anyway.

If you want a more rigorous take on this material, I recommend Thomas Beckner's article. In particular, he covers perfect forwarding and reference collapsing, which are useful for library classes. Beckner also has a good article on auto and decltype, if you're interested.

http://cppreference.com is also your friend. It's dense, but it has all the rules, and I've found learning to read that stuff to be well worth the effort.

Good luck, and happy coding!

Thanks to Andrew Hunter and Kyle Huey for many improvements to this post.

jlebar