rvalue references are one of the most powerful new features in C++11. But they're also, in my experience from teaching classes at work and reviewing code, one of the most difficult to grok.
I've come to think this isn't so much because rvalue references are particularly complex (by C++'s standards, anyway). Rather, much of the difficulty seems to stem from the fact that much of their workings have no syntax attached. If you don't know the implicit rules, rvalue references just look like magic.
In this post, I'm going to try to motivate and explain rvalue references using practical examples. My hope is that this will demystify both how they work, and why they were designed this way.
Note to language lawyers: I'm intentionally playing fast and loose with some parts of the standard here. "Practical" is the key word in the title. :)
A Troubling Inefficiency
Imagine you're writing a simple stack class, based on a singly-linked list.
template
class Stack {
private:
struct Node {
T elem;
Node* next;
};
Node* first;
public:
void push_front(const T& t) {
Node* n = new Node();
n->elem = t;
n->next = first;
first = n;
}
// pop_front, constructors omitted.
};
Now suppose you use your class as follows.
Stack s;
s.push_front(get_string());
Let's break down what this does:
get_string()
creates and returns a string. This involvesmalloc
'ing a buffer on the heap and writing some data into it.push_front()
copies the given string into a newly-allocatedNode
(n->elem = t;
). This involvesmalloc
'ing a buffer for the new string's data and memcpy'ing the original string's data into the new buffer.push_front()
returns, and we destroy the original string,free
'ing its buffer.
This is frustratingly inefficient. You and I know that the string returned by
get_string()
is going to go away right after the call to push_front
. Why
do we need to malloc
a new buffer and free
the old buffer, when we could
simply steal the old buffer for our own purposes? This would let push_front
avoid calling malloc
, free
, and memcpy
entirely, for a big potential
speedup.
Let's first look at how we'd solve this inefficiency without C++11.
Stack Hack
We want to let users of our class get this faster behavior, so before C++11, we might write a new member function.
void push_front_destructive(T& t) {
Node* n = new Node();
using std::swap;
swap(n->elem, t);
n->next = first;
first = n;
}
(This business with using std::swap
is an ADL trick; it lets us call a
specialized version of swap
if one exists for the type T
, otherwise it
calls std::swap
.)
This is great, it runs in O(1) time, and steals the pointer just like we want.
Now we try to use our stack like we did originally.
Stack s;
s.push_front_destructive(get_string()); // ERROR
But this is a compile error! The problem is that C++ does not let you bind a
temporary value (get_string()
) to a non-const reference (t
). One of the
creators of C++ explains that this decision was made to prevent
certain kinds of bugs. We might disagree, but whatever, it's the rule, and it
gets in our way here.
We can still make this work, but it's a bit ugly:
Stack s;
string str = get_string();
s.push_front_destructive(str);
This is OK because now we're asking the the non-const reference arg in
push_front_destructive
to bind to the non-temporary str
.
(You may have noticed that I haven't defined what a "temporary" actually is.
The exact definition is...complicated. For our purposes, it's a
sufficient model to say that something is a temporary if you can only use it
once. So for example, get_string()
is a temporary, because you can do
get_string().c_str()
or get_string().length()
or whatever, but after you do
that one thing, you can't go back to the original object and do a second thing.
In contrast, if I have a string x
, that's not a temporary because I can
touch x
multiple times.)
This is sort of the best we can do without C++11. It's not awful, but there
are a few things that could be improved. For one thing, having to assign
get_string
into a named variable is ugly. In addition, we have to remember
to call push_front_destructive
rather than the regular push_front
.
But from the perspective of C++'s design philosophy, the worst part may be that
we still do unnecessary work. It's not as much as before, but we have to
default-construct a T
before swapping into it, and we have to do both sides
of our swap, even though we only care about the final value of n->elem
. Both
of these steps can be expensive on arbitrary objects.
The Reference Formerly Known as &&
So here's where C++11 comes in to solve our problems. C++11 introduces a new reference type, called the rvalue reference. It is exactly like a non-const reference, except it's only allowed to refer to temporaries. (Recall that a regular non-const reference can refer only to non-temporaries.)
Again, to emphasize: An rvalue reference is basically the same as a regular (non-const) reference. The only difference is in the referent: A regular non-const reference always points to a non-temporary, while an rvalue reference always points to a temporary.
Using just rvalue references and nothing else new, we can add an overload of
push_front
that accepts only temporary objects. This overload can
destructively modify its inputs at will, because (the assumption is) they're
going away anyways.
void push_front(T&& t) {
Node n = new Node();
using std::swap;
swap(n->elem, t);
n->next = first;
first = n;
}
(An rvalue reference is spelled &&
. The double-ampersand is a single unit;
don't try to read it as "reference-to-a-reference".)
The body of our new function is exactly the same as push_front_destructive
,
because rvalue references behave exactly like regular references.
With this new overload, we can now write code that uses our stack like the following.
Stack s;
s.push_front(get_string()); // Swaps
string str = ...;
s.push_front(str); // Copies
We now have two overloads of push_front
: A version which takes a const
reference (which can bind to a temporary or a non-temporary) and doesn't modify
its input, and a version which takes an rvalue reference (which can only bind to
a temporary) and swap
s its input. The first call to push_front
passes a
temporary, so it can bind to either overload. The compiler picks the rvalue
ref one, because it's more specific. The second call to push_front
passes a
non-temporary, so it can only bind to our original function, which copies.
std::move
Consider again the most recent use of our stack class above. Suppose we wanted
to change it so we invoke the new rvalue ref overload for the second call to
push_front
, because we don't care about the value of str
after push_front
completes. C++11 adds a mechanism for us to do this: std::move
. It looks
like this.
string str = ...;
s.push_front(std::move(str)); // Swaps
All std::move
does is imbue its argument with temporaryness. I know,
it's confusing, because "move" is a verb; it sounds like it should do
something. But in fact it's just a cast, just like const_cast
or
reinterpret_cast
. Like those casts, std::move
just lets us use an object
in a different way; on its own, it doesn't do anything.
Now that we have rvalue references and std::move
, we've solved some of the
problems with our original push_front_destructive
. We no longer have to name
our temporaries and remember to call push_front_destructive
in order to get
the faster behavior. This is great! But our new overload of push_front
still
isn't as efficient as it could be. We're still default-constructing a T
, and
we're still swap
'ing, even though we only care about one side.
The solution? We'll apply this same technique of overloading functions that take const references, but this time to constructors.
A Movable Constructor
Consider std::string
. It's always had a copy constructor, which takes its
argument by const reference. Suppose we added a new constructor, which takes
its argument by rvalue reference. It might look like this.
class string {
public:
string(const string& other); // Copy constructor, exists pre C++11
string(string&& other) { // Move constructor, new in C++11
length = other.length;
capacity = other.capacity;
data = other.data;
other.data = nullptr;
}
private:
size_t length;
size_t capacity;
const char* data;
};
This new constructor is called a move constructor. Our implementation
takes a temporary string, steals its pointers, and copies its non-pointer
members. Note that we have to set other.data = nullptr
, otherwise when
other
is destroyed, it will free
its data
pointer, and our data
pointer
will then be stale. (We might also have to modify string
's destructor to
handle a null data
pointer.)
Now when we create a string, we can play the same overloading game we did with
push_front
. If we pass a non-temporary, we call string
's copy constructor
(which malloc
's a new buffer and so on), but if we pass a temporary, we call
the move constructor.
string a(get_string()); // move constructor
string b(a); // copy constructor
string c(std::move(b)); // move constructor
Note that if string
didn't have a move constructor, the line string
c(std::move(b))
would just call the copy constructor, because its const
string&
arg happily binds to temporaries. But everything in the C++ standard
library has been updated to have move constructors where appropriate.
The addition of move constructors is a particularly cool change to the standard
library. The line string a(get_string());
could have been written pre-C++11,
and it would have resulted in an O(n) copy. But now, just by compiling our
code with -std=c++11
, it's an O(1) move instead! Getting asymptotic speedups
in your code is is impressive for a backwards-compatible language change.
In what you're probably noticing is a pattern, this same trick can be applied
to the assignment operator, operator=
. Here's what that looks like:
class string {
public:
string& operator=(const string& other); // Copy assn operator, pre C++11
string& operator=(string&& other) { // Move assn operator, new in C++11
length = other.length;
capacity = other.capacity;
delete data; // OK even if data is null
data = other.data;
other.data = nullptr;
return *this;
}
};
string a, b;
a = get_string(); // Move assignment
a = b; // Copy assignment
a = std::move(b); // Move assignment
Just like the pre-C++11 compiler would try to auto-generate a copy constructor
and copy assignment operator for you, the C++11 compiler will try to
auto-generate a move constructor and move assignment operator for you. The
rules for when you get one of these are complicated, but the
main one is that if you define a custom copy constructor, the compiler won't
auto-generate a move constructor or move assignment operator for you. You can
use =default
if you want to override this behavior, and you can use =delete
if you want to explicitly prevent the compiler from auto-generating one of these
functions.
In most general-purpose code, you're likely going to interact with rvalue
references pretty much exclusively via move constructors and move operators.
Non-constructor functions like push_front
that take rvalue references are
relatively rare, and mainly used in collections, wrapper classes, and for
perfect forwarding in functions that wrap constructors, such as
std::make_shared
.
After all that explanation, I can finally show you how using move constructors
solves the remaining problems in our push_front
implementation.
Back to push_front
So let's come back to our push_front
implementation. To remind you, we
currently have
void push_front(T&& t) {
Node* n = new Node();
using std::swap;
swap(n->elem, t);
n->next = first;
first = n;
}
Our goals are
- replace the swap with a move (move is one sided, so is more efficient in general), and
- avoid default-constructing a
T
inside theNode
constructor (since, why bother).
(1) is easy, we can use the move-assignment operator, as follows.
void push_front(T&& t) {
Node* n = new Node();
n->elem = std::move(t);
n->next = first;
first = n;
}
The thing to note here is that we do need the std::move
, if we want to call
the move assignment operator, even though t
is an rvalue reference. Recall
that rvalue references are just like regular references; the only thing that's
different is that we know the referent is a temporary. In particular, t
itself is not a temporary!
A surprising consequence of this can be seen in the following code.
void foo(const string&) {}
void foo(string&& s) { foo(s); }
foo("bar");
This is not infinite recursion, because s
is not a temporary! An rvalue
reference merely points to a temporary; it is not a temporary itself. This is
weird, I know, but it would be weirder if it were the other way around.
(Exactly why is left as an exercise to the reader.)
Okay, we have (1). What about (2)? To avoid default-initializing Node::elem
,
we need to define and call a constructor on Node
that moves t
. It might
look like this.
struct Node {
Node(T t, Node* next) : elem(std::move(t)), next(next) {}
T elem;
Node* next;
};
void push_front(T&& t) {
first = new Node(std::move(t), first);
}
This still isn't perfect, as we actually move-construct a T
twice: Once inside
push_front
when we call the Node
constructor, and again within the Node
constructor. In general this isn't a big deal, and the compiler can likely
optimize it away. But if you really cared, you could modify Node's constructor
to take a T&&
. You would still need the std::move
!
(You couldn't avoid writing this constructor on Node
by using
aggregate initialization, as that always copies its elements. I have no
idea why that choice was made.)
And that's it, we're done! Easy, right? :)
std::unique_ptr
Well, we're not quite done. No discussion of rvalue references would be
complete without talking about std::unique_ptr
. unique_ptr
is a move-only
class: You can't copy one (because then the pointer it wraps would not be
"unique"). We say unique_ptr
"owns" a pointer, because when the unique_ptr
goes away, it delete
s its pointer.
An idiom you'll probably encounter with unique_ptr
(and other classes, even
ones that are not move-only) is passing unique_ptr
by value to indicate a
transfer of ownership.
void foo(unique_ptr bar);
unique_ptr my_bar;
foo(std::move(my_bar));
Here we give foo
our copy of my_bar
. foo
now has the only owning copy,
and can do whatever it wants with it (e.g. store it in a global variable
somewhere).
Your first question here might be, what's the value of my_bar
after we call
foo
? The answer depends on what unique_ptr
's move constructor does!
Hypothetically, it could do basically anything. It could leave my_bar
in a
state such that you get undefined behavior if you call any functions on it.
The only requirement is that we can safely run my_bar
's destructor.
Some classes are more constrained in their behavior after being moved from.
According to the spec, a unique_ptr
is null after it's moved from.
std::string
is in a "valid, but unspecified state", meaning, you can safely
call functions on the string, but who knows what you'll get.
Another common question I get is, if std::move
isn't where we move my_bar
,
where the heck does the move happen? My model is as follows. Before we start
running foo
, we execute a function prelude, which sets up foo
's arguments.
In particular, for each parameter P in foo
's declaration, we construct a new P
inside the scope of our call to foo
, using the argument passed to the call.
So in the case above, where foo
takes a unique_ptr
parameter by
value and we do foo(std::move(my_bar))
, it's as though foo
executes
unique_ptr
in its prelude. This is where the
move occurs. (Note that if foo
took its parameter by, say, rvalue reference,
then the same call to foo
would result in us doing unique_ptr
in the prelude, which is just binding a reference; it's
not a copy or a move.)
Something similar happens when you return a value: In the function epilogue, we
construct a new value in the calling scope using the value you're returning.
There's a wrinkle with C++11, however, which is that when you return foo
, if
foo
is a local variable or function argument, it is implicitly a temporary.
You don't need to return std::move(foo)
to get efficient move construction of
the return value.
In fact, you don't want to return std::move(foo)
, because (hilariously) that
prevents RVO. If your type is efficiently movable, this probably isn't a
big deal. But if your type is not movable, then you end up with a full-blown
copy! (RVO would have elided this copy, making it free.) clang 3.7's
-Wpessimizing-move
will warn you about this footgun. You do still need
std::move
if you're doing something like return std::move(foo.bar)
–
only arguments and local variables, not their members, are implicitly treated
as temporaries in return statements.
unique_ptr idioms
unique_ptr
is awesome because it lets you avoid a lot of bugs, and it lets you
make any class efficiently movable. And in fact since most of your code isn't
performance-sensitive, but presumably all of it is sensitive to memory leaks and
use-after-free bugs, I'd say that unique_ptr
is actually more important than
all this avoiding-copies stuff we've been talking about. But unique_ptr
is
so convenient, I sometimes see people using it everywhere, even when a plain
value type would do. I call this the "I can write Java in any programming
language" antipattern.
For example, instead of writing
void foo(unique_ptr>> v);
why not just write
void foo(vector> v);
? vector
is already efficiently movable, and in fact vector
is not copyable (because a vector is not copyable if its elements are not
copyable). So the unique_ptr
buys you nothing other than some extra typing
and maybe a cache miss.
Similarly, instead of writing
void foo(const unique_ptr& bar);
why not write
void foo(const Bar* bar);
The former is annoying because someone can only call your function if the data
is actually stored in a unique_ptr
somewhere. But maybe it's stored on the
stack, or maybe it's in a raw pointer for some reason. The former form is kind
of mean to your callers.
One last antipattern: Instead of writing
void foo(unique_ptr&& ptr);
consider simply
void foo(unique_ptr ptr);
Because unique_ptrstd::move
the argument to the second foo
, you know that the value
afterwards is null. Whereas with the first one, the value after foo
completes could be anything. I prefer the second form because its behavior is
more predictable.
You can extend this argument to types which are both movable and copyable; e.g.
if you're writing a function which needs to take ownership of a string, prefer
foo(string s)
to foo(string&& s)
. Both let you avoid a copy if you're
passing a temporary, but the former is a more flexible API, as it will happily
make a copy of a non-temporary. Of course, if you really want to prevent
copying, maybe taking the arg by rvalue ref is for you.
Parting words
If you remember nothing else, remember the following two rules. They are enough to recover most of the content of this post.
An rvalue reference is a reference
to a temporary.std::move
is a cast
that lets you treat its argument as a temporary.
The linebreaks here are significant: You can stop reading each of the rules
after the first line. Rvalue references are just like normal non-const
references (except they always point to temporaries, which regular non-const
references never do). std::move
is a cast, so does nothing by itself; it just
lets you treat something as a temporary.
Once you know about move constructors, it can be easy to tie yourself into knots while trying to avoid all unnecessary copies. Some people even try to avoid unnecessary moves. I'd say, hakuna matata, try not to worry about it. Premature optimization is the root of all evil; just write clean code and profile it if you care. It's been my experience that the rvalue rules often make the code you'd naturally write pretty fast, anyway.
If you want a more rigorous take on this material, I recommend Thomas Beckner's
article. In particular, he covers perfect forwarding and reference
collapsing, which are useful for library classes. Beckner also has a good
article on auto
and decltype
, if you're interested.
http://cppreference.com is also your friend. It's dense, but it has all the rules, and I've found learning to read that stuff to be well worth the effort.
Good luck, and happy coding!
Thanks to Andrew Hunter and Kyle Huey for many improvements to this post.