C++ Move Semantics Considered Harmful (Rust Is Better)

gumby · on Nov 22, 2022

The author's examples are contrived.

First, they do gratuitous allocation (e.g. `char* foo = strdup ("Hi");`) and then explain that they have to allocate again because of their use of *foo. There is no need for the first copy, and nobody would write the code that way.

Second they seem not to realize you can allocate straight into your destination memory (i.e. 'emplace'). So again, they are complaining about something nobody would ever do.

Then they make assertions about runtime behavior of C++ code that could easily be disproven by simply looking at what the compiler does! The C++ compiler can perform optimizations Rust cannot (perhaps can in the future) such as allocating directly into the caller's stack frame so that the "move" never happens at runtime at all!
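As a rough sketch of the "move never happens" case: the `Tracker` type and both function names below are hypothetical, invented only to count constructor calls. Under C++17 rules, returning a prvalue constructs the object directly in the caller's storage, so neither the copy nor the move constructor should run (older standards merely permit the same elision).

```cpp
#include <cassert>

// Hypothetical counters and type, used only to observe constructor calls.
static int g_copies = 0;
static int g_moves = 0;

struct Tracker {
    Tracker() = default;
    Tracker(const Tracker&) { ++g_copies; }
    Tracker(Tracker&&) noexcept { ++g_moves; }
};

// Returns a prvalue. Since C++17 the result is built in the caller's storage.
Tracker make_tracker() {
    return Tracker{};
}

// Counts the copy/move constructor calls made while initializing a local
// variable from make_tracker().
int constructor_calls_for_return() {
    g_copies = 0;
    g_moves = 0;
    Tracker t = make_tracker();
    (void)t;  // silence unused-variable warnings
    return g_copies + g_moves;
}
```

This only covers returning a fresh temporary; it says nothing about moves into containers or across library boundaries.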

I stopped reading at this point. It's a shame, as the author also refers to legitimate issues with C++'s memory model (e.g. a reference can point to invalid memory in certain multithreaded code) which are partially mitigated by Rust's memory model.

Rust has its strengths and weaknesses vs C++, but this article doesn't appear to really get to them.

ninkendo · on Nov 22, 2022

The gratuitous allocation is what would need to take place if C++ didn’t have move semantics. The whole point of that part of the article is to set the background for why C++11 move semantics were added, and what problem they solved (mainly that if C++ only allows copy and destruct, there’s no opportunity to avoid extra allocations/deletions when doing something like sorting a vector<string>.)

Also “emplace” was added to C++11 and was only possible due to the new move semantics (aka rvalue references.)

Anyhow, after explaining the background, the article goes on to critique some of the more annoying aspects of C++11’s move semantics, namely that the destructor is still called on the moved-from object (and thus you have to arrange for it to happen gracefully in your move-constructor’s logic), and that C++ lets you continue using a moved value and the compiler won’t/can’t complain to you, e.g.

    // given
    void foo(std::string &&s);

    auto s = std::string("Hello");
    foo(std::move(s));
    std::cout << s << std::endl; // U.B. But the compiler won’t complain!

I have to say, the “considered harmful” monicker may actually be warranted here. When I was reading about move semantics back when it was still called C++0x, I was actually shocked to learn that there’s nothing the compiler could do to stop you from just continuing to use a reference that was moved from. That could have been one of the best benefits of the new semantics, and they just swept it under the rug and called it “undefined behavior”.

tomjakubowski · on Nov 22, 2022

Using a moved-from std::string value, or one of any standard library type, isn't undefined behavior.

ninkendo · on Nov 22, 2022

(/me does some quick googling)

I guess it’s in an “unspecified” state after move, which I suppose is different from undefined. Still doesn’t take away from my main point which is that C++ has no problem allowing you to use a reference that has been moved from (which I’d wager is a mistake in very nearly 100% of cases.)

AnimalMuppet · on Nov 22, 2022

Isn't that the difference between move and copy? Copy means the original is still valid; move means it isn't.

ninkendo · on Nov 22, 2022

Well, that’s just the rub: It’s up to the implementor of a given type to decide what happens to an object that’s been moved-from. They could do something silly like make a `print(std::string&&)` function which doesn’t actually do anything to the passed-in string, just prints it, but requires callers to `std::move` the arguments, even if it’s not actually moving anything.

To C++, there’s no such thing as “moving” per se, there’s just “rvalue references”, and a standard library function template called “std::move” which is really just a static_cast to an rvalue reference. The compiler doesn’t mind if you continue to use something after casting it via std::move. It’s all rather disappointing, really (although I can understand why they did it… backwards compatibility is important in the C++ world.)
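A minimal sketch of the "it is only a cast" point. The names `inspect` and `cast_without_moving` are made up for illustration; `inspect` plays the role of the hypothetical `print(std::string&&)` above, binding to an rvalue without ever moving from it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// std::move on an lvalue std::string just yields std::string&&.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::string&>())),
                 std::string&&>::value,
    "std::move only changes the value category");

// Hypothetical function: requires an rvalue argument but only reads it.
std::size_t inspect(std::string&& s) {
    return s.size();
}

std::string cast_without_moving() {
    std::string s = "Hello";
    inspect(std::move(s));  // a cast plus a reference binding; no move constructor runs
    assert(s == "Hello");   // s is untouched because nothing moved from it
    return s;
}
```

Whether the source changes depends entirely on what the callee does with the reference, which is why the compiler cannot tell a "real" move from this one by looking at the call site.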

AnimalMuppet · on Nov 22, 2022

> the destructor is still called on the moved-from object

Wait, what? If I have a class with a pointer to some allocated memory, then the constructor should allocate the memory, the destructor should free it, copy should allocate an equal-sized amount of memory, and move should just give the destination the pointer (meaning that the destination now owns the allocated block, the original does not). Calling the destructor on the original means that the destination's block of memory is freed (unless you null the origin's pointer as part of the move).

I've never actually written a move that I recall, but this behavior is very surprising to me.

ninkendo · on Nov 22, 2022

Yup, that’s why the move constructor for (e.g.) std::string has to call something like `source.buffer = nullptr;`… to prevent the source’s destructor from freeing the storage that was moved. The article covers this point pretty well.
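Here is a sketch of that pattern with a hypothetical `Buffer` class. It is not the real `std::string` layout, just a single owning pointer, so the effect of nulling the source is easy to see.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical owning buffer, for illustration only.
class Buffer {
public:
    explicit Buffer(const char* text)
        : size_(std::strlen(text)), data_(new char[size_ + 1]) {
        std::memcpy(data_, text, size_ + 1);  // copy including the terminator
    }

    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;

    // Steal the pointer, then reset the source so its destructor frees nothing.
    Buffer(Buffer&& other) noexcept
        : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    // Still runs for moved-from objects; delete[] on nullptr is a no-op.
    ~Buffer() { delete[] data_; }

    const char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;  // declared first so it is initialized before data_
    char* data_;
};

bool source_is_emptied_by_move() {
    Buffer a("hello");
    Buffer b(std::move(a));
    return a.data() == nullptr && a.size() == 0 &&
           std::strcmp(b.data(), "hello") == 0;
}
```

Without the two reset lines in the move constructor, both destructors would call `delete[]` on the same pointer.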

pcwalton · on Nov 22, 2022

> It's a shame, as the author also refers to legitimate issues with C++'s memory model (e.g. a reference can point to invalid memory in certain multithreaded code) which are partially mitigated by Rust's memory model.

This issue is not partially mitigated by Rust's memory model; it is fully solved, because Rust is memory safe.

thecodedmessage · on Nov 23, 2022

Author here!

My "example" is showing an equivalent to what the C++ compiler actually outputs, inlining the `std::string` abstraction, given the input code, which in this context was C++03, and before emplace.

But emplace doesn't save you here! Emplace doesn't save on heap allocations over a `strdup`, if you're emplacing something from a string literal like this.

Doing:

    vec.emplace_back("here's a string");

Does indeed do an allocation for the string. It has to, because every `std::string` must manage a heap allocation (modulo the small string optimization which is a nitpick). The string literal is in the `text` section of the binary and can't be the backing for the `std::string` object, which must be in read-write memory and on the heap.

So the `strdup` is correct as an equivalent to what the C++ compiler would output with `std::string` inlined for the example code it covered.
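One way to check the allocation claim is to count calls to the global `operator new`. The counter and the function name `allocations_for_emplace` are invented for this sketch, and it assumes a typical standard library where `std::string` allocates through `operator new` and the literal is too long for the small string buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Illustration only: count every call to the global allocation function.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many allocations happen during the emplace_back call itself.
std::size_t allocations_for_emplace() {
    std::vector<std::string> vec;
    vec.reserve(1);  // pay for the vector's own storage up front
    const std::size_t before = g_allocations;
    // Assumed long enough that a small string buffer cannot hold it.
    vec.emplace_back("here's a string that is too long for the small string optimization");
    assert(vec.size() == 1);
    return g_allocations - before;
}
```

Emplacing avoids constructing a temporary `std::string` and moving it into the vector, but the character data still has to be copied out of the literal into heap storage.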

Also, the freeing of the empty object with move semantics is NOT always optimized out, not even close. It can't be, not when those calls can cross a library boundary or be arbitrarily wonky.

You write as if C++ has one optimizer. Rust shares an optimizer with one of the most popular C++ compilers. Implying Rust is somehow "behind" C++ on optimizations is just ill-informed.

eklitzke · on Nov 22, 2022

I strongly agree. This post is SO MANY WORDS basically to complain about the fact that the state of a moved-from object is partially unspecified (i.e. the object has to be in a valid state and destructible, but what that state is is not necessarily guaranteed by the language).

Obviously this behavior is a little weird when you first encounter it. A lot of C++ developers find it weird as well, and I'm sure a lot of C++ developers have made mistakes around this at one time or another. However, it's not nearly as bad as the article makes out.

For one thing, clangd/clang-tidy will emit warnings if you use a moved-from object in an unsafe way, e.g.:

  some_vector.emplace_back(std::move(foo));
  foo.whatever();  // <-- clang-tidy will warn about this

Using these kinds of static analysis tools is pretty much essential in any professional C++ role. Could you argue that it would be better if this was a language feature instead of something that you need a static analyzer for? Of course, but in practice everyone is using these tools anyway so it's not a big deal.

Additionally the C++ semantics are sometimes really useful. As a simplified example, I have some code at work that looks something like this (simplified here, but this shows the general idea):

  struct CounterData {
    // Threads call this method very frequently, say thousands of times per second
    void AddCounter(std::string_view key, int val) {
      absl::MutexLock lk(&mut);  // ALMOST NEVER CONTENDED
      counters[key] += val;
    }
  
    absl::flat_hash_map<std::string, int> counters;
    absl::Mutex mut;
  };
  
  // Thread-local instance of counter_data
  static thread_local CounterData counter_data;
  
  // Main thread calls this infrequently, say once a second
  absl::flat_hash_map<std::string, int> AggregateCounters() {
    std::vector<absl::flat_hash_map<std::string, int>> all_counters;
    // Assume it's possible to enumerate threads and iterate over thread local counters,
    // this is not a difficult abstraction to build.
    all_counters.reserve(num_threads);
    for (CounterData &data : all_thread_local_counters) {
      absl::MutexLock lk(&data.mut);
      all_counters.emplace_back(std::move(data.counters));
    }
    absl::flat_hash_map<std::string, int> aggregate;
    for (const auto &map : all_counters) {
      for (const auto &[k, v] : map) {
        aggregate[k] += v;
      }
    }
    return aggregate;
  }

Again, this is a simplified example. But the point of this code is that it minimizes the time spent in the critical section where each per-thread CounterData is locked by the main thread. Normally calls to AddCounter() will be uncontended and the lock acquisition will be nearly free. The only time the lock is contended is if an AddCounter() call happens at the same time as a call by the main thread to AggregateCounters(). However, AggregateCounters() holds these locks for really short periods of time, because it can acquire the lock, move the absl::flat_hash_map, and then release the lock. The move just updates a few pointers and afterwards the map is in a default initialized state, so it's perfectly OK for the threads to call AddCounter() immediately afterwards, even though the object is in a moved-from state. And AggregateCounters() does all the expensive work (iteration and summing) without needing to hold any locks.

You might object: how do you know the state of an absl::flat_hash_map after moving it? It's a reasonable thing to ask but in practice it's really easy to understand. Pretty much anyone writing a custom move constructor is going to do something reasonable, like move pointers to the new object and probably reset the original pointers to nullptr. Since the object needs to be in a valid state after move, this means that things like integer length/capacity fields and whatnot will also need to be reset to zero afterwards, since if they weren't the object wouldn't be in a usable state. And if you have any questions about this, you just look at the code. But really 99.99% of the time the author is going to write their move constructor in a sane way where you can guess the behavior from a few basic principles.
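For anyone who would rather not reason about the moved-from state at all, a variant of the same idea is sketched below. It swaps in `std::unordered_map`, `std::mutex`, and `std::exchange` in place of the Abseil types, and the `Counters`, `Add`, and `Drain` names are made up here. `std::exchange` move-assigns a freshly constructed empty map into the member, so the state left behind is explicit instead of inferred.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

using CounterMap = std::unordered_map<std::string, int>;

// Hypothetical stand-in for the CounterData struct above.
struct Counters {
    void Add(const std::string& key, int val) {
        std::lock_guard<std::mutex> lk(mut);
        counters[key] += val;
    }

    // Hands back the current contents and leaves an empty map in place.
    // The lock is held only for the pointer-sized bookkeeping of the exchange.
    CounterMap Drain() {
        std::lock_guard<std::mutex> lk(mut);
        return std::exchange(counters, CounterMap{});
    }

    CounterMap counters;
    std::mutex mut;
};
```

The critical section stays as short as in the original, and `Add` can be called right after `Drain` without depending on what a move leaves behind.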

The original post also specifically complains that std::string is weird because if the string is large enough that it allocates then the string will probably be in an empty state after the call to std::move, and if the string is small enough that SSO applies it will probably be in the same state afterwards, and the standard doesn't guarantee anything. Fair enough, but this just means you need to assume that you don't know the string state after a move, and you can't read from a moved-from string. Again, clang-tidy and clangd will warn you if you try to use a string this way.
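In code, "don't read a moved-from string" looks roughly like the sketch below. The function name `reuse_after_move` is hypothetical. The point is that operations without preconditions, such as `clear()` or assignment, are fine on a moved-from `std::string`, and after them the contents are known again.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

std::string reuse_after_move(std::vector<std::string>& sink) {
    std::string line = "a string long enough to defeat most small string optimizations";
    sink.push_back(std::move(line));

    // line is valid but unspecified here, so do not read its contents.
    line.clear();    // no preconditions; line is now known to be empty
    line += "next";  // safe to build on a known state
    return line;
}
```

Reading `line` between the `push_back` and the `clear()` would compile and run, but the result could differ between standard library implementations and string lengths.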

simplotek · on Nov 22, 2022

> Using these kinds of static analysis tools is pretty much essential in any professional C++ role. Could you argue that it would be better if this was a language feature instead of something that you need a static analyzer for?

Isn't Rust's shtick basically a language built around a static code analyzer? It would be weird if Rust fanatics complained about static code analysis in other languages.

thecodedmessage · on Nov 23, 2022

Rust fans might complain that the static code analysis is not built in, and therefore working a "C++ job" will expose you to code that has not been properly statically analyzed, or not with an analyzer up to your personal standards. Large projects can easily sink to the lowest common denominator of usage that the programming language allows.

But in general I think most Rust fans would agree that C++ with static analysis is better than C++ without static analysis.

C++ then still has the move semantics issue of being surprising and confusing, which I think is more than an "at first" issue -- and I think actually most programmers at most projects are relative novices (or outsiders) in the programming language they're working in. If you're rusty in C++, if you're junior, if you're just stepping in from another PL to look at some C++, this is an increase in cognitive load, even if the static analyzer will catch problems.

pornel · on Nov 22, 2022

This article explains very well what the issue is and how C++ ended up with the design it has (spoiler: incrementally, with backwards compatibility constraints).

Simplifying it to single ownership with moves that don't leave "unspecified state" copies behind makes so much sense. It's easier to reason about, it lets the compiler catch accidental use-after-move, and avoids redundant object state and dtor calls.

melagonster · on Nov 22, 2022

When I read my first C++ book, the author kept saying "in the old days we did it in a terrible way, but now we have new features that solve it!"

Then I found that everyone says the new feature is terrible, and that most famous C++ software was written in the old versions.

ahartmetz · on Nov 23, 2022

I (who writes mostly C++) noticed the pattern. I find that "this changes EVERYTHING" statements are usually overblown. Reasons are legacy code and sometimes the new thing being sufficiently verbose or just special case that I'd rather avoid using it. Range for (also "new") is std::copy_if's worst enemy...

kvark · on Nov 22, 2022

Every time, every time I wrestle with the C++ compiler over move constructors/semantics, I open this post[1] by Herb Sutter saying that C++ moves are simple and misunderstood. It's humiliating to read. Finally, somebody has dissected it section by section and explained in reasonable language that there is nothing simple about C++ moves.

TL;DR of the article: C++ moves are bad because they effectively force nullability on everything, and they provide no guarantees about the state of the objects.

[1] https://herbsutter.com/2020/02/17/move-simply/

thecodedmessage · on Nov 23, 2022

Thanks so much! That article was the original (anti-)inspiration for this post. It is so condescending: "Moves are so simple if you pretend you don't understand what they're for and why they're preferable to copies, and also ignore the name."

synergy20 · on Nov 23, 2022

"Rust is the best, Rust is better, every other language sucks, let's rewrite everything in Rust"

Come on, Rust has been here 16 years and its market share is as good as Ada's.

These evangelists, along with the fact that its stdlib is not dynamic-link friendly, are the major reason I was pushed away from learning Rust.

thecodedmessage · on Nov 23, 2022

Author here!

C++ has serious problems. Linus Torvalds agrees. Bryan Cantrill agrees. Many, many experienced systems programmers agree. Rust solves many of the problems with C++, as a systems programming language. That is the scope of my argument.

I am not really sure in the modern software development world what problem dynamic linking of stdlib solves, especially since so much of stdlib would be monomorphized and/or inlined. C++ with any level of templates is far more unfriendly to dynamic linking -- and again, that is the only language of comparison in scope for this article.

synergy20 · on Nov 23, 2022

Ada solved all those problems decades ago, it did not fly.

C++ has run on rockets to space; Rust has not yet. I don't even know of any major projects or products that have been done with Rust in the last 16 years.

"There are only two kinds of languages: the ones people complain about and the ones nobody uses".

thecodedmessage · on Nov 23, 2022

Ada and Rust aren’t really comparable, and more importantly Ada and C++ aren’t really comparable. The key feature of Rust isn’t that it’s safe — many programming languages are, after all. The key feature of Rust is that it manages to have an explicit safe subset while maintaining C++’s goals of sophisticated zero-cost abstraction.

estebank · on Nov 25, 2022

Interesting that you call it available for 16 years when 1.0 came out in 2015 and looked little like pre-1.0 Rust, and I would argue that no 1.0 language is usually successful until later releases. I would also say that Rust 1.0 is nowhere close to where it got to in 2018, let alone today. Python didn't become prevalent until 2.4 and Java didn't take off until at least 1.1.1.

Elsewhere you claim nothing of note is written in Rust, and I would be interested in knowing what you base this on. Either we have different definitions of what makes a project significant, or you just haven't seen them.

I do not know what level of market share you think would be needed for Rust to have been worth it, but given it was originally aimed at the ill-defined but somewhat niche "systems programming" space, and now it is being used up and down different layers of the tech stack I would call it a success of adoption.

cesarb · on Nov 23, 2022

> As a result, Rust has a mechanism called “pinning” that indicates, in the type system, that a particular value will never move again, which can be used to implement self-referential values and which is used in async. The details are beyond the scope of this blog post, [...]

What this otherwise excellent blog post neglects to mention is that "pinning" didn't exist in Rust as of its 1.0 release; it was only added much later, and IMO it's still very complex and hard to understand (which also explains why the author of this blog post glossed over it). It's a workaround for the fact that Rust moves are "just memcpy()" and therefore do not call any custom per-type code (which could be used to fixup any references after the move).

thecodedmessage · on Nov 23, 2022

Thanks for the compliment!

I wanted to go more into pinning, but this post was already extremely long (and many people have already complained about how long it is). So everything you said is very on-point, just covered by "the details are beyond the scope of this blog post." Would you mind putting this extra information as a comment on the article itself, so other readers in the future can have access?

cesarb · on Nov 23, 2022

Unfortunately, by "very complex and hard to understand" I also implicitly meant "[...] and I don't understand it myself well enough to explain it to someone else", sorry. I don't know whether it's because I haven't found a good enough explanation of that magic yet, or whether it's because it's something on the level of Haskell monads.

jokoon · on Nov 22, 2022

Please watch the presentation of cpp2 by Herb Sutter.

It's interesting to see how difficult C++ is to teach.

I really really want cpp2 to succeed.

boppo1 · on Nov 22, 2022

I'm a "new"[0] programmer learning C++ from Stroustrup's Principles, mainly because I'm interested in graphics. I see stuff on HN and other tech boards about how terrible it is to work with C++ and how memory management will clothesline all novice and intermediate programmers. However, the Stroustrup book makes it seem like it's pretty manageable. The issues I can imagine are from other people writing poorly documented code and APIs, which, in my limited experience, is a ubiquitous problem. Can you elaborate on what makes C++ so difficult to teach/learn/use?

[0] I've spent years messing around with python, JS, swift, and various frameworks. However that's been at a pretty naive level, understanding only variables, functions, and some pandas basics, but never any real OOP implementation or DS&A stuff.

simplotek · on Nov 22, 2022

> I see stuff on HN and other tech boards about how terrible it is to work with C++ and how memory management will clothesline all novice and intermediate programmers. However, the Stroustrup book makes it seem like it's pretty manageable.

Plenty of people around here desperately want to vindicate their personal bets regarding other languages and frameworks, and consequently you see a multitude of poorly thought-through strawmen put up to try to denigrate the other choice and, by process of elimination, leave their personal choice as the one true choice.

It's tiring. This obsession some Rust fanatics have regarding C++ drags on for over a decade, and it never changes. If they think it's so great, why waste so much time trying to drag down pseudo-rivals?

jokoon · on Nov 23, 2022

Because the language is very very large. It inherits from C, which means it sometimes shares a few problems.

The video I'm talking about shows that it's difficult to write "good modern C++" to avoid pitfalls.

Of course I think it's much better than JS or Swift. The immense strength of C++ is that it is an industrial language, it's a standard in the industry, and it has an ISO standard. You can do everything with it, and there is a lot of existing code you can use, especially since you can also use existing C code and libraries.

zozbot234 · on Nov 23, 2022

> However, the Stroustrup book makes it seem like it's pretty manageable.

It's manageable for trivial examples such as you'd find in a learn-to-code book, but memory management is a huge issue for larger programs. The C++ Core Guidelines are mostly intended as a way of managing these issues in a somewhat general way (while still not going full Rust) and they're extremely complex.

technoooooost · on Nov 22, 2022

C++ gets much hate on HN for whatever reason. Modern C++ is easy to learn, intuitive and a joy to work with.

thecodedmessage · on Nov 23, 2022

I'm happy you're happy, but I don't think that's most people's experience. Perhaps compared to pre-modern C++.

mkoubaa · on Nov 22, 2022

I admit I didn't read the article; the title is off-putting.

thecodedmessage · on Nov 23, 2022

I back it up, I promise. I don't think it's overstated, except in that perhaps "considered harmful" is ... considered so harmful that there's no appropriate use anymore. But C++ move semantics are quite problematic, and I explain why.