A deep dive into SmallVector:push_back (maskray.me)

30 points by mariuz 2 days ago

someonebaggy an hour ago

This is a great demonstration of how compilers, good as they are, are a leaky abstraction, just like all abstravtions. And sometimes you need to peer below the abstraction.

im3w1l 3 hours ago

So one thing I thought of when reading this is that very often it is known ahead of time (at runtime or even compile-time) how many push_backs will be done. The programmer could make a reserve call but doesn't bother since the efficiency gain is minimal.

The gain is minimal for doing this optimization at one location. But doing it everywhere, that could matter. Pushing back in a loop could maybe be optimized to a single allocation and a memcopy.

tialaramex 3 hours ago

In C++ programmers are often taught not to use their reservation API for this purpose because it's designed in such a way that if you don't have perfect foresight you can destroy amortization and thus get much worse performance.

For example Bjarne Stroustrup suggests you should use reservation for "avoiding invalidation of iterators" instead.

oneshtein 2 hours ago

API should have reserve_at_least(n) and reserve_exactly(n), instead of one reserve(n), which works as reserve_exactly(n), but described as reserve_at_least(n).

tialaramex 43 minutes ago

im3w1l 23 minutes ago

Sorry I completely messed up in communicating my point: That it would be nice if the compiler would do it for us automatically in the many simple cases. That's what I tried to say, but didn't say, when I said it could matter fixing it everywhere.

dzaima 5 hours ago

The crappiness of shrink-wrapping in gcc and clang (but especially clang) annoys me a lot. It feels like there should be a quite decent amount of general performance to be gained from properly pushing more into slow paths (or, not necessarily even slow paths, but generally paths with high register pressure / uninlined function calls), never mind calling conventions in general.

On the push impl in the article - for non-x86 (and perhaps even on x86 for performance, though not size/instruction count) it'd be better to allow the size increment to reuse the size read done by the capacity check; with C++'s lack of suitable aliasing information, the interleaved memcpy/store prevents the compiler from deciding this itself.

RossBencina 4 hours ago

> prevents the compiler from deciding this itself.

Interesting. I understand why it does that, but it makes me realise that I usually think "the compiler will reuse the loaded value/perform CSE" without considering the cases where it won't. Are there tools that will detect and warn/indicate this situation? e.g. "warning: could not reuse previously loaded value of 'foo' due to aliasing hazard 'memcpy' at line 234."

dzaima 4 hours ago

Not that I know of; and such would necessarily have false-positives (...or, rather, entirely consist of potential false-positives) because you may actually want the re-read.