diff mbox

std::vector move assign patch

Message ID CAH6eHdSuwbuXzs0fmy4iJ6z+AUiFaKXNe1qhzzmNACnsdNKXhQ@mail.gmail.com
State New
Headers show

Commit Message

Jonathan Wakely Jan. 9, 2014, 6:39 p.m. UTC
On 9 January 2014 12:22, H.J. Lu wrote:
> On Fri, Dec 27, 2013 at 10:27 AM, François Dumont <frs.dumont@gmail.com> wrote:
>> Hi
>>
>>     Here is a patch to fix an issue in normal mode during the move
>> assignment. The destination vector allocator instance is moved too during
>> the assignment which is wrong.
>>
>>     As I discover this problem while working on issues with management of
>> safe iterators during move operations this patch also fix those issues in
>> the debug mode for the vector container. Fixes for other containers in debug
>> mode will come later.
>>
>> 2013-12-27  François Dumont <fdumont@gcc.gnu.org>
>>
>>     * include/bits/stl_vector.h (std::vector<>::_M_move_assign): Pass
>>     *this allocator instance when building temporary vector instance
>>     so that *this allocator do not get moved.
>>     * include/debug/safe_base.h
>>     (_Safe_sequence_base(_Safe_sequence_base&&)): New.
>>     * include/debug/vector (__gnu_debug::vector<>(vector&&)): Use
>>     latter.
>>     (__gnu_debug::vector<>(vector&&, const allocator_type&)): Swap
>>     safe iterators if the instance is moved.
>>     (__gnu_debug::vector<>::operator=(vector&&)): Likewise.
>>     * testsuite/23_containers/vector/allocator/move.cc (test01): Add
>>     check on a vector iterator.
>>     * testsuite/23_containers/vector/allocator/move_assign.cc
>>     (test02): Likewise.
>>     (test03): New, test with a non-propagating allocator.
>>     * testsuite/23_containers/vector/debug/move_assign_neg.cc: New.
>>
>> Tested under Linux x86_64 normal and debug modes.
>>
>> I will be in vacation for a week starting today so if you want to apply it
>> quickly do not hesitate to do it yourself.
>>
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59738

Fixed by the attached patch, tested x86_64-linux and committed to
trunk.  I've also rotated the libstdc++ ChangeLog.


2014-01-09  Jonathan Wakely  <jwakely@redhat.com>

        PR libstdc++/59738
        * include/bits/stl_vector.h (vector<>::_M_move_assign): Restore
        support for non-Movable types.
commit c12a0d112781150c2888de7c63960e22ef4ffcbb
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Thu Jan 9 16:50:50 2014 +0000

    	PR libstdc++/59738
    	* include/bits/stl_vector.h (vector<>::_M_move_assign): Restore
    	support for non-Movable types.

Comments

Marc Glisse April 24, 2017, 8:10 p.m. UTC | #1
On Thu, 9 Jan 2014, Jonathan Wakely wrote:

> On 9 January 2014 12:22, H.J. Lu wrote:
>> On Fri, Dec 27, 2013 at 10:27 AM, François Dumont <frs.dumont@gmail.com> wrote:
>>> Hi
>>>
>>>     Here is a patch to fix an issue in normal mode during the move
>>> assignment. The destination vector allocator instance is moved too during
>>> the assignment which is wrong.
>>>
>>>     As I discover this problem while working on issues with management of
>>> safe iterators during move operations this patch also fix those issues in
>>> the debug mode for the vector container. Fixes for other containers in debug
>>> mode will come later.
>>>
>>> 2013-12-27  François Dumont <fdumont@gcc.gnu.org>
>>>
>>>     * include/bits/stl_vector.h (std::vector<>::_M_move_assign): Pass
>>>     *this allocator instance when building temporary vector instance
>>>     so that *this allocator do not get moved.
>>>     * include/debug/safe_base.h
>>>     (_Safe_sequence_base(_Safe_sequence_base&&)): New.
>>>     * include/debug/vector (__gnu_debug::vector<>(vector&&)): Use
>>>     latter.
>>>     (__gnu_debug::vector<>(vector&&, const allocator_type&)): Swap
>>>     safe iterators if the instance is moved.
>>>     (__gnu_debug::vector<>::operator=(vector&&)): Likewise.
>>>     * testsuite/23_containers/vector/allocator/move.cc (test01): Add
>>>     check on a vector iterator.
>>>     * testsuite/23_containers/vector/allocator/move_assign.cc
>>>     (test02): Likewise.
>>>     (test03): New, test with a non-propagating allocator.
>>>     * testsuite/23_containers/vector/debug/move_assign_neg.cc: New.
>>>
>>> Tested under Linux x86_64 normal and debug modes.
>>>
>>> I will be in vacation for a week starting today so if you want to apply it
>>> quickly do not hesitate to do it yourself.
>>>
>>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59738
>
> Fixed by the attached patch, tested x86_64-linux and committed to
> trunk.  I've also rotated the libstdc++ ChangeLog.
>
>
> 2014-01-09  Jonathan Wakely  <jwakely@redhat.com>
>
>        PR libstdc++/59738
>        * include/bits/stl_vector.h (vector<>::_M_move_assign): Restore
>        support for non-Movable types.

It seems that this patch had 2 consequences that may or may not have been 
planned. Consider this example (from PR64601)

#include <vector>
typedef std::vector<int> V;
void f(V&v,V&w){ V(std::move(w)).swap(v); }
void g(V&v,V&w){ v=std::move(w); }

1) We generate shorter code for f than for g, probably since the fix for 
PR59738. g ends up zeroing v, copying w to v, and finally zeroing w, and 
for weird reasons (and because we swap the members one by one) the 
standard prevents us from assuming that v and w do not overlap in weird 
ways so we cannot optimize as much as one might expect.

2) g(v,v) seems to turn v into a nice empty vector, while f(v,v) turns it 
into an invalid vector pointing at released memory.

Since 2) is a nice side-effect, it may not be worth rewriting operator= in 
a way that improves 1) but loses 2). Anyway, just mentioning this here.
Jonathan Wakely April 25, 2017, 12:52 p.m. UTC | #2
On 24/04/17 22:10 +0200, Marc Glisse wrote:
>It seems that this patch had 2 consequences that may or may not have 
>been planned. Consider this example (from PR64601)
>
>#include <vector>
>typedef std::vector<int> V;
>void f(V&v,V&w){ V(std::move(w)).swap(v); }
>void g(V&v,V&w){ v=std::move(w); }
>
>1) We generate shorter code for f than for g, probably since the fix 
>for PR59738. g ends up zeroing v, copying w to v, and finally zeroing 
>w, and for weird reasons (and because we swap the members one by one) 
>the standard prevents us from assuming that v and w do not overlap in 
>weird ways so we cannot optimize as much as one might expect.

f has an additional precondition (that the allocators of the vectors
being swapped must propagate on swap or be equal) and so the swap code
doesn't have to worry about non-equal allocators.

g has to be able to cope with the case where the allocator doesn't
propagate and isn't equal, and so is more complicated.

However, the propagation trait is known at compile-time, and for the
common case so is the equality condition, so it's unfortunate if that
can't be simplified (I'm sure you've analysed it carefully already
though!)


>2) g(v,v) seems to turn v into a nice empty vector,

Yes.

>while f(v,v) turns 
>it into an invalid vector pointing at released memory.

Does it?! I don't see that happening, and it's a bug if it does.

>Since 2) is a nice side-effect, it may not be worth rewriting 
>operator= in a way that improves 1) but loses 2). Anyway, just 
>mentioning this here.
Marc Glisse April 25, 2017, 3:23 p.m. UTC | #3
On Tue, 25 Apr 2017, Jonathan Wakely wrote:

> On 24/04/17 22:10 +0200, Marc Glisse wrote:
>> It seems that this patch had 2 consequences that may or may not have been 
>> planned. Consider this example (from PR64601)
>> 
>> #include <vector>
>> typedef std::vector<int> V;
>> void f(V&v,V&w){ V(std::move(w)).swap(v); }
>> void g(V&v,V&w){ v=std::move(w); }
>> 
>> 1) We generate shorter code for f than for g, probably since the fix for 
>> PR59738. g ends up zeroing v, copying w to v, and finally zeroing w, and 
>> for weird reasons (and because we swap the members one by one) the standard 
>> prevents us from assuming that v and w do not overlap in weird ways so we 
>> cannot optimize as much as one might expect.
>
> f has an additional precondition (that the allocators of the vectors
> being swapped must propagate on swap or be equal) and so the swap code
> doesn't have to worry about non-equal allocators.
>
> g has to be able to cope with the case where the allocator doesn't
> propagate and isn't equal, and so is more complicated.
>
> However, the propagation trait is known at compile-time, and for the
> common case so is the equality condition, so it's unfortunate if that
> can't be simplified (I'm sure you've analysed it carefully already
> though!)

The code isn't horrible. With f, we get:

         movq    (%rsi), %r8
         movq    8(%rsi), %rcx
         movq    $0, (%rsi)
         movq    $0, 8(%rsi)
         movq    16(%rsi), %rdx
         movq    $0, 16(%rsi)
         movq    (%rdi), %rax
         movq    %rcx, 8(%rdi)
         movq    %r8, (%rdi)
         movq    %rdx, 16(%rdi)
         testq   %rax, %rax

which seems quite optimal: read each pointer from w, write them to v, 
write 0s in w, that's 9 memory operations, +1 to read the pointer from w 
and possibly call delete on it.

With g:

         movq    $0, 8(%rdi)
         movq    (%rdi), %rax
         movq    $0, 16(%rdi)
         movq    $0, (%rdi)
         movq    (%rsi), %rdx
         movq    %rdx, (%rdi)
         movq    8(%rsi), %rcx
         movq    $0, (%rsi)
         movq    8(%rdi), %rdx
         movq    %rcx, 8(%rdi)
         movq    16(%rsi), %rcx
         movq    %rdx, 8(%rsi)
         movq    16(%rdi), %rdx
         movq    %rcx, 16(%rdi)
         movq    %rdx, 16(%rsi)
         testq   %rax, %rax

That's only 5 more memory operations. If I tweak vector swapping to avoid 
calling swap on each member (which drops type-based aliasing information, 
that was the topic of PR64601)

         void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
         {
           pointer tmp;
#define MARC(x,y) tmp=x; x=y; y=tmp
           MARC(_M_start, __x._M_start);
           MARC(_M_finish, __x._M_finish);
           MARC(_M_end_of_storage, __x._M_end_of_storage);
         }

this gets down to 13, which is kind of sensible
* 0 the elements of v -> 3 ops
* read the elements of w -> 3 ops
* write them to v -> 3 ops
* 0 the elements of w -> 3 ops
(+1 to get the pointer that we might call delete on)

The first step of zeroing the elements of v is redundant
* if v and w don't alias, we are going to overwrite those 0s in step 3 
without ever reading them
* if v and w are the same, we are going to write those 0s in step 4 anyway

but that's hard for the optimizers to notice.

I didn't try hard to find a nice C++ way to get an equivalent of g that 
generates the optimal number of operations, but it would be a little ugly 
to write in operator=
this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
same for _M_end_of_storage and _M_start, and remembering to use the 
original this->_M_impl._M_start for delete.

>> 2) g(v,v) seems to turn v into a nice empty vector,
>
> Yes.
>
>> while f(v,v) turns it into an invalid vector pointing at released memory.
>
> Does it?! I don't see that happening, and it's a bug if it does.

Er, my fault, you are right. It shuffles things around and amounts to a 
NOP, v remains as it was before the call to f. Which could be considered 
less desirable than clearing the vector by some, but not as much as 
getting something invalid of course :-)
Jonathan Wakely April 25, 2017, 3:35 p.m. UTC | #4
On 25/04/17 17:23 +0200, Marc Glisse wrote:
>On Tue, 25 Apr 2017, Jonathan Wakely wrote:
>
>>On 24/04/17 22:10 +0200, Marc Glisse wrote:
>>>It seems that this patch had 2 consequences that may or may not 
>>>have been planned. Consider this example (from PR64601)
>>>
>>>#include <vector>
>>>typedef std::vector<int> V;
>>>void f(V&v,V&w){ V(std::move(w)).swap(v); }
>>>void g(V&v,V&w){ v=std::move(w); }
>>>
>>>1) We generate shorter code for f than for g, probably since the 
>>>fix for PR59738. g ends up zeroing v, copying w to v, and finally 
>>>zeroing w, and for weird reasons (and because we swap the members 
>>>one by one) the standard prevents us from assuming that v and w do 
>>>not overlap in weird ways so we cannot optimize as much as one 
>>>might expect.
>>
>>f has an additional precondition (that the allocators of the vectors
>>being swapped must propagate on swap or be equal) and so the swap code
>>doesn't have to worry about non-equal allocators.
>>
>>g has to be able to cope with the case where the allocator doesn't
>>propagate and isn't equal, and so is more complicated.
>>
>>However, the propagation trait is known at compile-time, and for the
>>common case so is the equality condition, so it's unfortunate if that
>>can't be simplified (I'm sure you've analysed it carefully already
>>though!)
>
>The code isn't horrible. With f, we get:
>
>        movq    (%rsi), %r8
>        movq    8(%rsi), %rcx
>        movq    $0, (%rsi)
>        movq    $0, 8(%rsi)
>        movq    16(%rsi), %rdx
>        movq    $0, 16(%rsi)
>        movq    (%rdi), %rax
>        movq    %rcx, 8(%rdi)
>        movq    %r8, (%rdi)
>        movq    %rdx, 16(%rdi)
>        testq   %rax, %rax
>
>which seems quite optimal: read each pointer from w, write them to v, 
>write 0s in w, that's 9 memory operations, +1 to read the pointer from 
>w and possibly call delete on it.
>
>With g:
>
>        movq    $0, 8(%rdi)
>        movq    (%rdi), %rax
>        movq    $0, 16(%rdi)
>        movq    $0, (%rdi)
>        movq    (%rsi), %rdx
>        movq    %rdx, (%rdi)
>        movq    8(%rsi), %rcx
>        movq    $0, (%rsi)
>        movq    8(%rdi), %rdx
>        movq    %rcx, 8(%rdi)
>        movq    16(%rsi), %rcx
>        movq    %rdx, 8(%rsi)
>        movq    16(%rdi), %rdx
>        movq    %rcx, 16(%rdi)
>        movq    %rdx, 16(%rsi)
>        testq   %rax, %rax
>
>That's only 5 more memory operations. If I tweak vector swapping to 
>avoid calling swap on each member (which drops type-based aliasing 
>information, that was the topic of PR64601)

I didn't really understand the discussion in the PR. I find that's
true of most TBAA discussions.

std::swap(T& x, T& y) is hard to optimise because we don't know that
the dynamic type of the thing at &x is the same type as T?


>        void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
>        {
>          pointer tmp;
>#define MARC(x,y) tmp=x; x=y; y=tmp
>          MARC(_M_start, __x._M_start);
>          MARC(_M_finish, __x._M_finish);
>          MARC(_M_end_of_storage, __x._M_end_of_storage);
>        }
>
>this gets down to 13, which is kind of sensible
>* 0 the elements of v -> 3 ops
>* read the elements of w -> 3 ops
>* write them to v -> 3 ops
>* 0 the elements of w -> 3 ops
>(+1 to get the pointer that we might call delete on)
>
>The first step of zeroing the elements of v is redundant
>* if v and w don't alias, we are going to overwrite those 0s in step 3 
>without ever reading them
>* if v and w are the same, we are going to write those 0s in step 4 anyway
>
>but that's hard for the optimizers to notice.
>
>I didn't try hard to find a nice C++ way to get an equivalent of g 
>that generates the optimal number of operations, but it would be a 
>little ugly to write in operator=
>this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
>same for _M_end_of_storage and _M_start, and remembering to use the 
>original this->_M_impl._M_start for delete.

I'm not opposed to writing it out by hand. Operations on std::vector
should be as fast as possible, and move-assignment should be cheap.

>>>2) g(v,v) seems to turn v into a nice empty vector,
>>
>>Yes.
>>
>>>while f(v,v) turns it into an invalid vector pointing at released memory.
>>
>>Does it?! I don't see that happening, and it's a bug if it does.
>
>Er, my fault, you are right. It shuffles things around and amounts to 
>a NOP, v remains as it was before the call to f. Which could be 
>considered less desirable than clearing the vector by some, but not as 
>much as getting something invalid of course :-)

Phew :-)
Marc Glisse April 25, 2017, 4:12 p.m. UTC | #5
On Tue, 25 Apr 2017, Jonathan Wakely wrote:

> On 25/04/17 17:23 +0200, Marc Glisse wrote:
>> On Tue, 25 Apr 2017, Jonathan Wakely wrote:
>> 
>>> On 24/04/17 22:10 +0200, Marc Glisse wrote:
>>>> It seems that this patch had 2 consequences that may or may not have been 
>>>> planned. Consider this example (from PR64601)
>>>> 
>>>> #include <vector>
>>>> typedef std::vector<int> V;
>>>> void f(V&v,V&w){ V(std::move(w)).swap(v); }
>>>> void g(V&v,V&w){ v=std::move(w); }
>>>> 
>>>> 1) We generate shorter code for f than for g, probably since the fix for 
>>>> PR59738. g ends up zeroing v, copying w to v, and finally zeroing w, and 
>>>> for weird reasons (and because we swap the members one by one) the 
>>>> standard prevents us from assuming that v and w do not overlap in weird 
>>>> ways so we cannot optimize as much as one might expect.
>>> 
>>> f has an additional precondition (that the allocators of the vectors
>>> being swapped must propagate on swap or be equal) and so the swap code
>>> doesn't have to worry about non-equal allocators.
>>> 
>>> g has to be able to cope with the case where the allocator doesn't
>>> propagate and isn't equal, and so is more complicated.
>>> 
>>> However, the propagation trait is known at compile-time, and for the
>>> common case so is the equality condition, so it's unfortunate if that
>>> can't be simplified (I'm sure you've analysed it carefully already
>>> though!)
>> 
>> The code isn't horrible. With f, we get:
>>
>>        movq    (%rsi), %r8
>>        movq    8(%rsi), %rcx
>>        movq    $0, (%rsi)
>>        movq    $0, 8(%rsi)
>>        movq    16(%rsi), %rdx
>>        movq    $0, 16(%rsi)
>>        movq    (%rdi), %rax
>>        movq    %rcx, 8(%rdi)
>>        movq    %r8, (%rdi)
>>        movq    %rdx, 16(%rdi)
>>        testq   %rax, %rax
>> 
>> which seems quite optimal: read each pointer from w, write them to v, write 
>> 0s in w, that's 9 memory operations, +1 to read the pointer from w and 
>> possibly call delete on it.

Of course I should have mentioned that writing 0s is done between the read 
and the other write, which differentiates it from g.

>> With g:
>>
>>        movq    $0, 8(%rdi)
>>        movq    (%rdi), %rax
>>        movq    $0, 16(%rdi)
>>        movq    $0, (%rdi)
>>        movq    (%rsi), %rdx
>>        movq    %rdx, (%rdi)
>>        movq    8(%rsi), %rcx
>>        movq    $0, (%rsi)
>>        movq    8(%rdi), %rdx
>>        movq    %rcx, 8(%rdi)
>>        movq    16(%rsi), %rcx
>>        movq    %rdx, 8(%rsi)
>>        movq    16(%rdi), %rdx
>>        movq    %rcx, 16(%rdi)
>>        movq    %rdx, 16(%rsi)
>>        testq   %rax, %rax
>> 
>> That's only 5 more memory operations. If I tweak vector swapping to avoid 
>> calling swap on each member (which drops type-based aliasing information, 
>> that was the topic of PR64601)
>
> I didn't really understand the discussion in the PR. I find that's
> true of most TBAA discussions.
>
> std::swap(T& x, T& y) is hard to optimise because we don't know that
> the dynamic type of the thing at &x is the same type as T?

std::swap<int*> only knows that it is dealing with pointers to integers, 
so _M_start from one vector might be in the same location as _M_finish 
from some other vector...

When I access __x._M_start directly, I am accessing some part of a 
_Vector_impl, and 2 _Vector_impl can only be the same or disjoint, they 
cannot partially overlap.

>>        void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
>>        {
>>          pointer tmp;
>> #define MARC(x,y) tmp=x; x=y; y=tmp
>>          MARC(_M_start, __x._M_start);
>>          MARC(_M_finish, __x._M_finish);
>>          MARC(_M_end_of_storage, __x._M_end_of_storage);
>>        }
>> 
>> this gets down to 13, which is kind of sensible
>> * 0 the elements of v -> 3 ops
>> * read the elements of w -> 3 ops
>> * write them to v -> 3 ops
>> * 0 the elements of w -> 3 ops
>> (+1 to get the pointer that we might call delete on)
>> 
>> The first step of zeroing the elements of v is redundant
>> * if v and w don't alias, we are going to overwrite those 0s in step 3 
>> without ever reading them
>> * if v and w are the same, we are going to write those 0s in step 4 anyway
>> 
>> but that's hard for the optimizers to notice.
>> 
>> I didn't try hard to find a nice C++ way to get an equivalent of g that 
>> generates the optimal number of operations, but it would be a little ugly 
>> to write in operator=
>> this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
>> same for _M_end_of_storage and _M_start, and remembering to use the 
>> original this->_M_impl._M_start for delete.
>
> I'm not opposed to writing it out by hand. Operations on std::vector
> should be as fast as possible, and move-assignment should be cheap.

I am not sure it would make a noticable difference. I might do it at some 
point in the future...

By the way, do we have a policy on writing if(p)delete p; vs directly 
delete p; which does nothing for p==0? There are cases where the extra 
shortcut is a nice optimization, others where it is redundant work. At 
-Os, the middle-end could optimize if(p)free(p) to free(p) in the future.
Jonathan Wakely April 25, 2017, 4:54 p.m. UTC | #6
On 25/04/17 18:12 +0200, Marc Glisse wrote:
>On Tue, 25 Apr 2017, Jonathan Wakely wrote:
>
>>On 25/04/17 17:23 +0200, Marc Glisse wrote:
>>>On Tue, 25 Apr 2017, Jonathan Wakely wrote:
>>>
>>>>On 24/04/17 22:10 +0200, Marc Glisse wrote:
>>>>>It seems that this patch had 2 consequences that may or may 
>>>>>not have been planned. Consider this example (from PR64601)
>>>>>
>>>>>#include <vector>
>>>>>typedef std::vector<int> V;
>>>>>void f(V&v,V&w){ V(std::move(w)).swap(v); }
>>>>>void g(V&v,V&w){ v=std::move(w); }
>>>>>
>>>>>1) We generate shorter code for f than for g, probably since 
>>>>>the fix for PR59738. g ends up zeroing v, copying w to v, and 
>>>>>finally zeroing w, and for weird reasons (and because we swap 
>>>>>the members one by one) the standard prevents us from assuming 
>>>>>that v and w do not overlap in weird ways so we cannot 
>>>>>optimize as much as one might expect.
>>>>
>>>>f has an additional precondition (that the allocators of the vectors
>>>>being swapped must propagate on swap or be equal) and so the swap code
>>>>doesn't have to worry about non-equal allocators.
>>>>
>>>>g has to be able to cope with the case where the allocator doesn't
>>>>propagate and isn't equal, and so is more complicated.
>>>>
>>>>However, the propagation trait is known at compile-time, and for the
>>>>common case so is the equality condition, so it's unfortunate if that
>>>>can't be simplified (I'm sure you've analysed it carefully already
>>>>though!)
>>>
>>>The code isn't horrible. With f, we get:
>>>
>>>       movq    (%rsi), %r8
>>>       movq    8(%rsi), %rcx
>>>       movq    $0, (%rsi)
>>>       movq    $0, 8(%rsi)
>>>       movq    16(%rsi), %rdx
>>>       movq    $0, 16(%rsi)
>>>       movq    (%rdi), %rax
>>>       movq    %rcx, 8(%rdi)
>>>       movq    %r8, (%rdi)
>>>       movq    %rdx, 16(%rdi)
>>>       testq   %rax, %rax
>>>
>>>which seems quite optimal: read each pointer from w, write them to 
>>>v, write 0s in w, that's 9 memory operations, +1 to read the 
>>>pointer from w and possibly call delete on it.
>
>Of course I should have mentioned that writing 0s is done between the 
>read and the other write, which differentiates it from g.
>
>>>With g:
>>>
>>>       movq    $0, 8(%rdi)
>>>       movq    (%rdi), %rax
>>>       movq    $0, 16(%rdi)
>>>       movq    $0, (%rdi)
>>>       movq    (%rsi), %rdx
>>>       movq    %rdx, (%rdi)
>>>       movq    8(%rsi), %rcx
>>>       movq    $0, (%rsi)
>>>       movq    8(%rdi), %rdx
>>>       movq    %rcx, 8(%rdi)
>>>       movq    16(%rsi), %rcx
>>>       movq    %rdx, 8(%rsi)
>>>       movq    16(%rdi), %rdx
>>>       movq    %rcx, 16(%rdi)
>>>       movq    %rdx, 16(%rsi)
>>>       testq   %rax, %rax
>>>
>>>That's only 5 more memory operations. If I tweak vector swapping 
>>>to avoid calling swap on each member (which drops type-based 
>>>aliasing information, that was the topic of PR64601)
>>
>>I didn't really understand the discussion in the PR. I find that's
>>true of most TBAA discussions.
>>
>>std::swap(T& x, T& y) is hard to optimise because we don't know that
>>the dynamic type of the thing at &x is the same type as T?
>
>std::swap<int*> only knows that it is dealing with pointers to 
>integers, so _M_start from one vector might be in the same location as 
>_M_finish from some other vector...
>
>When I access __x._M_start directly, I am accessing some part of a 
>_Vector_impl, and 2 _Vector_impl can only be the same or disjoint, 
>they cannot partially overlap.

I see. I'm surprised that after the std::swap calls get inlined, so
that the references can be "seen through", the pessimisation is still
present.

>>>       void _M_swap_data(_Vector_impl& __x) _GLIBCXX_NOEXCEPT
>>>       {
>>>         pointer tmp;
>>>#define MARC(x,y) tmp=x; x=y; y=tmp
>>>         MARC(_M_start, __x._M_start);
>>>         MARC(_M_finish, __x._M_finish);
>>>         MARC(_M_end_of_storage, __x._M_end_of_storage);
>>>       }
>>>
>>>this gets down to 13, which is kind of sensible
>>>* 0 the elements of v -> 3 ops
>>>* read the elements of w -> 3 ops
>>>* write them to v -> 3 ops
>>>* 0 the elements of w -> 3 ops
>>>(+1 to get the pointer that we might call delete on)
>>>
>>>The first step of zeroing the elements of v is redundant
>>>* if v and w don't alias, we are going to overwrite those 0s in 
>>>step 3 without ever reading them
>>>* if v and w are the same, we are going to write those 0s in step 4 anyway
>>>
>>>but that's hard for the optimizers to notice.
>>>
>>>I didn't try hard to find a nice C++ way to get an equivalent of g 
>>>that generates the optimal number of operations, but it would be a 
>>>little ugly to write in operator=
>>>this->_M_impl._M_finish = x._M_impl._M_finish; x._M_impl._M_finish = 0;
>>>same for _M_end_of_storage and _M_start, and remembering to use 
>>>the original this->_M_impl._M_start for delete.
>>
>>I'm not opposed to writing it out by hand. Operations on std::vector
>>should be as fast as possible, and move-assignment should be cheap.
>
>I am not sure it would make a noticable difference. I might do it at 
>some point in the future...
>
>By the way, do we have a policy on writing if(p)delete p; vs directly 
>delete p; which does nothing for p==0? There are cases where the extra 
>shortcut is a nice optimization, others where it is redundant work. At 
>-Os, the middle-end could optimize if(p)free(p) to free(p) in the 
>future.

No official policy. I always consider the if (p) to be redundant
noise, but that's based on readability and correctness, not
performance.
Marc Glisse April 25, 2017, 9:02 p.m. UTC | #7
On Tue, 25 Apr 2017, Jonathan Wakely wrote:

>>>> That's only 5 more memory operations. If I tweak vector swapping to avoid 
>>>> calling swap on each member (which drops type-based aliasing information, 
>>>> that was the topic of PR64601)
>>> 
>>> I didn't really understand the discussion in the PR. I find that's
>>> true of most TBAA discussions.
>>> 
>>> std::swap(T& x, T& y) is hard to optimise because we don't know that
>>> the dynamic type of the thing at &x is the same type as T?
>> 
>> std::swap<int*> only knows that it is dealing with pointers to integers, so 
>> _M_start from one vector might be in the same location as _M_finish from 
>> some other vector...
>> 
>> When I access __x._M_start directly, I am accessing some part of a 
>> _Vector_impl, and 2 _Vector_impl can only be the same or disjoint, they 
>> cannot partially overlap.
>
> I see. I'm surprised that after the std::swap calls get inlined, so
> that the references can be "seen through", the pessimisation is still
> present.

(this is my understanding of how compilers handle TBAA)

The only things that carry information are the declaration of an object, 
and accesses (read/write). Pointers and references are meaningless until 
the object is accessed, they are just a fancy way to do pointer 
arithmetic, and the way the object is accessed carries the whole 
information. If I have a reference p to std::pair<int,int>, p.first=42 or 
x=p.first access the first element of a pair, while *&p.first=42 or 
x=*&p.first are only accesses to an int. p.first and q.second cannot 
alias, while *&p.first and *&q.second can.

In other words
illegal: ((std::pair<int,int>*)&p.second)->first=2
allowed: *(int*)&((std::pair<int,int>*)&p.second)->first=2
because there is no actual access through the "wrong" pointers (I added a 
redundant cast to int* to try and clarify the difference).

(actually, gcc wrongly optimizes *&x to x so you need to use an 
intermediate variable to notice this: auto y=&x; ... *y ... while clang is 
stricter)

I think this is horrible but that's not a fight I feel capable of leading.

If what I said is too hard to understand, it is fine to ignore. And if it 
is wrong, I'll be happy to learn that :-)

>> By the way, do we have a policy on writing if(p)delete p; vs directly 
>> delete p; which does nothing for p==0? There are cases where the extra 
>> shortcut is a nice optimization, others where it is redundant work. At -Os, 
>> the middle-end could optimize if(p)free(p) to free(p) in the future.
>
> No official policy. I always consider the if (p) to be redundant
> noise, but that's based on readability and correctness, not
> performance.

I was going to say we have that in std::vector's destructor, but 
allocators do not have the same property as free/delete, deallocate can 
only be called with something allocate returned, not 0. So we don't have a 
choice.
diff mbox

Patch

diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h
index 3638a8c..2cedd39 100644
--- a/libstdc++-v3/include/bits/stl_vector.h
+++ b/libstdc++-v3/include/bits/stl_vector.h
@@ -1433,7 +1433,8 @@  _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
       void
       _M_move_assign(vector&& __x, std::true_type) noexcept
       {
-	const vector __tmp(std::move(*this), get_allocator());
+	vector __tmp(get_allocator());
+	this->_M_impl._M_swap_data(__tmp._M_impl);
 	this->_M_impl._M_swap_data(__x._M_impl);
 	if (_Alloc_traits::_S_propagate_on_move_assign())
 	  std::__alloc_on_move(_M_get_Tp_allocator(),