diff mbox

Improve performance of list::reverse

Message ID CAFTKwZ-PPOX=Vw0u7Cabe86TcWBcSZAwnyMr_vhhB4zVzdiX4w@mail.gmail.com
State New
Headers show

Commit Message

Elliot Goodrich Oct. 9, 2016, 3:23 p.m. UTC
Hi,

If we unroll the loop so that we iterate both forwards and backwards,
we can take advantage of memory-level parallelism when chasing
pointers. This means that reverse takes 35% less time when nodes are
randomly scattered in memory and about the same time if nodes are
contiguous.

Further, as our node pointers will never alias, we can interleave the
swaps of the next and previous pointers to remove further data
dependencies. This takes another 5% off the time when nodes are
scattered in memory and takes 20% off when nodes are contiguous.

All in all we save 20%-40% depending on the memory layout.

For future improvement, by passing whether there is an odd or even
number of nodes in the list we can hoist one of the ifs out of the
loop and gain another 5-10% but most likely this is only possible when
_GLIBCXX_USE_CXX11_ABI is defined and size() is O(1). This would bring
the saving to 30%-45%. Is it worth writing a new overload of
_M_reverse which takes the size of the list?

Tested on x86_64-linux-gnu.

Thanks!
commit 073c569818764c5b2e80cc09e89721f3386e1728
Author: Elliot Goodrich <elliotgoodrich@gmail.com>
Date:   Sun Oct 9 14:34:59 2016 +0100

    2016-10-09  Elliot Goodrich  <elliotgoodrich@gmail.com>
    
        * libstdc++-v3/src/c++98/list.cc (list::reverse): Speedup reverse method.
        * libstdc++-v3/testsuite/23_containers/list/operations/reverse/1.cc: Add test.

Comments

Jonathan Wakely Oct. 10, 2016, 4:12 p.m. UTC | #1
On 09/10/16 16:23 +0100, Elliot Goodrich wrote:
>Hi,
>
>If we unroll the loop so that we iterate both forwards and backwards,
>we can take advantage of memory-level parallelism when chasing
>pointers. This means that reverse takes 35% less time when nodes are
>randomly scattered in memory and about the same time if nodes are
>contiguous.
>
>Further, as our node pointers will never alias, we can interleave the
>swaps of the next and previous pointers to remove further data
>dependencies. This takes another 5% off the time when nodes are
>scattered in memory and takes 20% off when nodes are contiguous.
>
>All in all we save 20%-40% depending on the memory layout.

Nice, thanks for the patch.

Do you have (or are you willing to sign) a copyright assignment for
GCC?

See https://gcc.gnu.org/contribute.html#legal for details.

>For future improvement, by passing whether there is an odd or even
>number of nodes in the list we can hoist one of the ifs out of the
>loop and gain another 5-10% but most likely this is only possible when
>_GLIBCXX_USE_CXX11_ABI is defined and size() is O(1). This would bring
>the saving to 30%-45%. Is it worth writing a new overload of
>_M_reverse which takes the size of the list?

That certainly seems worthwhile. Do we need an overload or can it just
be done with #if? It seems to me we'd either want to use the size, or
not use it, we wouldn't want both versions defined at once. That
suggests #if to me.
Elliot Goodrich Oct. 10, 2016, 10:45 p.m. UTC | #2
I haven't yet but I will try and sort it out tomorrow.

If we're replacing the current method with one that takes a size
parameter when _GLIBCXX_USE_CXX11_ABI is defined, is this going to
cause any issues with ABI compatibility? If not, then I agree that we
should go with the #if version.

On 10 October 2016 at 17:12, Jonathan Wakely <jwakely@redhat.com> wrote:
> On 09/10/16 16:23 +0100, Elliot Goodrich wrote:
>>
>> Hi,
>>
>> If we unroll the loop so that we iterate both forwards and backwards,
>> we can take advantage of memory-level parallelism when chasing
>> pointers. This means that reverse takes 35% less time when nodes are
>> randomly scattered in memory and about the same time if nodes are
>> contiguous.
>>
>> Further, as our node pointers will never alias, we can interleave the
>> swaps of the next and previous pointers to remove further data
>> dependencies. This takes another 5% off the time when nodes are
>> scattered in memory and takes 20% off when nodes are contiguous.
>>
>> All in all we save 20%-40% depending on the memory layout.
>
>
> Nice, thanks for the patch.
>
> Do you have (or are you willing to sign) a copyright assignment for
> GCC?
>
> See https://gcc.gnu.org/contribute.html#legal for details.
>
>> For future improvement, by passing whether there is an odd or even
>> number of nodes in the list we can hoist one of the ifs out of the
>> loop and gain another 5-10% but most likely this is only possible when
>> _GLIBCXX_USE_CXX11_ABI is defined and size() is O(1). This would bring
>> the saving to 30%-45%. Is it worth writing a new overload of
>> _M_reverse which takes the size of the list?
>
>
> That certainly seems worthwhile. Do we need an overload or can it just
> be done with #if? It seems to me we'd either want to use the size, or
> not use it, we wouldn't want both versions defined at once. That
> suggests #if to me.
>
diff mbox

Patch

diff --git a/libstdc++-v3/src/c++98/list.cc b/libstdc++-v3/src/c++98/list.cc
index 62b812a..aeb2402 100644
--- a/libstdc++-v3/src/c++98/list.cc
+++ b/libstdc++-v3/src/c++98/list.cc
@@ -112,15 +112,36 @@  namespace std _GLIBCXX_VISIBILITY(default)
     void
     _List_node_base::_M_reverse() _GLIBCXX_USE_NOEXCEPT
     {
-      _List_node_base* __tmp = this;
-      do
-	{
-	  std::swap(__tmp->_M_next, __tmp->_M_prev);
+      _List_node_base* __forward  = this;
+      _List_node_base* __backward = __forward->_M_prev;
+      if (__forward == __backward) {
+        return;
+      }
 
-	  // Old next node is now prev.
-	  __tmp = __tmp->_M_prev;
-	}
-      while (__tmp != this);
+      // unroll the loop to take advantage of memory-level parallelism
+      while (true) {
+        _List_node_base* __forward_next  = __forward->_M_next;
+        _List_node_base* __backward_prev = __backward->_M_prev;
+
+        __forward->_M_next  = __forward->_M_prev;
+        __backward->_M_prev = __backward->_M_next;
+
+        __backward->_M_next = __backward_prev;
+        __forward->_M_prev  = __forward_next;
+
+        if (__forward_next == __backward_prev) {
+          // if there is an odd number of nodes, swap the last one
+          std::swap(__forward_next->_M_next, __forward_next->_M_prev);
+          return;
+        }
+
+        if (__forward_next == __backward) {
+          return;
+        }
+
+        __forward  = __forward_next;
+        __backward = __backward_prev;
+      }
     }
 
     void
diff --git a/libstdc++-v3/testsuite/23_containers/list/operations/reverse/1.cc b/libstdc++-v3/testsuite/23_containers/list/operations/reverse/1.cc
new file mode 100644
index 0000000..7eab83b
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/list/operations/reverse/1.cc
@@ -0,0 +1,88 @@ 
+// Copyright (C) 2016-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// 23.2.2.4 list operations [lib.list.ops]
+
+#include <testsuite_hooks.h>
+#include <list>
+
+// reverse
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  typedef std::list<int>      list_type;
+  typedef list_type::iterator iterator_type;
+
+  const int A[] = {0, 1, 2, 3};
+  const int N = sizeof(A) / sizeof(int);
+
+  list_type list01(A, A + N);
+
+  // iterators should still be valid after reverse
+  iterator_type i  = list01.begin();
+  iterator_type i0 = i++;
+  iterator_type i1 = i++;
+  iterator_type i2 = i++;
+  iterator_type i3 = i++;
+
+  // reverse a list of size 4
+  list01.reverse();
+  i = list01.begin();
+  VERIFY(*i++ == 3);
+  VERIFY(*i++ == 2);
+  VERIFY(*i++ == 1);
+  VERIFY(*i++ == 0);
+
+  // iterators should be unchanged after reverse
+  VERIFY(*i0 == 0);
+  VERIFY(*i1 == 1);
+  VERIFY(*i2 == 2);
+  VERIFY(*i3 == 3);
+
+  // reverse a list of size 3
+  list01.pop_back();
+  list01.reverse();
+  i = list01.begin();
+  VERIFY(*i++ == 1);
+  VERIFY(*i++ == 2);
+  VERIFY(*i++ == 3);
+
+  // reverse a list of size 2
+  list01.pop_back();
+  list01.reverse();
+  i = list01.begin();
+  VERIFY(*i++ == 2);
+  VERIFY(*i++ == 1);
+
+  // reverse a list of size 1
+  list01.pop_back();
+  list01.reverse();
+  i = list01.begin();
+  VERIFY(*i++ == 2);
+
+  // reverse an empty list
+  list01.pop_back();
+  list01.reverse();
+}
+
+int main()
+{
+  test01();
+  return 0;
+}