[RFC/SRU,Bionic,0/2] Reverts we probably want to do quickly

Message ID 20210728150230.150985-1-stefan.bader@canonical.com

Stefan Bader July 28, 2021, 3:02 p.m. UTC
While we were looking for a regression, I stumbled over two changes we
imported from 4.19.y into Bionic. Both differ from the versions in
4.19.y, and I am suspicious that they might not be needed or might even
cause some harm.

The amdgpu change follows a change that is not in 4.15, where GEM
objects are stored in a different place, and I believe that change may
have added the need to release multiple references. However, I cannot
find the place where those multiple references are taken (which is the
claim of the fix). The test done before the release loop uses the same
reference as the modified fix code, so maybe it is ok. On the other
hand, it is sometimes better not to change an apparently working system.

With the mutex change, the problem was a missing removal of a __sched
directive. It was dropped from a function which appears to be the
counterpart of a new function added by the fix. That function (which
exists in 4.19 but not in 4.15) was added to 4.19 when introducing
Wound-Wait mutexes:

commit 08295b3b5beec9aac0f7a9db86f0fc3792039da3
Author: Thomas Hellstrom <thellstrom@vmware.com>
Date:   Fri Jun 15 10:17:38 2018 +0200

    locking: Implement an algorithm choice for Wound-Wait mutexes
...
/*
+ * Add @waiter to a given location in the lock wait_list and set the
+ * FLAG_WAITERS flag if it's the first waiter.
+ */
+static void __sched
+__mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
+                  struct list_head *list)

The question is whether, without this, the clearing is actually
required or is potentially even dangerous.

-Stefan

Stefan Bader (2):
  Revert "locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to
    signal"
  Revert "drm/amd/amdgpu: fix refcount leak"

 drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c |  3 ---
 kernel/locking/mutex-debug.c           |  4 ++--
 kernel/locking/mutex-debug.h           |  2 +-
 kernel/locking/mutex.c                 | 16 ++++------------
 kernel/locking/mutex.h                 |  4 +++-
 5 files changed, 10 insertions(+), 19 deletions(-)

Comments

Kleber Souza July 30, 2021, 10:23 a.m. UTC | #1
On 28.07.21 17:02, Stefan Bader wrote:
> [...]

Reverting those patches looks sensible.


Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

Thanks
Kleber Souza July 30, 2021, 10:43 a.m. UTC | #2
On 28.07.21 17:02, Stefan Bader wrote:
> [...]


Applied to bionic:linux, adding the "UBUNTU: SAUCE:" prefix to the commits.

These patches were applied with a BugLink to https://bugs.launchpad.net/bugs/1938537
("Potential reverts of 4.19.y stable changes in 18.04"), which has the following SRU
Justification:

===========================================================================
Impact: While reviewing patches done as part of stable work, we found two changes
which came from linux-4.19.y and seemed not quite right for the 4.15 kernel version
we have in 18.04/Bionic:

2e1eb7b6e1e0 locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal

This change is only in 4.19.y and not in 4.14.y, even though it is implied that the
breaking commit was already in 3.13. It also seems to pair up with a mutex mode which
was only added in 4.19. Although there have been no reports yet which could prove this
wrong, it feels like we should not take this change without a case where the current
code is reported to be wrong.

6c0065eb69de drm/amd/amdgpu: fix refcount leak

The driver in 4.15 does not take references the same way as 4.19 does. We are missing
37ac3dc00da0 ("drm/amdgpu: Use device specific BO size & stride check."), which changed
that. So we really should revert this one.

Testing: These are proactive reverts, so there is no real test that can be done.
However, this just moves back to a state which we had before, and for that state we
had no bug reports.
===========================================================================


Thanks,
Kleber