
Ubuntu Wily / VMWare graphics boot regression

Message ID 1456353730.31338.104.camel@canonical.com
State New

Commit Message

Kamal Mostafa Feb. 24, 2016, 10:42 p.m. UTC
Hi Thomas, Sinclair, and my team-

Here's a weird one.  It appears that this Linux commit which was
recently applied to Ubuntu Wily 15.10 (via 4.2-stable):

  [mainline] 025af18 drm/ttm: Fixed a read/write lock imbalance

is the trigger for this rather major Ubuntu/VMWare graphics boot
regression:

  https://bugs.launchpad.net/ubuntu/wily/+source/linux/+bug/1548587
  Ubuntu 15.10 VMWare guest won't show UI after upgrading to 4.2.0-30

(In comment #33 I produced a test kernel with that commit reverted,
which testers confirmed as fixing the regression.)


But the thing is...

025af18 (attached) just looks so *obviously* valid, in that the thing
it fixes looks like it was obviously wrong.  I was reluctant to even
try reverting it, and was surprised when multiple testers confirmed
that it fixed the problem.

Furthermore, backports of 025af18 have been deployed in many other
stable kernels (and of course, mainline) but the reported boot problem
** only seems to occur with v4.2-based kernels **.  The problem does
occur with 4.2-stable (including the 025af18 backport), but does _not_
occur with a 4.4 kernel (which always contained 025af18).  That commit
has been shipping in pre-4.2 Ubuntu Trusty and Vivid for a cycle or two
with no reports of problems there either.

So despite the indication that 025af18 is the troublemaker for 4.2-
stable based kernels, I'm not very happy with the idea of just
reverting it from 4.2-stable or from Wily without a better
understanding of why.

Any thoughts on this topic will be much appreciated.

 -Kamal

Comments

Sinclair Yeh Feb. 24, 2016, 11:27 p.m. UTC | #1
Hi,

I was able to reproduce this last night after updating 15.10, and I
didn't know what the cause was until your mail.

Let me try a 4.2 kernel with lockdep check enabled and see if I can
spot anything.

Thomas is in a different time zone, so he may also pick this up in
his morning.

Sinclair

Thomas Hellstrom Feb. 25, 2016, 6:13 a.m. UTC | #2
Hi!

Ugh. I'll try to reproduce and see if I can provide a fix. 4.3 saw a
major Linux modesetting rewrite, so it's possible that we fixed more
than one bug there, and that previously they might have canceled each
other out...

/Thomas

Thomas Hellstrom Feb. 25, 2016, 8:46 a.m. UTC | #3
Hi!

There is already a fix for this problem upstream. For some reason it
wasn't Cc'd to stable.
The commit ID is

12617971c443c50750a12a77ea0e08319d161975

and it applies from 3.15 to 4.2, provided the TTM fix is applied.

I'll send a message to stable.

/Thomas

Kamal Mostafa Feb. 25, 2016, 5:24 p.m. UTC | #4
On Thu, 2016-02-25 at 09:46 +0100, Thomas Hellstrom wrote:
> Hi!
> 
> There is already a fix for this problem upstream. For some reason it
> wasn't Cc'd to stable.
> The commit ID is
> 
> 12617971c443c50750a12a77ea0e08319d161975
> 
> and it applies from 3.15 to 4.2, provided the TTM fix is applied.
> 
> I'll send a message to stable.
> 
> /Thomas


Confirmed.  This does indeed fix:

  https://bugs.launchpad.net/ubuntu/wily/+source/linux/+bug/1548587
  Ubuntu 15.10 VMWare guest won't show UI after upgrading to 4.2.0-30

Thanks Thomas!

 -Kamal

Sinclair Yeh Feb. 26, 2016, 6:01 p.m. UTC | #5
Hi,

FYI, I think this one is also related:

https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1550090?comments=all

I've checked out the kernel for Trusty and verified that it has the
read/write lock patch, but not the FB unlock patch.

Sinclair

Kamal Mostafa Feb. 26, 2016, 6:23 p.m. UTC | #6
On Fri, 2016-02-26 at 10:01 -0800, Sinclair Yeh wrote:
> Hi,
> 
> FYI, I think this one is also related:
> 
> https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1550090?comments=all
> 
> I've checked out the kernel for Trusty and verified that it has the
> read/write lock patch, but not the FB unlock patch.

Yup, we'll apply the same fix to our 3.19 Trusty kernel too.  Thanks
for the heads-up on that other bug number, Sinclair.  We appreciate the
help!

 -Kamal


Patch

From 025af189fb44250206dd8a32fa4a682392af3301 Mon Sep 17 00:00:00 2001
From: Thomas Hellstrom <thellstrom@vmware.com>
Date: Fri, 20 Nov 2015 11:43:50 -0800
Subject: drm/ttm: Fixed a read/write lock imbalance

In ttm_write_lock(), the uninterruptible path should call
__ttm_write_lock() not __ttm_read_lock().  This fixes a vmwgfx hang
on F23 start up.

syeh: Extracted this from one of Thomas' internal patches.

Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_lock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_lock.c b/drivers/gpu/drm/ttm/ttm_lock.c
index 6a95454..f154fb1 100644
--- a/drivers/gpu/drm/ttm/ttm_lock.c
+++ b/drivers/gpu/drm/ttm/ttm_lock.c
@@ -180,7 +180,7 @@ int ttm_write_lock(struct ttm_lock *lock, bool interruptible)
 			spin_unlock(&lock->lock);
 		}
 	} else
-		wait_event(lock->queue, __ttm_read_lock(lock));
+		wait_event(lock->queue, __ttm_write_lock(lock));
 
 	return ret;
 }
-- 
2.7.0