
kernel BUG at drivers/scsi/scsi_lib.c:1096!

Message ID 20151122005635.1b9ffbe1@tom-T450 (mailing list archive)
State Not Applicable

Commit Message

Ming Lei Nov. 21, 2015, 4:56 p.m. UTC
On Sat, 21 Nov 2015 12:30:14 +0100
Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:

> On 20/11/2015 13:10, Michael Ellerman wrote:
> > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > 
> >> It's pretty much guaranteed a block layer bug, most likely in the
> >> merge bios to request infrastructure where we don't obey the merging
> >> limits properly.
> >>
> >> Does either of you have a known good and first known bad kernel?
> > 
> > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > 4.4-rc1.
> > 
> > Laurent, can you narrow it down at all?
> 
> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> setup bi_phys_segments after splitting") which has been pulled by the
> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> git://git.kernel.dk/linux-block").
> 
> My system is panicking promptly when running a kernel built at
> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> without panicking.
> 
> This being said, I can't explain what's going wrong.
> 
> May Ming shed some light here ?

Laurent, it looks like there is a bug in blk_bio_segment_split(). Would you
mind testing the following patch to see if it fixes your issue?

---
From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei@canonical.com>
Date: Sun, 22 Nov 2015 00:47:13 +0800
Subject: [PATCH] block: fix segment split

Inside blk_bio_segment_split(), the previous bvec pointer ('bvprvp')
always points to the iterator local variable 'bv', which is wrong
because it then never refers to the previous bvec. Fix it by pointing
to the local variable 'bvprv' instead.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 block/blk-merge.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Mark Salter Nov. 22, 2015, 11:20 p.m. UTC | #1
On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> On Sat, 21 Nov 2015 12:30:14 +0100
> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> 
> > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > 
> > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > merge bios to request infrastucture where we don't obey the merging
> > > > limits properly.
> > > > 
> > > > Does either of you have a known good and first known bad kernel?
> > > 
> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > 4.4-rc1.
> > > 
> > > Laurent, can you narrow it down at all?
> > 
> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > setup bi_phys_segments after splitting") which has been pulled by the
> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > git://git.kernel.dk/linux-block").
> > 
> > My system is panicing promptly when running a kernel built at
> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > without panicing.
> > 
> > This being said, I can't explain what's going wrong.
> > 
> > May Ming shed some light here ?
> 
> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> mind testing the following patch to see if it fixes your issue?
> 
> ---
> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> From: Ming Lei <ming.lei@canonical.com>
> Date: Sun, 22 Nov 2015 00:47:13 +0800
> Subject: [PATCH] block: fix segment split
> 
> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> always points to the iterator local variable, which is obviously
> wrong, so fix it by pointing to the local variable of 'bvprv'.
> 
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-merge.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index de5716d8..f2efe8a 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>  
>  			seg_size += bv.bv_len;
>  			bvprv = bv;
> -			bvprvp = &bv;
> +			bvprvp = &bvprv;
>  			sectors += bv.bv_len >> 9;
>  			continue;
>  		}
> @@ -108,7 +108,7 @@ new_segment:
>  
>  		nsegs++;
>  		bvprv = bv;
> -		bvprvp = &bv;
> +		bvprvp = &bvprv;
>  		seg_size = bv.bv_len;
>  		sectors += bv.bv_len >> 9;
>  	}

I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
Ming Lei Nov. 23, 2015, 12:36 a.m. UTC | #2
On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>
>> > On 20/11/2015 13:10, Michael Ellerman wrote:
>> > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>> > >
>> > > > It's pretty much guaranteed a block layer bug, most likely in the
>> > > > merge bios to request infrastucture where we don't obey the merging
>> > > > limits properly.
>> > > >
>> > > > Does either of you have a known good and first known bad kernel?
>> > >
>> > > Not me, I've only hit it one or two times. All I can say is I have hit it in
>> > > 4.4-rc1.
>> > >
>> > > Laurent, can you narrow it down at all?
>> >
>> > It seems that the panic is triggered by the commit bdced438acd8 ("block:
>> > setup bi_phys_segments after splitting") which has been pulled by the
>> > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>> > git://git.kernel.dk/linux-block").
>> >
>> > My system is panicing promptly when running a kernel built at
>> > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>> > without panicing.
>> >
>> > This being said, I can't explain what's going wrong.
>> >
>> > May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>
>>                       seg_size += bv.bv_len;
>>                       bvprv = bv;
>> -                     bvprvp = &bv;
>> +                     bvprvp = &bvprv;
>>                       sectors += bv.bv_len >> 9;
>>                       continue;
>>               }
>> @@ -108,7 +108,7 @@ new_segment:
>>
>>               nsegs++;
>>               bvprv = bv;
>> -             bvprvp = &bv;
>> +             bvprvp = &bvprv;
>>               seg_size = bv.bv_len;
>>               sectors += bv.bv_len >> 9;
>>       }
>
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

OK, it looks like there are still other bugs. Would you care to share how
to reproduce it on arm64?

thanks,
Ming
Mark Salter Nov. 23, 2015, 1:50 a.m. UTC | #3
On Mon, 2015-11-23 at 08:36 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 7:20 AM, Mark Salter <msalter@redhat.com> wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> > > On Sat, 21 Nov 2015 12:30:14 +0100
> > > Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> > > 
> > > > On 20/11/2015 13:10, Michael Ellerman wrote:
> > > > > On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> > > > > 
> > > > > > It's pretty much guaranteed a block layer bug, most likely in the
> > > > > > merge bios to request infrastucture where we don't obey the merging
> > > > > > limits properly.
> > > > > > 
> > > > > > Does either of you have a known good and first known bad kernel?
> > > > > 
> > > > > Not me, I've only hit it one or two times. All I can say is I have hit it in
> > > > > 4.4-rc1.
> > > > > 
> > > > > Laurent, can you narrow it down at all?
> > > > 
> > > > It seems that the panic is triggered by the commit bdced438acd8 ("block:
> > > > setup bi_phys_segments after splitting") which has been pulled by the
> > > > merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> > > > git://git.kernel.dk/linux-block").
> > > > 
> > > > My system is panicing promptly when running a kernel built at
> > > > d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> > > > without panicing.
> > > > 
> > > > This being said, I can't explain what's going wrong.
> > > > 
> > > > May Ming shed some light here ?
> > > 
> > > Laurent, looks there is one bug in blk_bio_segment_split(), would you
> > > mind testing the following patch to see if it fixes your issue?
> > > 
> > > ---
> > > From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> > > From: Ming Lei <ming.lei@canonical.com>
> > > Date: Sun, 22 Nov 2015 00:47:13 +0800
> > > Subject: [PATCH] block: fix segment split
> > > 
> > > Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> > > always points to the iterator local variable, which is obviously
> > > wrong, so fix it by pointing to the local variable of 'bvprv'.
> > > 
> > > Signed-off-by: Ming Lei <ming.lei@canonical.com>
> > > ---
> > >  block/blk-merge.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/block/blk-merge.c b/block/blk-merge.c
> > > index de5716d8..f2efe8a 100644
> > > --- a/block/blk-merge.c
> > > +++ b/block/blk-merge.c
> > > @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> > > 
> > >                       seg_size += bv.bv_len;
> > >                       bvprv = bv;
> > > -                     bvprvp = &bv;
> > > +                     bvprvp = &bvprv;
> > >                       sectors += bv.bv_len >> 9;
> > >                       continue;
> > >               }
> > > @@ -108,7 +108,7 @@ new_segment:
> > > 
> > >               nsegs++;
> > >               bvprv = bv;
> > > -             bvprvp = &bv;
> > > +             bvprvp = &bvprv;
> > >               seg_size = bv.bv_len;
> > >               sectors += bv.bv_len >> 9;
> > >       }
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> OK, looks there are still other bugs, care to share us how to reproduce
> it on arm64?
> 
> thanks,
> Ming

Unfortunately, the best reproducer I have is to boot the platform. I have seen the
BUG a few times post-boot, but I don't have a consistent reproducer. I am using
upstream 4.4-rc1 with this config:

  http://people.redhat.com/msalter/fh_defconfig

With 4.4-rc1 on an APM Mustang platform, I see the BUG about once every 6-7 boots.
On an AMD Seattle platform, about every 9 boots.

I have a script that loops through an ssh command to reboot the platform under test.
I manually install test kernels and then run the script and wait for failure. While
debugging, I have tried more minimal configs with which I have been unable to
reproduce the problem even after several hours of reboots. With the above mentioned
fh_defconfig, I have been able to get a failure within 20 or so boots with most
kernel builds but at certain kernel commits, the failure has taken a longer time to
reproduce.

From my POV, I can't say which commit causes the problem. So far, I have not been
able to reproduce at all before commit d9734e0d1ccf but I am currently trying to
reproduce with commit 0d51ce9ca1116 (one merge earlier than d9734e0d1ccf).
Laurent Dufour Nov. 23, 2015, 1:57 p.m. UTC | #4
On 23/11/2015 00:20, Mark Salter wrote:
> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>> On Sat, 21 Nov 2015 12:30:14 +0100
>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>
>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>
>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>> limits properly.
>>>>>
>>>>> Does either of you have a known good and first known bad kernel?
>>>>
>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>> 4.4-rc1.
>>>>
>>>> Laurent, can you narrow it down at all?
>>>
>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>> setup bi_phys_segments after splitting") which has been pulled by the
>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>> git://git.kernel.dk/linux-block").
>>>
>>> My system is panicing promptly when running a kernel built at
>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>> without panicing.
>>>
>>> This being said, I can't explain what's going wrong.
>>>
>>> May Ming shed some light here ?
>>
>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>> mind testing the following patch to see if it fixes your issue?
>>
>> ---
>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>> From: Ming Lei <ming.lei@canonical.com>
>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>> Subject: [PATCH] block: fix segment split
>>
>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>> always points to the iterator local variable, which is obviously
>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>
>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>> ---
>>  block/blk-merge.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>> index de5716d8..f2efe8a 100644
>> --- a/block/blk-merge.c
>> +++ b/block/blk-merge.c
>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>  
>>  			seg_size += bv.bv_len;
>>  			bvprv = bv;
>> -			bvprvp = &bv;
>> +			bvprvp = &bvprv;
>>  			sectors += bv.bv_len >> 9;
>>  			continue;
>>  		}
>> @@ -108,7 +108,7 @@ new_segment:
>>  
>>  		nsegs++;
>>  		bvprv = bv;
>> -		bvprvp = &bv;
>> +		bvprvp = &bvprv;
>>  		seg_size = bv.bv_len;
>>  		sectors += bv.bv_len >> 9;
>>  	}
> 
> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.

On my side, with the patch applied on top of 4.4-rc1, I can't get the
panic anymore.
Pratyush Anand Nov. 23, 2015, 3:13 p.m. UTC | #5
On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
> On 23/11/2015 00:20, Mark Salter wrote:
> > On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
> >> On Sat, 21 Nov 2015 12:30:14 +0100
> >> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
> >>
> >>> On 20/11/2015 13:10, Michael Ellerman wrote:
> >>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
> >>>>
> >>>>> It's pretty much guaranteed a block layer bug, most likely in the
> >>>>> merge bios to request infrastucture where we don't obey the merging
> >>>>> limits properly.
> >>>>>
> >>>>> Does either of you have a known good and first known bad kernel?
> >>>>
> >>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
> >>>> 4.4-rc1.
> >>>>
> >>>> Laurent, can you narrow it down at all?
> >>>
> >>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
> >>> setup bi_phys_segments after splitting") which has been pulled by the
> >>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
> >>> git://git.kernel.dk/linux-block").
> >>>
> >>> My system is panicing promptly when running a kernel built at
> >>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
> >>> without panicing.
> >>>
> >>> This being said, I can't explain what's going wrong.
> >>>
> >>> May Ming shed some light here ?
> >>
> >> Laurent, looks there is one bug in blk_bio_segment_split(), would you
> >> mind testing the following patch to see if it fixes your issue?
> >>
> >> ---
> >> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
> >> From: Ming Lei <ming.lei@canonical.com>
> >> Date: Sun, 22 Nov 2015 00:47:13 +0800
> >> Subject: [PATCH] block: fix segment split
> >>
> >> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
> >> always points to the iterator local variable, which is obviously
> >> wrong, so fix it by pointing to the local variable of 'bvprv'.
> >>
> >> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> >> ---
> >>  block/blk-merge.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/block/blk-merge.c b/block/blk-merge.c
> >> index de5716d8..f2efe8a 100644
> >> --- a/block/blk-merge.c
> >> +++ b/block/blk-merge.c
> >> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
> >>  
> >>  			seg_size += bv.bv_len;
> >>  			bvprv = bv;
> >> -			bvprvp = &bv;
> >> +			bvprvp = &bvprv;
> >>  			sectors += bv.bv_len >> 9;
> >>  			continue;
> >>  		}
> >> @@ -108,7 +108,7 @@ new_segment:
> >>  
> >>  		nsegs++;
> >>  		bvprv = bv;
> >> -		bvprvp = &bv;
> >> +		bvprvp = &bvprv;
> >>  		seg_size = bv.bv_len;
> >>  		sectors += bv.bv_len >> 9;
> >>  	}
> > 
> > I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
> 
> On my side, with the patch applied on top of 4.4-rc1, I can't get the
> panic anymore.

git bisect shows:

bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
commit bdced438acd83ad83a6c6fc7f50099b820245ddb
Author: Ming Lei <ming.lei@canonical.com>
Date:   Tue Oct 20 23:13:52 2015 +0800 

    block: setup bi_phys_segments after splitting

Reverting the above commit on top of 4.4-rc1 seems to fix the problem for me.

~Pratyush
Laurent Dufour Nov. 23, 2015, 3:20 p.m. UTC | #6
On 23/11/2015 16:13, Pratyush Anand wrote:
> On 23/11/2015:02:57:19 PM, Laurent Dufour wrote:
>> On 23/11/2015 00:20, Mark Salter wrote:
>>> On Sun, 2015-11-22 at 00:56 +0800, Ming Lei wrote:
>>>> On Sat, 21 Nov 2015 12:30:14 +0100
>>>> Laurent Dufour <ldufour@linux.vnet.ibm.com> wrote:
>>>>
>>>>> On 20/11/2015 13:10, Michael Ellerman wrote:
>>>>>> On Thu, 2015-11-19 at 00:23 -0800, Christoph Hellwig wrote:
>>>>>>
>>>>>>> It's pretty much guaranteed a block layer bug, most likely in the
>>>>>>> merge bios to request infrastucture where we don't obey the merging
>>>>>>> limits properly.
>>>>>>>
>>>>>>> Does either of you have a known good and first known bad kernel?
>>>>>>
>>>>>> Not me, I've only hit it one or two times. All I can say is I have hit it in
>>>>>> 4.4-rc1.
>>>>>>
>>>>>> Laurent, can you narrow it down at all?
>>>>>
>>>>> It seems that the panic is triggered by the commit bdced438acd8 ("block:
>>>>> setup bi_phys_segments after splitting") which has been pulled by the
>>>>> merge d9734e0d1ccf ("Merge branch 'for-4.4/core' of
>>>>> git://git.kernel.dk/linux-block").
>>>>>
>>>>> My system is panicing promptly when running a kernel built at
>>>>> d9734e0d1ccf, while reverting the commit bdced438acd8, it can run hours
>>>>> without panicing.
>>>>>
>>>>> This being said, I can't explain what's going wrong.
>>>>>
>>>>> May Ming shed some light here ?
>>>>
>>>> Laurent, looks there is one bug in blk_bio_segment_split(), would you
>>>> mind testing the following patch to see if it fixes your issue?
>>>>
>>>> ---
>>>> From 6fc701231dcc000bc8bc4b9105583380d9aa31f4 Mon Sep 17 00:00:00 2001
>>>> From: Ming Lei <ming.lei@canonical.com>
>>>> Date: Sun, 22 Nov 2015 00:47:13 +0800
>>>> Subject: [PATCH] block: fix segment split
>>>>
>>>> Inside blk_bio_segment_split(), previous bvec pointer('bvprvp')
>>>> always points to the iterator local variable, which is obviously
>>>> wrong, so fix it by pointing to the local variable of 'bvprv'.
>>>>
>>>> Signed-off-by: Ming Lei <ming.lei@canonical.com>
>>>> ---
>>>>  block/blk-merge.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/block/blk-merge.c b/block/blk-merge.c
>>>> index de5716d8..f2efe8a 100644
>>>> --- a/block/blk-merge.c
>>>> +++ b/block/blk-merge.c
>>>> @@ -98,7 +98,7 @@ static struct bio *blk_bio_segment_split(struct request_queue *q,
>>>>  
>>>>  			seg_size += bv.bv_len;
>>>>  			bvprv = bv;
>>>> -			bvprvp = &bv;
>>>> +			bvprvp = &bvprv;
>>>>  			sectors += bv.bv_len >> 9;
>>>>  			continue;
>>>>  		}
>>>> @@ -108,7 +108,7 @@ new_segment:
>>>>  
>>>>  		nsegs++;
>>>>  		bvprv = bv;
>>>> -		bvprvp = &bv;
>>>> +		bvprvp = &bvprv;
>>>>  		seg_size = bv.bv_len;
>>>>  		sectors += bv.bv_len >> 9;
>>>>  	}
>>>
>>> I'm still hitting the BUG even with this patch applied on top of 4.4-rc1.
>>
>> On my side, with the patch applied on top of 4.4-rc1, I can't get the
>> panic anymore.
> 
> git bisect shows:
> 
> bdced438acd83ad83a6c6fc7f50099b820245ddb is the first bad commit
> commit bdced438acd83ad83a6c6fc7f50099b820245ddb
> Author: Ming Lei <ming.lei@canonical.com>
> Date:   Tue Oct 20 23:13:52 2015 +0800 
> 
>     block: setup bi_phys_segments after splitting
> 
> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.

That's what I mentioned earlier ;)

Now Ming has sent an additional patch which seems to fix the bug introduced
by commit bdced438acd8. When testing with this new patch I
can't get the panic anymore, but Mark reported he is still hitting it.
Ming Lei Nov. 23, 2015, 3:27 p.m. UTC | #7
On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
<ldufour@linux.vnet.ibm.com> wrote:
>>
>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>
> That's what I mentioned earlier ;)
>
> Now Ming send an additional patch with seems to fix the bug introduced
> through the commit bdced438acd8. When testing with this new patch I
> can't get the panic anymore, but Mark reported he is still hitting it.

Laurent, thanks for testing the 1st patch. It looks like there are at
least two problems, and the 2nd patch I just sent should address
Mark's issue, which is caused by bdced438acd83a.

Once the 2nd one is tested OK, I will send out the two together.

Thanks,
Ming
Laurent Dufour Nov. 23, 2015, 4:24 p.m. UTC | #8
On 23/11/2015 16:27, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
>>>
>>> Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
>>
>> That's what I mentioned earlier ;)
>>
>> Now Ming send an additional patch with seems to fix the bug introduced
>> through the commit bdced438acd8. When testing with this new patch I
>> can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.

FWIW, I applied the 2nd patch, and my system is still running like a charm.

Cheers,
Laurent.
Mark Salter Nov. 24, 2015, 1:30 a.m. UTC | #9
On Mon, 2015-11-23 at 23:27 +0800, Ming Lei wrote:
> On Mon, Nov 23, 2015 at 11:20 PM, Laurent Dufour
> <ldufour@linux.vnet.ibm.com> wrote:
> > > 
> > > Reverting above commit on top if 4.4-rc1 seems to fix the problem for me.
> > 
> > That's what I mentioned earlier ;)
> > 
> > Now Ming send an additional patch with seems to fix the bug introduced
> > through the commit bdced438acd8. When testing with this new patch I
> > can't get the panic anymore, but Mark reported he is still hitting it.
> 
> Laurent, thanks for your test on the 1st patch, and looks there are at
> least two problems, and my 2nd patch sent just now should address
> Mark's issue which is caused by bdced438acd83a.
> 
> Once the 2nd one is tested OK, I will send out the two together.
> 
> Thanks,
> Ming

Thanks Ming.
With both patches applied, I have been unable to reproduce the problem.

Patch

diff --git a/block/blk-merge.c b/block/blk-merge.c
index de5716d8..f2efe8a 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -98,7 +98,7 @@  static struct bio *blk_bio_segment_split(struct request_queue *q,
 
 			seg_size += bv.bv_len;
 			bvprv = bv;
-			bvprvp = &bv;
+			bvprvp = &bvprv;
 			sectors += bv.bv_len >> 9;
 			continue;
 		}
@@ -108,7 +108,7 @@  new_segment:
 
 		nsegs++;
 		bvprv = bv;
-		bvprvp = &bv;
+		bvprvp = &bvprv;
 		seg_size = bv.bv_len;
 		sectors += bv.bv_len >> 9;
 	}
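
The pointer mix-up that the patch above addresses can be sketched outside
the kernel. The following is a simplified model, not the actual
blk-merge.c code: struct vec, count_wrong_prev() and the use_fix flag are
hypothetical names made up for illustration. Only the variable names bv,
bvprv and bvprvp are borrowed from the diff. The assumption is that the
loop copies each element into the local 'bv', so a pointer to 'bv' sees
the current element on the next pass instead of the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct bio_vec. */
struct vec {
	unsigned int offset;
	unsigned int len;
};

/*
 * Walk the array the way the patched loop does and count the passes in
 * which the "previous" pointer does not describe the previous element.
 * use_fix == 0 models the old code (bvprvp = &bv),
 * use_fix != 0 models the patched code (bvprvp = &bvprv).
 */
size_t count_wrong_prev(const struct vec *v, size_t n, int use_fix)
{
	struct vec bv, bvprv;
	const struct vec *bvprvp = NULL;
	size_t wrong = 0;

	assert(v != NULL && n > 0);

	for (size_t i = 0; i < n; i++) {
		bv = v[i];	/* iterator copy, overwritten on every pass */

		/* After the first pass, bvprvp should see element i - 1. */
		if (bvprvp && bvprvp->offset != v[i - 1].offset)
			wrong++;

		bvprv = bv;	/* stable copy of the element just handled */
		bvprvp = use_fix ? &bvprv : &bv;
	}

	return wrong;
}
```

With three elements at distinct offsets, the old assignment makes every
check after the first pass look at the current element, while the patched
assignment always sees the saved copy of the previous one.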