diff mbox

IO errors after "block: remove bio_get_nr_vecs()"

Message ID 20151221200721.GN4026@mtj.duckdns.org
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Tejun Heo Dec. 21, 2015, 8:07 p.m. UTC
Hello, Artem.

Can you please apply the following patch on top and see whether
anything changes?  If it does make the issue go away, can you please
revert the ".can_queue" part and test again?

Thanks.

---
 drivers/ata/ahci.h    |    2 +-
 drivers/ata/libahci.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Tejun Heo Dec. 21, 2015, 9:08 p.m. UTC | #1
Hello, again.

On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> Hello, Artem.
> 
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?

If the patch doesn't change anything, can you please try the
followings and see which one makes difference?

1. Exclude memory above 4G line with boot param "max_addr=4G".
2. Disable highmem with "highmem=0".
3. Try booting 64bit kernel.

At the moment, the only thing I can think of which can explain the PAE
+ bio_get_nr_vecs() situation is that the bio split code which is
activated by the bio_get_nr_vecs() somehow messes up 64bit or high
addresses on 32bit kernels.  I scanned for the obvious but at bio
layer, memory is represented by struct page, so nothing obvious seems
broken.

Note that for now I'm ignoring the debug dumps from the ahci debug
patch which indicates that the passed in addresses are all fine.  It
is possible that the controller gets confused with certain receiving
addresses and reports failure on later commands or maybe there is a
different sequence of events which can encompass both that we don't
know of yet.

Thanks.
Kent Overstreet Dec. 22, 2015, 3:43 a.m. UTC | #2
I just reproduced it - Artem, I'll let you know when we have a possible fix but
hopefully there won't be any need for you to beat up your hardware any more :)

On Mon, Dec 21, 2015 at 04:08:11PM -0500, Tejun Heo wrote:
> Hello, again.
> 
> On Mon, Dec 21, 2015 at 03:07:21PM -0500, Tejun Heo wrote:
> > Hello, Artem.
> > 
> > Can you please apply the following patch on top and see whether
> > anything changes?  If it does make the issue go away, can you please
> > revert the ".can_queue" part and test again?
> 
> If the patch doesn't change anything, can you please try the
> followings and see which one makes difference?
> 
> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> 2. Disable highmem with "highmem=0".
> 3. Try booting 64bit kernel.
> 
> At the moment, the only thing I can think of which can explain the PAE
> + bio_get_nr_vecs() situation is that the bio split code which is
> activated by the bio_get_nr_vecs() somehow messes up 64bit or high
> addresses on 32bit kernels.  I scanned for the obvious but at bio
> layer, memory is represented by struct page, so nothing obvious seems
> broken.
> 
> Note that for now I'm ignoring the debug dumps from the ahci debug
> patch which indicates that the passed in addresses are all fine.  It
> is possible that the controller gets confused with certain receiving
> addresses and reports failure on later commands or maybe there is a
> different sequence of events which can encompass both that we don't
> know of yet.

It repros pretty easily, I should be able to just write some test code that
sends down single specific IOs so no matter what the controller is doing we know
exactly what IO triggered the error.

I'm checking all the 64 bit/pae/etc. stuff now.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Dec. 22, 2015, 3:59 a.m. UTC | #3
On Mon, Dec 21, 2015 at 04:08:11PM -0500, Tejun Heo wrote:

reproduced it with 32 bit pae:

> 1. Exclude memory above 4G line with boot param "max_addr=4G".

doesn't work - max_addr=1G doesn't work either

> 2. Disable highmem with "highmem=0".

works!

> 3. Try booting 64bit kernel.

works

Ok, so maybe it actually is PAE specific... but like you noted the block layer
works entirely in terms of pages so...

The one idea I can think of is - maybe BIOVEC_PHYS_MERGEABLE() is broken in PAE
mode? I am unfamiliar with anything PAE though.

Where does the ahci code consume the sglist?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Dec. 22, 2015, 4:45 a.m. UTC | #4
So what I _really_ want to know is - how the fuck is the actual ATA command
itself malformed?

You told me at one point that the error code indicated the controller was
claiming it overran the end of the sglist - well, if that's the case we ought to
be able to prove it with an assertion (I already tried; qc->nbytes does match
the sglist, checking from ata_sg_setup()).

Ignore bvec merging, PAE, all that crap - the controller doesn't know about any
of that. _WHAT_ are we feeding it that it doesn't like?

There shouldn't even be that much stuff to check, since in theory the only
possible thing that could be at fault is the sglist. Maybe they're too big, too
small, misaligned, too many of them, god knows what, but somehow the sglist
we're feeding the device has to be at fault, right?

But the sglists in Artem's debugging output look pretty uninteresting (one
of them has _no_ merged segments - it's just 18 4k pages - how could THAT be an
issue?)

Gonna apply your debugging patch and start throwing stuff at the wall next, I
guess...
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Artem S. Tashkinov Dec. 22, 2015, 5:10 a.m. UTC | #5
On 2015-12-22 01:07, Tejun Heo wrote:
> Hello, Artem.
> 
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?
> 
> Thanks.
> 
> ---
>  drivers/ata/ahci.h    |    2 +-
>  drivers/ata/libahci.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -365,7 +365,7 @@ extern struct device_attribute *ahci_sde
>   */
>  #define AHCI_SHT(drv_name)						\
>  	ATA_NCQ_SHT(drv_name),						\
> -	.can_queue		= AHCI_MAX_CMDS - 1,			\
> +	.can_queue		= 1/*AHCI_MAX_CMDS - 1*/,		\
>  	.sg_tablesize		= AHCI_MAX_SG,				\
>  	.dma_boundary		= AHCI_DMA_BOUNDARY,			\
>  	.shost_attrs		= ahci_shost_attrs,			\
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -420,7 +420,7 @@ void ahci_save_initial_config(struct dev
>  		hpriv->saved_cap2 = cap2 = 0;
> 
>  	/* some chips have errata preventing 64bit use */
> -	if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
> +	if ((cap & HOST_CAP_64)/* && (hpriv->flags & 
> AHCI_HFLAG_32BIT_ONLY)*/) {
>  		dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
>  		cap &= ~HOST_CAP_64;
>  	}

This patch fixes the issue for me. Now rechecking without .can_queue 
part.

BTW, since I left debugging on, here's the part you wanted:

[    0.613851] XXX port 0 dma_sz=91392 mem=c0020000 mem_dma=00020000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.613865] XXX port 1 dma_sz=91392 mem=eea00000 mem_dma=2ea00000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.620464] XXX port 2 dma_sz=91392 mem=eea20000 mem_dma=2ea20000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.627121] XXX port 3 dma_sz=91392 mem=eea40000 mem_dma=2ea40000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.633791] XXX port 4 dma_sz=91392 mem=eea60000 mem_dma=2ea60000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280
[    0.640445] XXX port 5 dma_sz=91392 mem=eea80000 mem_dma=2ea80000 
cmd_slot=0 rx_fis=1024 cmd_tbl=1280

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Artem S. Tashkinov Dec. 22, 2015, 5:20 a.m. UTC | #6
On 2015-12-22 01:07, Tejun Heo wrote:
> Hello, Artem.
> 
> Can you please apply the following patch on top and see whether
> anything changes?  If it does make the issue go away, can you please
> revert the ".can_queue" part and test again?
> 
> Thanks.
> 
> ---
>  drivers/ata/ahci.h    |    2 +-
>  drivers/ata/libahci.c |    2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> --- a/drivers/ata/ahci.h
> +++ b/drivers/ata/ahci.h
> @@ -365,7 +365,7 @@ extern struct device_attribute *ahci_sde
>   */
>  #define AHCI_SHT(drv_name)						\
>  	ATA_NCQ_SHT(drv_name),						\
> -	.can_queue		= AHCI_MAX_CMDS - 1,			\
> +	.can_queue		= 1/*AHCI_MAX_CMDS - 1*/,		\
>  	.sg_tablesize		= AHCI_MAX_SG,				\
>  	.dma_boundary		= AHCI_DMA_BOUNDARY,			\
>  	.shost_attrs		= ahci_shost_attrs,			\
> --- a/drivers/ata/libahci.c
> +++ b/drivers/ata/libahci.c
> @@ -420,7 +420,7 @@ void ahci_save_initial_config(struct dev
>  		hpriv->saved_cap2 = cap2 = 0;
> 
>  	/* some chips have errata preventing 64bit use */
> -	if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
> +	if ((cap & HOST_CAP_64)/* && (hpriv->flags & 
> AHCI_HFLAG_32BIT_ONLY)*/) {
>  		dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
>  		cap &= ~HOST_CAP_64;
>  	}

With the ".can_queue" part left intact the bug resurfaced. Full dmesg 
output is attached.
Kent Overstreet Dec. 22, 2015, 5:37 a.m. UTC | #7
On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
> On 12/22/15 12:59, Kent Overstreet wrote:
> > reproduced it with 32 bit pae:
> > 
> >> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> > 
> > doesn't work - max_addr=1G doesn't work either
> > 
> >> 2. Disable highmem with "highmem=0".
> > 
> > works!
> > 
> >> 3. Try booting 64bit kernel.
> > 
> > works
> 
> blk_queue_bio() does split then bounce, which makes the segment
> counting based on pages before bouncing and could go wrong.
> 
> What do you think of a patch like this?

Shit, you nailed it. Can't believe I didn't think to check that.

> 
> -- 
> Jun'ichi Nomura, NEC Corporation
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5131993b..1d1c3c7 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>  	struct request *req;
>  	unsigned int request_count = 0;
>  
> -	blk_queue_split(q, &bio, q->bio_split);
> -
>  	/*
>  	 * low level driver can indicate that it wants pages above a
>  	 * certain limit bounced to low memory (ie for highmem, or even
> @@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>  	 */
>  	blk_queue_bounce(q, &bio);
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
>  		bio->bi_error = -EIO;
>  		bio_endio(bio);
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Dec. 22, 2015, 5:38 a.m. UTC | #8
On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
> On 12/22/15 12:59, Kent Overstreet wrote:
> > reproduced it with 32 bit pae:
> > 
> >> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> > 
> > doesn't work - max_addr=1G doesn't work either
> > 
> >> 2. Disable highmem with "highmem=0".
> > 
> > works!
> > 
> >> 3. Try booting 64bit kernel.
> > 
> > works
> 
> blk_queue_bio() does split then bounce, which makes the segment
> counting based on pages before bouncing and could go wrong.
> 
> What do you think of a patch like this?

Artem, can you give this patch a try?

> 
> -- 
> Jun'ichi Nomura, NEC Corporation
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 5131993b..1d1c3c7 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>  	struct request *req;
>  	unsigned int request_count = 0;
>  
> -	blk_queue_split(q, &bio, q->bio_split);
> -
>  	/*
>  	 * low level driver can indicate that it wants pages above a
>  	 * certain limit bounced to low memory (ie for highmem, or even
> @@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
>  	 */
>  	blk_queue_bounce(q, &bio);
>  
> +	blk_queue_split(q, &bio, q->bio_split);
> +
>  	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
>  		bio->bi_error = -EIO;
>  		bio_endio(bio);
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Artem S. Tashkinov Dec. 22, 2015, 5:52 a.m. UTC | #9
On 2015-12-22 10:38, Kent Overstreet wrote:
> On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
>> On 12/22/15 12:59, Kent Overstreet wrote:
>> > reproduced it with 32 bit pae:
>> >
>> >> 1. Exclude memory above 4G line with boot param "max_addr=4G".
>> >
>> > doesn't work - max_addr=1G doesn't work either
>> >
>> >> 2. Disable highmem with "highmem=0".
>> >
>> > works!
>> >
>> >> 3. Try booting 64bit kernel.
>> >
>> > works
>> 
>> blk_queue_bio() does split then bounce, which makes the segment
>> counting based on pages before bouncing and could go wrong.
>> 
>> What do you think of a patch like this?
> 
> Artem, can you give this patch a try?


This patch ostensibly fixes the issue - at least I cannot immediately 
reproduce it. You can count me in as "Tested-by: Artem S. Tashkinov"

> 
>> 
>> --
>> Jun'ichi Nomura, NEC Corporation
>> 
>> diff --git a/block/blk-core.c b/block/blk-core.c
>> index 5131993b..1d1c3c7 100644
>> --- a/block/blk-core.c
>> +++ b/block/blk-core.c
>> @@ -1689,8 +1689,6 @@ static blk_qc_t blk_queue_bio(struct 
>> request_queue *q, struct bio *bio)
>>  	struct request *req;
>>  	unsigned int request_count = 0;
>> 
>> -	blk_queue_split(q, &bio, q->bio_split);
>> -
>>  	/*
>>  	 * low level driver can indicate that it wants pages above a
>>  	 * certain limit bounced to low memory (ie for highmem, or even
>> @@ -1698,6 +1696,8 @@ static blk_qc_t blk_queue_bio(struct 
>> request_queue *q, struct bio *bio)
>>  	 */
>>  	blk_queue_bounce(q, &bio);
>> 
>> +	blk_queue_split(q, &bio, q->bio_split);
>> +
>>  	if (bio_integrity_enabled(bio) && bio_integrity_prep(bio)) {
>>  		bio->bi_error = -EIO;
>>  		bio_endio(bio);
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Dec. 22, 2015, 5:55 a.m. UTC | #10
On Tue, Dec 22, 2015 at 10:52:37AM +0500, Artem S. Tashkinov wrote:
> On 2015-12-22 10:38, Kent Overstreet wrote:
> >On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
> >>On 12/22/15 12:59, Kent Overstreet wrote:
> >>> reproduced it with 32 bit pae:
> >>>
> >>>> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> >>>
> >>> doesn't work - max_addr=1G doesn't work either
> >>>
> >>>> 2. Disable highmem with "highmem=0".
> >>>
> >>> works!
> >>>
> >>>> 3. Try booting 64bit kernel.
> >>>
> >>> works
> >>
> >>blk_queue_bio() does split then bounce, which makes the segment
> >>counting based on pages before bouncing and could go wrong.
> >>
> >>What do you think of a patch like this?
> >
> >Artem, can you give this patch a try?
> 
> 
> This patch ostensibly fixes the issue - at least I cannot immediately
> reproduce it. You can count me in as "Tested-by: Artem S. Tashkinov"

Let's all contemplate the fact that blk_segment_map_sg() _overrunning the end of
the provided sglist_ was this much of a clusterfuck to debug.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Artem S. Tashkinov Dec. 22, 2015, 5:59 a.m. UTC | #11
On 2015-12-22 10:55, Kent Overstreet wrote:
> On Tue, Dec 22, 2015 at 10:52:37AM +0500, Artem S. Tashkinov wrote:
>> On 2015-12-22 10:38, Kent Overstreet wrote:
>> >On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
>> >>On 12/22/15 12:59, Kent Overstreet wrote:
>> >>> reproduced it with 32 bit pae:
>> >>>
>> >>>> 1. Exclude memory above 4G line with boot param "max_addr=4G".
>> >>>
>> >>> doesn't work - max_addr=1G doesn't work either
>> >>>
>> >>>> 2. Disable highmem with "highmem=0".
>> >>>
>> >>> works!
>> >>>
>> >>>> 3. Try booting 64bit kernel.
>> >>>
>> >>> works
>> >>
>> >>blk_queue_bio() does split then bounce, which makes the segment
>> >>counting based on pages before bouncing and could go wrong.
>> >>
>> >>What do you think of a patch like this?
>> >
>> >Artem, can you give this patch a try?
>> 
>> 
>> This patch ostensibly fixes the issue - at least I cannot immediately
>> reproduce it. You can count me in as "Tested-by: Artem S. Tashkinov"
> 
> Let's all contemplate the fact that blk_segment_map_sg() _overrunning 
> the end of
> the provided sglist_ was this much of a clusterfuck to debug.

 From the look of it this fix has nothing to do with PAE, so then why 
only PAE users like me were affected by the original 
(b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c) patch?
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet Dec. 22, 2015, 6:02 a.m. UTC | #12
On Tue, Dec 22, 2015 at 10:59:09AM +0500, Artem S. Tashkinov wrote:
> On 2015-12-22 10:55, Kent Overstreet wrote:
> >On Tue, Dec 22, 2015 at 10:52:37AM +0500, Artem S. Tashkinov wrote:
> >>On 2015-12-22 10:38, Kent Overstreet wrote:
> >>>On Tue, Dec 22, 2015 at 05:26:12AM +0000, Junichi Nomura wrote:
> >>>>On 12/22/15 12:59, Kent Overstreet wrote:
> >>>>> reproduced it with 32 bit pae:
> >>>>>
> >>>>>> 1. Exclude memory above 4G line with boot param "max_addr=4G".
> >>>>>
> >>>>> doesn't work - max_addr=1G doesn't work either
> >>>>>
> >>>>>> 2. Disable highmem with "highmem=0".
> >>>>>
> >>>>> works!
> >>>>>
> >>>>>> 3. Try booting 64bit kernel.
> >>>>>
> >>>>> works
> >>>>
> >>>>blk_queue_bio() does split then bounce, which makes the segment
> >>>>counting based on pages before bouncing and could go wrong.
> >>>>
> >>>>What do you think of a patch like this?
> >>>
> >>>Artem, can you give this patch a try?
> >>
> >>
> >>This patch ostensibly fixes the issue - at least I cannot immediately
> >>reproduce it. You can count me in as "Tested-by: Artem S. Tashkinov"
> >
> >Let's all contemplate the fact that blk_segment_map_sg() _overrunning the
> >end of
> >the provided sglist_ was this much of a clusterfuck to debug.
> 
> From the look of it this fix has nothing to do with PAE, so then why only
> PAE users like me were affected by the original
> (b54ffb73cadcdcff9cc1ae0e11f502407e3e2e4c) patch?

The amusing thing is that I doubt PAE actually requires bouncing - addressing
limits come from the device, not the cpu.

But evidently in PAE mode, the block layer is in fact bouncing bios. Probably
from some default setting in the queue limits that no one ever looks at.

The whole queue limits design is an atrocity, it leads to exactly this kind of
crap where no one can predict the actual behaviour of any given setup.
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -365,7 +365,7 @@  extern struct device_attribute *ahci_sde
  */
 #define AHCI_SHT(drv_name)						\
 	ATA_NCQ_SHT(drv_name),						\
-	.can_queue		= AHCI_MAX_CMDS - 1,			\
+	.can_queue		= 1/*AHCI_MAX_CMDS - 1*/,		\
 	.sg_tablesize		= AHCI_MAX_SG,				\
 	.dma_boundary		= AHCI_DMA_BOUNDARY,			\
 	.shost_attrs		= ahci_shost_attrs,			\
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -420,7 +420,7 @@  void ahci_save_initial_config(struct dev
 		hpriv->saved_cap2 = cap2 = 0;
 
 	/* some chips have errata preventing 64bit use */
-	if ((cap & HOST_CAP_64) && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)) {
+	if ((cap & HOST_CAP_64)/* && (hpriv->flags & AHCI_HFLAG_32BIT_ONLY)*/) {
 		dev_info(dev, "controller can't do 64bit DMA, forcing 32bit\n");
 		cap &= ~HOST_CAP_64;
 	}