
[1/3] block: Respect the device's maximum segment size

Message ID 20190909125658.30559-2-thierry.reding@gmail.com
State Deferred
Series mmc: Fix scatter/gather on SDHCI

Commit Message

Thierry Reding Sept. 9, 2019, 12:56 p.m. UTC
From: Thierry Reding <treding@nvidia.com>

When enabling the DMA map merging capability for a queue, ensure that
the maximum segment size does not exceed the device's limit.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 block/blk-settings.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

Comments

Christoph Hellwig Sept. 9, 2019, 4:13 p.m. UTC | #1
On Mon, Sep 09, 2019 at 02:56:56PM +0200, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> When enabling the DMA map merging capability for a queue, ensure that
> the maximum segment size does not exceed the device's limit.

We can't do that unfortunately.  If we use the virt_boundary setting
we do aggressive merges that there is no accounting for.  So we can't
limit the segment size.

And at least for the way devices that need the virt_boundary usually
do their addressing (page based, not sgl based), there isn't really
any concept like a segment anyway.
Thierry Reding Sept. 9, 2019, 7:19 p.m. UTC | #2
On Mon, Sep 09, 2019 at 06:13:31PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 09, 2019 at 02:56:56PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > When enabling the DMA map merging capability for a queue, ensure that
> > the maximum segment size does not exceed the device's limit.
> 
> We can't do that unfortunately.  If we use the virt_boundary setting
> we do aggressive merges that there is no accounting for.  So we can't
> limit the segment size.

But that's kind of the point here. My understanding is that the maximum
segment size in the device's DMA parameters defines the maximum size of
the segment that the device can handle.

In the particular case that I'm trying to fix, according to the SDHCI
specification, these devices can handle segments that are a maximum of
64 KiB in size. If we allow that segment size to be exceeded, the device
will no longer be able to handle them.

> > And at least for the way devices that need the virt_boundary usually
> > do their addressing (page based, not sgl based), there isn't really
> > any concept like a segment anyway.

I do understand that aspect of it. However, wouldn't devices that do
the addressing this way want to set the maximum segment size to
something large (like UINT_MAX, which many users that don't have the
concept of a segment already do)?

If you disagree, do you have any alternative proposals other than
reverting the offending patch? linux-next is currently broken on all
systems where the SDHCI controller is behind an IOMMU.

Thierry
Yoshihiro Shimoda Sept. 10, 2019, 2:03 a.m. UTC | #3
Hi Thierry,

> From: Thierry Reding, Sent: Tuesday, September 10, 2019 4:19 AM
> 
> <snip>
> 
> If you disagree, do you have any alternative proposals other than
> reverting the offending patch? linux-next is currently broken on all
> systems where the SDHCI controller is behind an IOMMU.

I'm sorry for causing this trouble in your environment. I have a proposal
to resolve this issue: add a new caps2 flag like MMC_CAP2_MERGE_CAPABLE to
the mmc_host struct, and add a condition to queue.c like below.

+	if (host->caps2 & MMC_CAP2_MERGE_CAPABLE &&
+	    host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
	    dma_get_merge_boundary(mmc_dev(host)))
		host->can_dma_map_merge = 1;
	else
		host->can_dma_map_merge = 0;

After that, all MMC controllers disable the feature by default, and a host
driver should set the flag only if its controller is capable of merging.
Ulf, is such a patch acceptable for v5.4-rcN? Anyway, I'll submit such a patch later.

Best regards,
Yoshihiro Shimoda

> Thierry
Christoph Hellwig Sept. 10, 2019, 6:13 a.m. UTC | #4
On Tue, Sep 10, 2019 at 02:03:17AM +0000, Yoshihiro Shimoda wrote:
> I'm sorry for causing this trouble on your environment. I have a proposal to
> resolve this issue. The mmc_host struct will have a new caps2 flag
> like MMC_CAP2_MERGE_CAPABLE and add a condition into the queue.c like below.
> 
> +	if (host->caps2 & MMC_CAP2_MERGE_CAPABLE &&
> +	    host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
> 	    dma_get_merge_boundary(mmc_dev(host)))
> 		host->can_dma_map_merge = 1;
> 	else
> 		host->can_dma_map_merge = 0;
> 
> After that, all mmc controllers disable the feature as default, and if a mmc
> controller has such capable, the host driver should set the flag.

That sounds sensible to me.  Alternatively we'd have to limit
max_sectors to 16-bit values for sdhci if using an iommu that can
merge.
Thierry Reding Sept. 10, 2019, 7:30 a.m. UTC | #5
On Tue, Sep 10, 2019 at 02:03:17AM +0000, Yoshihiro Shimoda wrote:
> Hi Thierry,
> 
> > <snip>
> 
> I'm sorry for causing this trouble on your environment. I have a proposal to
> resolve this issue. The mmc_host struct will have a new caps2 flag
> like MMC_CAP2_MERGE_CAPABLE and add a condition into the queue.c like below.
> 
> +	if (host->caps2 & MMC_CAP2_MERGE_CAPABLE &&
> +	    host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
> 	    dma_get_merge_boundary(mmc_dev(host)))
> 		host->can_dma_map_merge = 1;
> 	else
> 		host->can_dma_map_merge = 0;
> 
> After that, all mmc controllers disable the feature as default, and if a mmc
> controller has such capable, the host driver should set the flag.
> Ulf, is such a patch acceptable for v5.4-rcN? Anyway, I'll submit such a patch later.

I'm sure that would work, but I think that's missing the point. It's not
that the setup isn't capable of merging at all; it just can't deal with
segments that are too large.

While I was debugging this, I was frequently seeing cases where the SG
was on the order of 40 entries initially and after dma_map_sg() it was
reduced to just 4 or 5.

So I think merging is still really useful if a setup supports it via an
IOMMU. I'm just not sure I see why we can't make the code respect
whatever maximum segment size the driver may have configured. That
seems like it should allow us to get the best of both worlds.

Thierry
Thierry Reding Sept. 10, 2019, 7:37 a.m. UTC | #6
On Tue, Sep 10, 2019 at 08:13:48AM +0200, Christoph Hellwig wrote:
> On Tue, Sep 10, 2019 at 02:03:17AM +0000, Yoshihiro Shimoda wrote:
> > <snip>
> > After that, all mmc controllers disable the feature as default, and if a mmc
> > controller has such capable, the host driver should set the flag.
> 
> That sounds sensible to me.  Alternatively we'd have to limit
> max_sectors to 16-bit values for sdhci if using an iommu that can
> merge.

Isn't that effectively what dma_set_max_seg_size() is supposed to be
doing? That tells the DMA API what the maximum size of a segment can
be for the given device, right? If we make sure never to exceed that
when compacting the SG, the SG that we get back should map just fine
into the descriptors that SDHCI supports.

Now, for devices that support larger segments (or in fact no concept
of segments at all), if we set the maximum segment size to something
really big, isn't that going to automatically allow the huge, merged
segments that the original patch intended?

To me this sounds simply like an issue of the queue code thinking it
knows better than the driver and just overriding the maximum segment
size. Isn't that the real bug here that needs to be fixed?

Thierry
Yoshihiro Shimoda Sept. 11, 2019, 7:23 a.m. UTC | #7
Hi Thierry,

> From: Thierry Reding, Sent: Tuesday, September 10, 2019 4:31 PM
<snip>
> On Tue, Sep 10, 2019 at 02:03:17AM +0000, Yoshihiro Shimoda wrote:
> > <snip>
> >
> > I'm sorry for causing this trouble on your environment. I have a proposal to
> > resolve this issue. The mmc_host struct will have a new caps2 flag
> > like MMC_CAP2_MERGE_CAPABLE and add a condition into the queue.c like below.
> >
> > +	if (host->caps2 & MMC_CAP2_MERGE_CAPABLE &&
> > +	    host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
> > 	    dma_get_merge_boundary(mmc_dev(host)))
> > 		host->can_dma_map_merge = 1;
> > 	else
> > 		host->can_dma_map_merge = 0;
> >
> > After that, all mmc controllers disable the feature as default, and if a mmc
> > controller has such capable, the host driver should set the flag.
> > Ulf, is such a patch acceptable for v5.4-rcN? Anyway, I'll submit such a patch later.
> 
> I'm sure that would work, but I think that's missing the point. It's not
> that the setup isn't capable of merging at all. It just can't deal with
> segments that are too large.

IIUC, since SDHCI has a strict 64 KiB limit on each segment, the
controller cannot handle the example 1 case below on the plain next-20190904.

Example 1:
 - The original scatterlist has 4 segments:
  sg[0]: .dma_address = 0x12340000, .length = 65536,
  sg[1]: .dma_address = 0x12350000, .length = 65536,
  sg[2]: .dma_address = 0x12360000, .length = 65536,
  sg[3]: .dma_address = 0x12370000, .length = 65536,

 - Merging the above segments yields a single segment:
  sg[0]: .dma_address = 0x12340000, .length = 262144,

> While I was debugging this, I was frequently seeing cases where the SG
> was on the order of 40 entries initially and after dma_map_sg() it was
> reduced to just 4 or 5.

If the individual segments are small enough, the IOMMU can merge them.

Example 2:
 - The original scatterlist has 4 segments:
  sg[0]: .dma_address = 0x12340000, .length = 4096,
  sg[1]: .dma_address = 0x12341000, .length = 4096,
  sg[2]: .dma_address = 0x12342000, .length = 4096,
  sg[3]: .dma_address = 0x12343000, .length = 4096,

 - Merging the above segments yields a single segment:
  sg[0]: .dma_address = 0x12340000, .length = 16384,

> So I think merging is still really useful if a setup supports it via an
> IOMMU. I'm just not sure I see why we can't make the code respect what-
> ever the maximum segment size that the driver may have configured. That
> seems like it should allow us to get the best of both worlds.

I agree that merging is useful in the example 2 case.

By the way, I checked dma-iommu.c and found that __finalise_sg() has a
condition on "seg_mask", which comes from dma_get_seg_boundary(). So I'm
guessing that if sdhci.c calls dma_set_seg_boundary() with 0x0000ffff, the
issue disappears, because dma-iommu.c will then not merge the segments even
in the example 1 case. What do you think?

Best regards,
Yoshihiro Shimoda

> Thierry
Christoph Hellwig Sept. 11, 2019, 10:36 a.m. UTC | #8
On Tue, Sep 10, 2019 at 09:37:39AM +0200, Thierry Reding wrote:
> > > After that, all mmc controllers disable the feature as default, and if a mmc
> > > controller has such capable, the host driver should set the flag.
> > 
> > That sounds sensible to me.  Alternatively we'd have to limit
> > max_sectors to 16-bit values for sdhci if using an iommu that can
> > merge.
> 
> Isn't that effectively what dma_set_max_seg_size() is supposed to be
> doing? That tells the DMA API what the maximum size of a segment can
> be for the given device, right? If we make sure never to exceed that
> when compacting the SG, the SG that we get back should map just fine
> into the descriptors that SDHCI supports.

dma_set_max_seg_size() does indeed instruct the iommu drivers about
the merging capabilities (btw, swiotlb should be able to implement
this kind of merging as well, but that is a different discussion).

But the problem is that you don't just change the dma_set_max_seg_size,
but also the block layer max segment size setting, which is used for
block layer merges.  And we don't have the accounting for the first and
last segment in a request (those that are being merged into), so if you
enable the virt_boundary, segments can grow to a size limited only by
the maximum request size.  We could add that accounting with a bit of
work; it's just that for devices that typically use the virt boundary
there is no point (their actual segment is a page and not related to
the Linux "segment").
Christoph Hellwig Sept. 11, 2019, 10:37 a.m. UTC | #9
On Tue, Sep 10, 2019 at 02:03:17AM +0000, Yoshihiro Shimoda wrote:
> I'm sorry for causing this trouble on your environment. I have a proposal to
> resolve this issue. The mmc_host struct will have a new caps2 flag
> like MMC_CAP2_MERGE_CAPABLE and add a condition into the queue.c like below.
> 
> +	if (host->caps2 & MMC_CAP2_MERGE_CAPABLE &&
> +	    host->max_segs < MMC_DMA_MAP_MERGE_SEGMENTS &&
> 	    dma_get_merge_boundary(mmc_dev(host)))
> 		host->can_dma_map_merge = 1;
> 	else
> 		host->can_dma_map_merge = 0;
> 
> After that, all mmc controllers disable the feature as default, and if a mmc
> controller has such capable, the host driver should set the flag.
> Ulf, is such a patch acceptable for v5.4-rcN? Anyway, I'll submit such a patch later.

FYI, I'd love to see such a patch, even if only as a temporary band-aid
so that I don't have to revert the series before sending the pull
request to Linus.
Ming Lei Sept. 12, 2019, 12:57 a.m. UTC | #10
On Mon, Sep 09, 2019 at 06:13:31PM +0200, Christoph Hellwig wrote:
> On Mon, Sep 09, 2019 at 02:56:56PM +0200, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > When enabling the DMA map merging capability for a queue, ensure that
> > the maximum segment size does not exceed the device's limit.
> 
> We can't do that unfortunately.  If we use the virt_boundary setting
> we do aggressive merges that there is no accounting for.  So we can't
> limit the segment size.

Could you explain a bit why we can't do that?

The segment size limit was basically removed by commit
200a9aff7b02 ("block: remove the segment size check in bio_will_gap").

Before that commit, the max segment size limit worked.


Thanks,
Ming
Christoph Hellwig Sept. 12, 2019, 8:19 a.m. UTC | #11
On Thu, Sep 12, 2019 at 08:57:29AM +0800, Ming Lei wrote:
> Could you explain a bit why we can't do that?
> 
> The segment size limit is basically removed since the following commit
> 200a9aff7b02 ("block: remove the segment size check in bio_will_gap").
> 
> Before that commit, the max segment size limit worked.

No, it didn't, as explained in the commit log.  It worked for some
cases but not others.

Patch

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 70b39f14e974..9fb1288fbc12 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -738,12 +738,8 @@  void blk_queue_segment_boundary(struct request_queue *q, unsigned long mask)
 }
 EXPORT_SYMBOL(blk_queue_segment_boundary);
 
-/**
- * blk_queue_virt_boundary - set boundary rules for bio merging
- * @q:  the request queue for the device
- * @mask:  the memory boundary mask
- **/
-void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
+void __blk_queue_virt_boundary(struct request_queue *q, unsigned long mask,
+			       unsigned int max_segment_size)
 {
 	q->limits.virt_boundary_mask = mask;
 
@@ -754,7 +750,17 @@  void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
 	 * of that they are not limited by our notion of "segment size".
 	 */
 	if (mask)
-		q->limits.max_segment_size = UINT_MAX;
+		q->limits.max_segment_size = max_segment_size;
+}
+
+/**
+ * blk_queue_virt_boundary - set boundary rules for bio merging
+ * @q:  the request queue for the device
+ * @mask:  the memory boundary mask
+ **/
+void blk_queue_virt_boundary(struct request_queue *q, unsigned long mask)
+{
+	__blk_queue_virt_boundary(q, mask, UINT_MAX);
 }
 EXPORT_SYMBOL(blk_queue_virt_boundary);
 
@@ -843,13 +849,13 @@  EXPORT_SYMBOL_GPL(blk_queue_write_cache);
 bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
 				       struct device *dev)
 {
+	unsigned int max_segment_size = dma_get_max_seg_size(dev);
 	unsigned long boundary = dma_get_merge_boundary(dev);
 
 	if (!boundary)
 		return false;
 
-	/* No need to update max_segment_size. see blk_queue_virt_boundary() */
-	blk_queue_virt_boundary(q, boundary);
+	__blk_queue_virt_boundary(q, boundary, max_segment_size);
 
 	return true;
 }