
[RFC,5/8] scatterlist: Modify SG copy functions to support io memory.

Message ID 445bc352-75d7-438f-96ef-c2411215628d@deltatee.com

Commit Message

Logan Gunthorpe April 3, 2017, 9:20 p.m. UTC
Hi Christoph,

What are your thoughts on an approach like the following untested
draft patch?

The patch (if fleshed out) makes it so iomem can be used in an sgl
and WARN_ONs will occur in places where drivers attempt to access
iomem directly through the sgl.

I'd also probably create a p2pmem_alloc_sgl helper function so driver
writers wouldn't have to mess with sg_set_iomem_page.
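
Roughly something like this (untested sketch; p2pmem_alloc() is assumed
from the allocator introduced earlier in this series, and the helper
itself is hypothetical):

static int p2pmem_alloc_sgl(struct p2pmem_dev *p, struct scatterlist *sg,
			    size_t length)
{
	/* back the sgl with p2p memory and flag the entry as iomem */
	void *addr = p2pmem_alloc(p, length);

	if (!addr)
		return -ENOMEM;

	sg_init_table(sg, 1);
	sg_set_iomem_page(sg, virt_to_page(addr), length, 0);

	return 0;
}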

With all that in place, it should be relatively safe for drivers to
implement p2pmem even though we'd still technically be violating the
__iomem boundary in some places.

Logan


commit b435a154a4ec4f82766f6ab838092c3c5a9388ac
Author: Logan Gunthorpe <logang@deltatee.com>
Date:   Wed Feb 8 12:44:52 2017 -0700

    scatterlist: Add support for iomem pages

    This patch steals another bit from the page_link field to indicate
    that the sg points to iomem. In sg_copy_buffer we use this flag to
    select between memcpy and memcpy_to/fromio. Other sg_miter users will
    get a WARN_ON unless they indicate they support io memory by setting
    the SG_MITER_IOMEM flag.
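
    For example, a miter user that can cope with iomem would opt in
    roughly like this (sketch):

	struct sg_mapping_iter miter;

	sg_miter_start(&miter, sgl, nents,
		       SG_MITER_ATOMIC | SG_MITER_FROM_SG | SG_MITER_IOMEM);
	while (sg_miter_next(&miter)) {
		/* copy out of the sgl, honouring the iomem flag per entry */
		if (sg_is_iomem(miter.piter.sg))
			memcpy_fromio(buf, miter.addr, miter.length);
		else
			memcpy(buf, miter.addr, miter.length);
		buf += miter.length;
	}
	sg_miter_stop(&miter);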

    Also added are sg_kmap functions, which would replace the common
    pattern of kmap(sg_page(sg)). These new functions also warn if the
    caller tries to map io memory. Another option would be to
    automatically copy the iomem to a new page and return that
    transparently to the driver.

    A coccinelle patch would then convert the existing kmap(sg_page(sg))
    instances to the appropriate sg_kmap calls.
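
    A first pass at that semantic patch could be as simple as (untested):

	@@
	expression sg;
	@@
	- kmap(sg_page(sg))
	+ sg_kmap(sg)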

    Signed-off-by: Logan Gunthorpe <logang@deltatee.com>

 		sg_flags |= SG_MITER_FROM_SG;
@@ -668,10 +671,17 @@ size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,

 		len = min(miter.length, buflen - offset);

-		if (to_buffer)
-			memcpy(buf + offset, miter.addr, len);
-		else
-			memcpy(miter.addr, buf + offset, len);
+		if (sg_is_iomem(miter.piter.sg)) {
+			if (to_buffer)
+				memcpy_fromio(buf + offset, miter.addr, len);
+			else
+				memcpy_toio(miter.addr, buf + offset, len);
+		} else {
+			if (to_buffer)
+				memcpy(buf + offset, miter.addr, len);
+			else
+				memcpy(miter.addr, buf + offset, len);
+		}

 		offset += len;
 	}

Comments

Dan Williams April 3, 2017, 9:44 p.m. UTC | #1
On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hi Christoph,
>
> What are your thoughts on an approach like the following untested
> draft patch?
>
> The patch (if fleshed out) makes it so iomem can be used in an sgl
> and WARN_ONs will occur in places where drivers attempt to access
> iomem directly through the sgl.
>
> I'd also probably create a p2pmem_alloc_sgl helper function so driver
> writers wouldn't have to mess with sg_set_iomem_page.
>
> With all that in place, it should be relatively safe for drivers to
> implement p2pmem even though we'd still technically be violating the
> __iomem boundary in some places.

Just reacting to this mail, I still haven't had a chance to take a
look at the rest of the series.

The pfn_t type was invented to carry extra type and page lookup
information about the memory behind a given pfn. At first glance that
seems a more natural place to carry an indication that this is an
"I/O" pfn.
Logan Gunthorpe April 3, 2017, 10:10 p.m. UTC | #2
On 03/04/17 03:44 PM, Dan Williams wrote:
> On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>> Hi Christoph,
>>
>> What are your thoughts on an approach like the following untested
>> draft patch?
>>
>> The patch (if fleshed out) makes it so iomem can be used in an sgl
>> and WARN_ONs will occur in places where drivers attempt to access
>> iomem directly through the sgl.
>>
>> I'd also probably create a p2pmem_alloc_sgl helper function so driver
>> writers wouldn't have to mess with sg_set_iomem_page.
>>
>> With all that in place, it should be relatively safe for drivers to
>> implement p2pmem even though we'd still technically be violating the
>> __iomem boundary in some places.
> 
> Just reacting to this mail, I still haven't had a chance to take a
> look at the rest of the series.
> 
> The pfn_t type was invented to carry extra type and page lookup
> information about the memory behind a given pfn. At first glance that
> seems a more natural place to carry an indication that this is an
> "I/O" pfn.

I agree... But what are the plans for pfn_t? Is anyone working on using
it in the scatterlist code? Currently it's not there yet, and given the
assertion that we will continue to be using struct page for DMA, is that
a direction we'd want to go?

Logan
Dan Williams April 3, 2017, 10:47 p.m. UTC | #3
On Mon, Apr 3, 2017 at 3:10 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
> On 03/04/17 03:44 PM, Dan Williams wrote:
>> On Mon, Apr 3, 2017 at 2:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>>> Hi Christoph,
>>>
>>> What are your thoughts on an approach like the following untested
>>> draft patch?
>>>
>>> The patch (if fleshed out) makes it so iomem can be used in an sgl
>>> and WARN_ONs will occur in places where drivers attempt to access
>>> iomem directly through the sgl.
>>>
>>> I'd also probably create a p2pmem_alloc_sgl helper function so driver
>>> writers wouldn't have to mess with sg_set_iomem_page.
>>>
>>> With all that in place, it should be relatively safe for drivers to
>>> implement p2pmem even though we'd still technically be violating the
>>> __iomem boundary in some places.
>>
>> Just reacting to this mail, I still haven't had a chance to take a
>> look at the rest of the series.
>>
>> The pfn_t type was invented to carry extra type and page lookup
>> information about the memory behind a given pfn. At first glance that
>> seems a more natural place to carry an indication that this is an
>> "I/O" pfn.
>
> I agree... But what are the plans for pfn_t? Is anyone working on using
> it in the scatterlist code? Currently it's not there yet, and given the
> assertion that we will continue to be using struct page for DMA, is that
> a direction we'd want to go?
>

I wouldn't necessarily conflate supporting pfn_t in the scatterlist
with the stalled struct-page-less DMA effort. A pfn_t_to_page()
conversion will still work and be required. However you're right, the
minute we use pfn_t for this we're into the realm of special case
drivers that understand scatterlists with special "I/O-pfn_t" entries.
However, maybe that's what we want? I think peer-to-peer DMA is not a
general purpose feature unless/until we get it standardized in PCI. So
maybe drivers with special case scatterlist support is exactly what we
want for now.

Thoughts?
Logan Gunthorpe April 3, 2017, 11:12 p.m. UTC | #4
On 03/04/17 04:47 PM, Dan Williams wrote:
> I wouldn't necessarily conflate supporting pfn_t in the scatterlist
> with the stalled struct-page-less DMA effort. A pfn_t_to_page()
> conversion will still work and be required. However you're right, the
> minute we use pfn_t for this we're into the realm of special case
> drivers that understand scatterlists with special "I/O-pfn_t" entries.

Well yes, it would certainly be possible to convert the scatterlist code
from page_link to pfn_t. (The only slightly tricky thing is that
scatterlist uses extra chaining bits and pfn_t uses extra flag bits so
they'd have to be harmonized somehow). But if we aren't moving toward
struct-page-less DMA, I fail to see the point of the conversion.
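
For instance, if struct scatterlist carried a "pfn_t pfn" instead of
page_link, the low-bit macros would have to move up into pfn_t's high
flag bits. A sketch, reusing the PFN_SG_* flags pfn_t already defines:

	#define sg_is_chain(sg)		((sg)->pfn.val & PFN_SG_CHAIN)
	#define sg_is_last(sg)		((sg)->pfn.val & PFN_SG_LAST)
	#define sg_is_iomem(sg)		((sg)->pfn.val & PFN_DEV)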

I'll definitely need IO scatterlists of some form or another and I like
pfn_t but right now it just seems like extra work with unclear benefit.
(Though, if someone told me that I can't use a third bit in the
page_link field then maybe that would be a good reason to move to pfn_t.)

> However, maybe that's what we want? I think peer-to-peer DMA is not a
> general purpose feature unless/until we get it standardized in PCI. So
> maybe drivers with special case scatterlist support is exactly what we
> want for now.

Well, I think this should be completely independent from PCI code. I see
no reason why we can't have infrastructure for DMA on iomem from any
bus. Largely all the work I've done in this area is completely agnostic
to the bus in use. (Except for any kind of white/black list when it is
used.)

The "special case scatterlist" is essentially what I'm proposing in the
patch I sent upthread, it just stores the flag in the page_link instead
of in a pfn_t.

Logan
Dan Williams April 4, 2017, 12:07 a.m. UTC | #5
On Mon, Apr 3, 2017 at 4:12 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
> On 03/04/17 04:47 PM, Dan Williams wrote:
>> I wouldn't necessarily conflate supporting pfn_t in the scatterlist
>> with the stalled struct-page-less DMA effort. A pfn_t_to_page()
>> conversion will still work and be required. However you're right, the
>> minute we use pfn_t for this we're into the realm of special case
>> drivers that understand scatterlists with special "I/O-pfn_t" entries.
>
> Well yes, it would certainly be possible to convert the scatterlist code
> from page_link to pfn_t. (The only slightly tricky thing is that
> scatterlist uses extra chaining bits and pfn_t uses extra flag bits so
> they'd have to be harmonized somehow). But if we aren't moving toward
> struct-page-less DMA, I fail to see the point of the conversion.
>
> I'll definitely need IO scatterlists of some form or another and I like
> pfn_t but right now it just seems like extra work with unclear benefit.
> (Though, if someone told me that I can't use a third bit in the
> page_link field then maybe that would be a good reason to move to pfn_t.)
>
>> However, maybe that's what we want? I think peer-to-peer DMA is not a
>> general purpose feature unless/until we get it standardized in PCI. So
>> maybe drivers with special case scatterlist support is exactly what we
>> want for now.
>
> Well, I think this should be completely independent from PCI code. I see
> no reason why we can't have infrastructure for DMA on iomem from any
> bus. Largely all the work I've done in this area is completely agnostic
> to the bus in use. (Except for any kind of white/black list when it is
> used.)

The completely agnostic part is where I get worried, but I shouldn't
say any more until I actually read the patch. The worry is cases where
this agnostic enabling allows unsuspecting code paths to do the wrong
thing. Like bypass iomem safety.

> The "special case scatterlist" is essentially what I'm proposing in the
> patch I sent upthread, it just stores the flag in the page_link instead
> of in a pfn_t.

Makes sense. The suggestion of pfn_t was to try to get more type
safety throughout the stack. So that, again, unsuspecting code paths
that get an I/O pfn aren't able to do things like page_address() or
kmap() without failing.
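
E.g. nothing can kmap a pfn_t directly; code would be forced through
something like the following and have to handle the failure (sg_pfn()
is hypothetical here):

	struct page *page = pfn_t_to_page(sg_pfn(sg));
	void *addr;

	if (!page)
		return -EFAULT;	/* I/O pfn: no kernel mapping to hand out */

	addr = kmap(page);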

I'll stop commenting now and set aside some time to go read the patches.
Logan Gunthorpe April 7, 2017, 5:59 p.m. UTC | #6
Hi Dan,

On 03/04/17 06:07 PM, Dan Williams wrote:
> The completely agnostic part is where I get worried, but I shouldn't
> say any more until I actually read the patch. The worry is cases where
> this agnostic enabling allows unsuspecting code paths to do the wrong
> thing. Like bypass iomem safety.

Yup, you're right, the iomem safety issue is a really difficult problem.
I think replacing struct page with pfn_t in a bunch of places is
probably going to be a requirement for my work. However, this is going
to be a very large undertaking.

I've done an audit of sg_page users and there will indeed be some
difficult cases. However, I'm going to start doing some cleanup and
semantic changes to hopefully move in that direction. The first step
I've chosen to look at is to create an sg_kmap interface which replaces
about 77 (out of ~340) sg_page users. I'm hoping the new interface can
have the semantic that sg_kmap can fail (which would happen in the case
that no suitable page exists).
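
Roughly the semantic I have in mind (sketch only):

	static inline void *sg_kmap(struct scatterlist *sg)
	{
		if (sg_is_iomem(sg))
			return NULL;	/* no page the CPU can safely map */

		return kmap(sg_page(sg));
	}

so callers are forced to check for failure instead of blindly
dereferencing io memory.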

Eventually, I'd want to get to a place where sg_page either doesn't
exist or can fail and is always checked. At that point, swapping
pfn_t into the sgl would be manageable.

Thoughts?

Logan

Patch

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 0007b79..bd690a2c 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -37,6 +37,9 @@ 

 #include <uapi/linux/dma-buf.h>

+/* Prevent the highmem.h macro from aliasing our ops->kunmap_atomic */
+#undef kunmap_atomic
+
 static inline int is_dma_buf_file(struct file *);

 struct dma_buf_list {
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index cb3c8fe..7608da0 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -5,6 +5,7 @@ 
 #include <linux/types.h>
 #include <linux/bug.h>
 #include <linux/mm.h>
+#include <linux/highmem.h>
 #include <asm/io.h>

 struct scatterlist {
@@ -53,6 +54,9 @@  struct sg_table {
  *
  * If bit 1 is set, then this sg entry is the last element in a list.
  *
+ * We also use bit 2 to indicate whether the page_link points to an
+ * iomem page or not.
+ *
  * See sg_next().
  *
  */
@@ -64,10 +68,17 @@  struct sg_table {
 * a valid sg entry, or whether it points to the start of a new scatterlist.
  * Those low bits are there for everyone! (thanks mason :-)
  */
-#define sg_is_chain(sg)		((sg)->page_link & 0x01)
-#define sg_is_last(sg)		((sg)->page_link & 0x02)
+#define PAGE_LINK_MASK	0x7
+#define PAGE_LINK_CHAIN	0x1
+#define PAGE_LINK_LAST	0x2
+#define PAGE_LINK_IOMEM	0x4
+
+#define sg_is_chain(sg)		((sg)->page_link & PAGE_LINK_CHAIN)
+#define sg_is_last(sg)		((sg)->page_link & PAGE_LINK_LAST)
 #define sg_chain_ptr(sg)	\
-	((struct scatterlist *) ((sg)->page_link & ~0x03))
+	((struct scatterlist *) ((sg)->page_link & ~(PAGE_LINK_CHAIN | \
+						     PAGE_LINK_LAST)))
+#define sg_is_iomem(sg)		((sg)->page_link & PAGE_LINK_IOMEM)

 /**
  * sg_assign_page - Assign a given page to an SG entry
@@ -81,13 +92,13 @@  struct sg_table {
  **/
 static inline void sg_assign_page(struct scatterlist *sg, struct page *page)
 {
-	unsigned long page_link = sg->page_link & 0x3;
+	unsigned long page_link = sg->page_link & PAGE_LINK_MASK;

 	/*
 	 * In order for the low bit stealing approach to work, pages
-	 * must be aligned at a 32-bit boundary as a minimum.
+	 * must be aligned at a 64-bit boundary as a minimum.
 	 */
-	BUG_ON((unsigned long) page & 0x03);
+	BUG_ON((unsigned long) page & PAGE_LINK_MASK);
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 	BUG_ON(sg_is_chain(sg));
@@ -117,13 +128,56 @@  static inline void sg_set_page(struct scatterlist *sg, struct page *page,
 	sg->length = len;
 }

+/**
+ * sg_set_iomem_page - Set sg entry to point at given iomem page
+ * @sg:		 SG entry
+ * @page:	 The page
+ * @len:	 Length of data
+ * @offset:	 Offset into page
+ *
+ * Description:
+ *   Same as sg_set_page but used when the page is a ZONE_DEVICE page that
+ *   points to IO memory.
+ *
+ **/
+static inline void sg_set_iomem_page(struct scatterlist *sg, struct page *page,
+				     unsigned int len, unsigned int offset)
+{
+	sg_set_page(sg, page, len, offset);
+	sg->page_link |= PAGE_LINK_IOMEM;
+}
+
 static inline struct page *sg_page(struct scatterlist *sg)
 {
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 	BUG_ON(sg_is_chain(sg));
 #endif
-	return (struct page *)((sg)->page_link & ~0x3);
+	return (struct page *)((sg)->page_link & ~PAGE_LINK_MASK);
+}
+
+static inline void *sg_kmap(struct scatterlist *sg)
+{
+	WARN_ON(sg_is_iomem(sg));
+
+	return kmap(sg_page(sg));
+}
+
+static inline void sg_kunmap(struct scatterlist *sg, void *addr)
+{
+	kunmap(sg_page(sg));
+}
+
+static inline void *sg_kmap_atomic(struct scatterlist *sg)
+{
+	WARN_ON(sg_is_iomem(sg));
+
+	return kmap_atomic(sg_page(sg));
+}
+
+static inline void sg_kunmap_atomic(struct scatterlist *sg, void *addr)
+{
+	kunmap_atomic(addr);
 }

 /**
@@ -171,7 +225,8 @@  static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
 	 * Set lowest bit to indicate a link pointer, and make sure to clear
 	 * the termination bit if it happens to be set.
 	 */
-	prv[prv_nents - 1].page_link = ((unsigned long) sgl | 0x01) & ~0x02;
+	prv[prv_nents - 1].page_link =
+		((unsigned long) sgl & ~PAGE_LINK_MASK) | PAGE_LINK_CHAIN;
 }

 /**
@@ -191,8 +246,8 @@  static inline void sg_mark_end(struct scatterlist *sg)
 	/*
 	 * Set termination bit, clear potential chain bit
 	 */
-	sg->page_link |= 0x02;
-	sg->page_link &= ~0x01;
+	sg->page_link &= ~PAGE_LINK_CHAIN;
+	sg->page_link |= PAGE_LINK_LAST;
 }

 /**
@@ -208,7 +263,7 @@  static inline void sg_unmark_end(struct scatterlist *sg)
 #ifdef CONFIG_DEBUG_SG
 	BUG_ON(sg->sg_magic != SG_MAGIC);
 #endif
-	sg->page_link &= ~0x02;
+	sg->page_link &= ~PAGE_LINK_LAST;
 }

 /**
@@ -383,6 +438,7 @@  static inline dma_addr_t sg_page_iter_dma_address(struct sg_page_iter *piter)
 #define SG_MITER_ATOMIC		(1 << 0)	 /* use kmap_atomic */
 #define SG_MITER_TO_SG		(1 << 1)	/* flush back to phys on unmap */
 #define SG_MITER_FROM_SG	(1 << 2)	/* nop */
+#define SG_MITER_IOMEM		(1 << 3)	/* support iomem in miter ops */

 struct sg_mapping_iter {
 	/* the following three fields can be accessed directly */
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index c6cf822..6d8f39b 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -580,6 +580,9 @@  bool sg_miter_next(struct sg_mapping_iter *miter)
 	if (!sg_miter_get_next_page(miter))
 		return false;

+	if (!(miter->__flags & SG_MITER_IOMEM))
+		WARN_ON(sg_is_iomem(miter->piter.sg));
+
 	miter->page = sg_page_iter_page(&miter->piter);
 	miter->consumed = miter->length = miter->__remaining;

@@ -651,7 +654,7 @@  size_t sg_copy_buffer(struct scatterlist *sgl, unsigned int nents, void *buf,
 {
 	unsigned int offset = 0;
 	struct sg_mapping_iter miter;
-	unsigned int sg_flags = SG_MITER_ATOMIC;
+	unsigned int sg_flags = SG_MITER_ATOMIC | SG_MITER_IOMEM;

 	if (to_buffer)