Patchwork [08/16] dma-debug: add core checking functions

login
register
mail settings
Submitter Joerg Roedel
Date Jan. 9, 2009, 4:19 p.m.
Message ID <1231517970-20288-9-git-send-email-joerg.roedel@amd.com>
Download mbox | patch
Permalink /patch/17530/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

Joerg Roedel - Jan. 9, 2009, 4:19 p.m.
Impact: add functions to check on dma unmap and sync

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
---
 lib/dma-debug.c |  133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 133 insertions(+), 0 deletions(-)
Ingo Molnar - Jan. 10, 2009, 11:11 p.m.
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> +#define err_printk(dev, format, arg...) do {			\
> +		error_count += 1;				\
> +		if (show_all_errors || show_num_errors > 0) {	\
> +			WARN(1, "%s %s: " format,		\
> +			     dev_driver_string(dev),		\
> +			     dev_name(dev) , ## arg);		\
> +		}						\
> +		if (!show_all_errors && show_num_errors > 0)	\
> +			show_num_errors -= 1;			\

Note that the arithmetics here is SMP-unsafe: we only hold the hash bucket 
so if two errors hit at once on two CPUs then the error tracking variables 
can be accessed at once.

I'd suggest a simple global lock for this error case (taken inside the 
hash bucket lock), to be on the safe side.

Also, please dont use a macro for this - printk details can be passed in 
to helper inlines/functions too.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - Jan. 10, 2009, 11:12 p.m.
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> +	/*
> +	 *          * This may be no bug in reality - but most implementations of the
> +	 *                   * DMA API don't handle this properly, so check for it here
> +	 *                            */

You must be using Vim and copy & paste messed up? :)

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - Jan. 10, 2009, 11:13 p.m.
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> +static char *type2name[3] = { "single", "scather-gather", "coherent" };
> +
> +static char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
> +			     "DMA_FROM_DEVICE", "DMA_NONE" };

Should be const i guess.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joerg Roedel - Jan. 11, 2009, 7:54 a.m.
On Sun, Jan 11, 2009 at 12:12:22AM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > +	/*
> > +	 *          * This may be no bug in reality - but most implementations of the
> > +	 *                   * DMA API don't handle this properly, so check for it here
> > +	 *                            */
> 
> You must be using Vim and copy & paste messed up? :)

Ups, true :) Thanks.

Joerg
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joerg Roedel - Jan. 11, 2009, 7:57 a.m.
On Sun, Jan 11, 2009 at 12:11:27AM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > +#define err_printk(dev, format, arg...) do {			\
> > +		error_count += 1;				\
> > +		if (show_all_errors || show_num_errors > 0) {	\
> > +			WARN(1, "%s %s: " format,		\
> > +			     dev_driver_string(dev),		\
> > +			     dev_name(dev) , ## arg);		\
> > +		}						\
> > +		if (!show_all_errors && show_num_errors > 0)	\
> > +			show_num_errors -= 1;			\
> 
> Note that the arithmetics here is SMP-unsafe: we only hold the hash bucket 
> so if two errors hit at once on two CPUs then the error tracking variables 
> can be accessed at once.
> 
> I'd suggest a simple global lock for this error case (taken inside the 
> hash bucket lock), to be on the safe side.
> 
> Also, please dont use a macro for this - printk details can be passed in 
> to helper inlines/functions too.

Yeah, this is not SMP-safe, I know. But debugfs does not support
atomic_t so I made the variables u32. But at least a race condition has
not a too bad impact. What may habben is that error_count misses a error
or the show_num_errors become negative.
But if we really want to avoid this I think its better to add atomic_t
support to debugfs. What do you think?

Joerg
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joerg Roedel - Jan. 11, 2009, 8:34 a.m.
On Sun, Jan 11, 2009 at 08:57:52AM +0100, Joerg Roedel wrote:
> On Sun, Jan 11, 2009 at 12:11:27AM +0100, Ingo Molnar wrote:
> > 
> > * Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > +#define err_printk(dev, format, arg...) do {			\
> > > +		error_count += 1;				\
> > > +		if (show_all_errors || show_num_errors > 0) {	\
> > > +			WARN(1, "%s %s: " format,		\
> > > +			     dev_driver_string(dev),		\
> > > +			     dev_name(dev) , ## arg);		\
> > > +		}						\
> > > +		if (!show_all_errors && show_num_errors > 0)	\
> > > +			show_num_errors -= 1;			\
> > 
> > Note that the arithmetics here is SMP-unsafe: we only hold the hash bucket 
> > so if two errors hit at once on two CPUs then the error tracking variables 
> > can be accessed at once.
> > 
> > I'd suggest a simple global lock for this error case (taken inside the 
> > hash bucket lock), to be on the safe side.
> > 
> > Also, please dont use a macro for this - printk details can be passed in 
> > to helper inlines/functions too.
> 
> Yeah, this is not SMP-safe, I know. But debugfs does not support
> atomic_t so I made the variables u32. But at least a race condition has
> not a too bad impact. What may habben is that error_count misses a error
> or the show_num_errors become negative.
> But if we really want to avoid this I think its better to add atomic_t
> support to debugfs. What do you think?

Even a global lock will not really help here because show_num_errors and
show_all_errors can be set using debugfs. Either we live with the small
race (with limited impact) or I add atomic_t support to debugfs.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Joerg Roedel - Jan. 14, 2009, 11:44 a.m.
On Sun, Jan 11, 2009 at 12:11:27AM +0100, Ingo Molnar wrote:
> 
> * Joerg Roedel <joerg.roedel@amd.com> wrote:
> 
> > +#define err_printk(dev, format, arg...) do {			\
> > +		error_count += 1;				\
> > +		if (show_all_errors || show_num_errors > 0) {	\
> > +			WARN(1, "%s %s: " format,		\
> > +			     dev_driver_string(dev),		\
> > +			     dev_name(dev) , ## arg);		\
> > +		}						\
> > +		if (!show_all_errors && show_num_errors > 0)	\
> > +			show_num_errors -= 1;			\
> 
> Note that the arithmetics here is SMP-unsafe: we only hold the hash bucket 
> so if two errors hit at once on two CPUs then the error tracking variables 
> can be accessed at once.
> 
> I'd suggest a simple global lock for this error case (taken inside the 
> hash bucket lock), to be on the safe side.

As I wrote in a previous email, a race here is no big deal. I add a
comment to document it. Or do we want another global lock here?

> Also, please dont use a macro for this - printk details can be passed in 
> to helper inlines/functions too.

Hmm, how does this look like? There is not WARN variant which can use
va_args as a parameter. And it is important that the error message is
logged in the warning itself. So the driver developer can see it when a
user reports the warning.

Joerg
Ingo Molnar - Jan. 14, 2009, 11:48 a.m.
* Joerg Roedel <joerg.roedel@amd.com> wrote:

> On Sun, Jan 11, 2009 at 12:11:27AM +0100, Ingo Molnar wrote:
> > 
> > * Joerg Roedel <joerg.roedel@amd.com> wrote:
> > 
> > > +#define err_printk(dev, format, arg...) do {			\
> > > +		error_count += 1;				\
> > > +		if (show_all_errors || show_num_errors > 0) {	\
> > > +			WARN(1, "%s %s: " format,		\
> > > +			     dev_driver_string(dev),		\
> > > +			     dev_name(dev) , ## arg);		\
> > > +		}						\
> > > +		if (!show_all_errors && show_num_errors > 0)	\
> > > +			show_num_errors -= 1;			\
> > 
> > Note that the arithmetics here is SMP-unsafe: we only hold the hash bucket 
> > so if two errors hit at once on two CPUs then the error tracking variables 
> > can be accessed at once.
> > 
> > I'd suggest a simple global lock for this error case (taken inside the 
> > hash bucket lock), to be on the safe side.
> 
> As I wrote in a previous email, a race here is no big deal. I add a 
> comment to document it. Or do we want another global lock here?

we commonly use global locks in debug exception cases - to serialize 
console output. But it's certainly no big deal.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/lib/dma-debug.c b/lib/dma-debug.c
index ca0ccb1..9f730a4 100644
--- a/lib/dma-debug.c
+++ b/lib/dma-debug.c
@@ -17,9 +17,11 @@ 
  * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA
  */
 
+#include <linux/dma-mapping.h>
 #include <linux/dma-debug.h>
 #include <linux/spinlock.h>
 #include <linux/debugfs.h>
+#include <linux/device.h>
 #include <linux/types.h>
 #include <linux/list.h>
 #include <linux/slab.h>
@@ -82,6 +84,22 @@  static struct dentry *show_num_errors_dent;
 static struct dentry *num_free_entries_dent;
 static struct dentry *min_free_entries_dent;
 
+static char *type2name[3] = { "single", "scather-gather", "coherent" };
+
+static char *dir2name[4] = { "DMA_BIDIRECTIONAL", "DMA_TO_DEVICE",
+			     "DMA_FROM_DEVICE", "DMA_NONE" };
+
+#define err_printk(dev, format, arg...) do {			\
+		error_count += 1;				\
+		if (show_all_errors || show_num_errors > 0) {	\
+			WARN(1, "%s %s: " format,		\
+			     dev_driver_string(dev),		\
+			     dev_name(dev) , ## arg);		\
+		}						\
+		if (!show_all_errors && show_num_errors > 0)	\
+			show_num_errors -= 1;			\
+	} while (0);
+
 /*
  * Hash related functions
  *
@@ -377,3 +395,118 @@  static __init int dma_debug_entries_cmdline(char *str)
 __setup("dma_debug=", dma_debug_cmdline);
 __setup("dma_debug_entries=", dma_debug_entries_cmdline);
 
+static void check_unmap(struct dma_debug_entry *ref)
+{
+	struct dma_debug_entry *entry;
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	if (dma_mapping_error(ref->dev, ref->dev_addr))
+		return;
+
+	bucket = get_hash_bucket(ref, &flags);
+	entry = hash_bucket_find(bucket, ref);
+
+	if (!entry) {
+		err_printk(ref->dev, "DMA-API: device driver tries "
+			   "to free DMA memory it has not allocated "
+			   "[device address=0x%016llx] [size=%llu bytes]\n",
+			   ref->dev_addr, ref->size);
+		goto out;
+	}
+
+	if (ref->size != entry->size) {
+		err_printk(ref->dev, "DMA-API: device driver frees "
+			   "DMA memory with different size "
+			   "[device address=0x%016llx] [map size=%llu bytes] "
+			   "[unmap size=%llu bytes]\n",
+			   ref->dev_addr, entry->size, ref->size);
+	}
+
+	if (ref->type != entry->type) {
+		err_printk(ref->dev, "DMA-API: device driver frees "
+			   "DMA memory different that it was allocated "
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[mapped as %s] [unmapped as %s]\n",
+			   ref->dev_addr, ref->size,
+			   type2name[entry->type], type2name[ref->type]);
+	} else if ((entry->type == dma_debug_coherent) &&
+		   (ref->cpu_addr != entry->cpu_addr)) {
+		err_printk(ref->dev, "DMA-API: device driver frees "
+			   "DMA memory with different CPU address "
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[cpu alloc address=%p] [cpu free address=%p]",
+			   ref->dev_addr, ref->size,
+			   entry->cpu_addr, ref->cpu_addr);
+	}
+
+	/*
+	 *          * This may be no bug in reality - but most implementations of the
+	 *                   * DMA API don't handle this properly, so check for it here
+	 *                            */
+	if (ref->direction != entry->direction) {
+		err_printk(ref->dev, "DMA-API: device driver frees "
+			   "DMA memory with different direction "
+			   "[device address=0x%016llx] [size=%llu bytes] "
+			   "[mapped with %s] [unmapped with %s]\n",
+			   ref->dev_addr, ref->size,
+			   dir2name[entry->direction],
+			   dir2name[ref->direction]);
+	}
+
+	hash_bucket_del(entry);
+	dma_entry_free(entry);
+
+out:
+	put_hash_bucket(bucket, &flags);
+}
+
+static void check_sync(struct device *dev, dma_addr_t addr,
+		       u64 size, u64 offset, int direction, bool to_cpu)
+{
+	struct dma_debug_entry ref = {
+		.dev            = dev,
+		.dev_addr       = addr,
+		.size           = size,
+		.direction      = direction,
+	};
+	struct dma_debug_entry *entry;
+	struct hash_bucket *bucket;
+	unsigned long flags;
+
+	bucket = get_hash_bucket(&ref, &flags);
+
+	entry = hash_bucket_find(bucket, &ref);
+
+	if (!entry) {
+		err_printk(dev, "DMA-API: device driver tries "
+				"to sync DMA memory it has not allocated "
+				"[device address=0x%016llx] [size=%llu bytes]\n",
+				addr, size);
+		goto out;
+	}
+
+	if ((offset + size) > entry->size) {
+		err_printk(dev, "DMA-API: device driver syncs"
+				" DMA memory outside allocated range "
+				"[device address=0x%016llx] "
+				"[allocation size=%llu bytes] [sync offset=%llu] "
+				"[sync size=%llu]\n", entry->dev_addr, entry->size,
+				offset, size);
+	}
+
+	if (direction != entry->direction) {
+		err_printk(dev, "DMA-API: device driver syncs "
+				"DMA memory with different direction "
+				"[device address=0x%016llx] [size=%llu bytes] "
+				"[mapped with %s] [synced with %s]\n",
+				addr, entry->size,
+				dir2name[entry->direction],
+				dir2name[direction]);
+	}
+
+out:
+	put_hash_bucket(bucket, &flags);
+
+}
+