diff mbox

bnx2 fails to compile on parisc because of missing get_dma_ops()

Message ID 20100617211203O.fujita.tomonori@lab.ntt.co.jp
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

FUJITA Tomonori June 17, 2010, 12:13 p.m. UTC
On Wed, 16 Jun 2010 20:53:57 -0700
"Michael Chan" <mchan@broadcom.com> wrote:

> > > The commit that causes the problem:
> > >
> > > commit a33fa66bcf365ffe5b79d1ae1d3582cc261ae56e
> > > Author: Michael Chan <mchan@broadcom.com>
> > > Date:   Thu May 6 08:58:13 2010 +0000
> > >
> > >    bnx2: Add prefetches to rx path.
> > >
> > > Looks fairly innocuous by the description.
> > >
> > > Should parisc have a get_dma_ops()?  We don't need one
> > because our dma
> > > ops are per platform not per bus.
> >
> > looks like it'll be broken on more than just parisc:
> > $ grep get_dma_ops arch/*/include/asm/ -rl | cut -d/ -f 2
> > alpha
> > ia64
> > microblaze
> > powerpc
> > sh
> > sparc
> > x86
> 
> Most of these archs use the dma functions in:
> 
> <asm-generic/dma-mapping-common.h>
> 
> so it's not a problem.

No, that's a wrong assumption. asm-generic/dma-mapping-common.h is
helper code that simplifies an architecture's DMA core code. Some
architectures use it and some don't.

You can't expect every architecture to use it.


> I think I'll send in a patch to remove that part of the code
> from bnx2.c for now.

Yeah. I'm not sure whether you already sent a patch.

From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Date: Thu, 17 Jun 2010 13:06:15 +0900
Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage

This removes dma_get_ops() prefetch optimization in bnx2.

bnx2 uses dma_get_ops() to see whether dma_sync_single_for_cpu() is a
no-op, and prefetches the next rx descriptor only if it is.

But dma_get_ops() isn't available on all architectures (only the
architectures that use the dma_map_ops struct have it). Using
dma_get_ops() in drivers leads to compilation breakage on many
architectures.

Currently, we don't have a portable way to see whether
dma_sync_single_for_cpu() is a no-op. If the optimization improves
performance notably, we can add a new DMA API for it.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
---
 drivers/net/bnx2.c |   10 +---------
 1 files changed, 1 insertions(+), 9 deletions(-)

Comments

Michael Chan June 17, 2010, 12:54 p.m. UTC | #1
FUJITA Tomonori wrote:

> From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> Date: Thu, 17 Jun 2010 13:06:15 +0900
> Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
>
> This removes dma_get_ops() prefetch optimization in bnx2.
>
> bnx2 uses dma_get_ops() to see whether dma_sync_single_for_cpu() is a
> no-op, and prefetches the next rx descriptor only if it is.
>
> But dma_get_ops() isn't available on all architectures (only the
> architectures that use the dma_map_ops struct have it). Using
> dma_get_ops() in drivers leads to compilation breakage on many
> architectures.
>
> Currently, we don't have a portable way to see whether
> dma_sync_single_for_cpu() is a no-op. If the optimization improves
> performance notably, we can add a new DMA API for it.

This prefetch improves performance noticeably when the driver is
handling incoming 64-byte packets at a sustained rate.

>
> Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Acked-by: Michael Chan <mchan@broadcom.com>

Thanks.

> ---
>  drivers/net/bnx2.c |   10 +---------
>  1 files changed, 1 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
> index 949d7a9..b3305fc 100644
> --- a/drivers/net/bnx2.c
> +++ b/drivers/net/bnx2.c
> @@ -3073,7 +3073,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>       u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
>       struct l2_fhdr *rx_hdr;
>       int rx_pkt = 0, pg_ring_used = 0;
> -     struct pci_dev *pdev = bp->pdev;
>
>       hw_cons = bnx2_get_hw_rx_cons(bnapi);
>       sw_cons = rxr->rx_cons;
> @@ -3086,7 +3085,7 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>       while (sw_cons != hw_cons) {
>               unsigned int len, hdr_len;
>               u32 status;
> -             struct sw_bd *rx_buf, *next_rx_buf;
> +             struct sw_bd *rx_buf;
>               struct sk_buff *skb;
>               dma_addr_t dma_addr;
>               u16 vtag = 0;
> @@ -3098,13 +3097,6 @@ bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
>               rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
>               skb = rx_buf->skb;
>               prefetchw(skb);
> -
> -             if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
> -                     next_rx_buf =
> -                             &rxr->rx_buf_ring[
> -                                     RX_RING_IDX(NEXT_RX_BD(sw_cons))];
> -                     prefetch(next_rx_buf->desc);
> -             }
>               rx_buf->skb = NULL;
>
>               dma_addr = dma_unmap_addr(rx_buf, mapping);
> --
> 1.5.6.5
>
>
>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley June 17, 2010, 1:12 p.m. UTC | #2
On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> FUJITA Tomonori wrote:
> 
> > From: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
> > Date: Thu, 17 Jun 2010 13:06:15 +0900
> > Subject: [PATCH] bnx2: fix dma_get_ops compilation breakage
> >
> > This removes dma_get_ops() prefetch optimization in bnx2.
> >
> > bnx2 uses dma_get_ops() to see whether dma_sync_single_for_cpu() is a
> > no-op, and prefetches the next rx descriptor only if it is.
> >
> > But dma_get_ops() isn't available on all architectures (only the
> > architectures that use the dma_map_ops struct have it). Using
> > dma_get_ops() in drivers leads to compilation breakage on many
> > architectures.
> >
> > Currently, we don't have a portable way to see whether
> > dma_sync_single_for_cpu() is a no-op. If the optimization improves
> > performance notably, we can add a new DMA API for it.
> 
> This prefetch improves performance noticeably when the driver is
> handling incoming 64-byte packets at a sustained rate.

So why not do it unconditionally?  The worst that can happen is that you
pull in a stale cache line which will get cleaned in the dma_sync, thus
slightly degrading performance on incoherent architectures.

Alternatively, come up with a dma prefetch infrastructure ... all you're
really doing is hinting to the architecture that you'll sync this region
next.

James


Michael Chan June 17, 2010, 1:30 p.m. UTC | #3
James Bottomley wrote:

> On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > This prefetch improves performance noticeably when the driver is
> > handling incoming 64-byte packets at a sustained rate.
>
> So why not do it unconditionally?  The worst that can happen
> is that you
> pull in a stale cache line which will get cleaned in the
> dma_sync, thus
> slightly degrading performance on incoherent architectures.

The original patch was an unconditional prefetch.  There was
some discussion that it might not be correct if the DMA wasn't
synced yet on some archs.  If the consensus is that it is ok to
do so, that would be the simplest solution.

>
> Alternatively, come up with a dma prefetch infrastructure ...
> all you're
> really doing is hinting to the architecture that you'll sync
> this region
> next.
>
> James
>
>
>
>

James Bottomley June 17, 2010, 1:36 p.m. UTC | #4
On Thu, 2010-06-17 at 06:30 -0700, Michael Chan wrote: 
> James Bottomley wrote:
> 
> > On Thu, 2010-06-17 at 05:54 -0700, Michael Chan wrote:
> > > This prefetch improves performance noticeably when the driver is
> > > handling incoming 64-byte packets at a sustained rate.
> >
> > So why not do it unconditionally?  The worst that can happen
> > is that you
> > pull in a stale cache line which will get cleaned in the
> > dma_sync, thus
> > slightly degrading performance on incoherent architectures.
> 
> The original patch was an unconditional prefetch.  There was
> some discussion that it might not be correct if the DMA wasn't
> synced yet on some archs.  If the consensus is that it is ok to
> do so, that would be the simplest solution.

It's definitely not "correct" in that it may pull in stale data.  But it
should be harmless in that if it does, the subsequent sync will destroy
the cache line (even if it actually pulled in correct data) and prevent
the actual use of the prefetched data being wrong (or indeed being
prefetched at all).

James




Patch

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 949d7a9..b3305fc 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3073,7 +3073,6 @@  bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	u16 hw_cons, sw_cons, sw_ring_cons, sw_prod, sw_ring_prod;
 	struct l2_fhdr *rx_hdr;
 	int rx_pkt = 0, pg_ring_used = 0;
-	struct pci_dev *pdev = bp->pdev;
 
 	hw_cons = bnx2_get_hw_rx_cons(bnapi);
 	sw_cons = rxr->rx_cons;
@@ -3086,7 +3085,7 @@  bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	while (sw_cons != hw_cons) {
 		unsigned int len, hdr_len;
 		u32 status;
-		struct sw_bd *rx_buf, *next_rx_buf;
+		struct sw_bd *rx_buf;
 		struct sk_buff *skb;
 		dma_addr_t dma_addr;
 		u16 vtag = 0;
@@ -3098,13 +3097,6 @@  bnx2_rx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 		rx_buf = &rxr->rx_buf_ring[sw_ring_cons];
 		skb = rx_buf->skb;
 		prefetchw(skb);
-
-		if (!get_dma_ops(&pdev->dev)->sync_single_for_cpu) {
-			next_rx_buf =
-				&rxr->rx_buf_ring[
-					RX_RING_IDX(NEXT_RX_BD(sw_cons))];
-			prefetch(next_rx_buf->desc);
-		}
 		rx_buf->skb = NULL;
 
 		dma_addr = dma_unmap_addr(rx_buf, mapping);