diff mbox

Bug: mv643xxx fails with highmem

Message ID 20141211194920.GR11285@n2100.arm.linux.org.uk
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Russell King - ARM Linux Dec. 11, 2014, 7:49 p.m. UTC
Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
all fragments with dma_map_single().  This fails when the driver is
used in an environment with highmem.

With this patch in place in 3.18:


I regularly hit the first BUG_ON().  Without the BUG_ON(), the machine
either freezes or oopses in the DMA API functions due to them being
passed an invalid struct page *.

So, I'm unclear whether this is a driver bug, or whether it's a core
netdev bug: should netdev be passing skbuffs with highmem pages
attached when the driver has NETIF_F_HIGHDMA clear, or is Ezequiel's
patch removing the ability to map highmem pages from this driver
incorrect?

Here's the ethtool -k eth0 output for this driver:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Thanks.

Comments

David Miller Dec. 11, 2014, 8:10 p.m. UTC | #1
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Thu, 11 Dec 2014 19:49:20 +0000

> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
> all fragments with dma_map_single().  This fails when the driver is
> used in an environment with highmem.

This change looks really buggy to me.

Unfortunately, all the changes he subsequently makes for software TSO
support depend upon this :-/

The change is definitely wrong.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ezequiel Garcia Dec. 11, 2014, 8:12 p.m. UTC | #2
On 12/11/2014 05:10 PM, David Miller wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 19:49:20 +0000
> 
>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>> all fragments with dma_map_single().  This fails when the driver is
>> used in an environment with highmem.
> 
> This change looks really buggy to me.
> 
> Unfortunately, all the changes he subsequently makes for software TSO
> support depend upon this :-/
> 
> The change is definitely wrong.
> 

Got it. I'll take a closer look and will try to think a fix for this.
Russell King - ARM Linux Dec. 11, 2014, 8:25 p.m. UTC | #3
On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 19:49:20 +0000
> 
> > Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
> > all fragments with dma_map_single().  This fails when the driver is
> > used in an environment with highmem.
> 
> This change looks really buggy to me.
> 
> Unfortunately, all the changes he subsequently makes for software TSO
> support depend upon this :-/
> 
> The change is definitely wrong.

Thanks for confirming where the bug is.

Would other drivers need fixing for this as well?  Eg, fec_main.c
does the following, and this driver is used on iMX6 which can also have
highmem:

static int
fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
                             struct sk_buff *skb,
                             struct net_device *ndev)
{
                bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
...
                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);
Ezequiel Garcia Dec. 11, 2014, 8:25 p.m. UTC | #4
On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single().  This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.
> 
> Thanks for confirming where the bug is.
> 
> Would other drivers need fixing for this as well?  Eg, fec_main.c
> does the following, and this driver is used on iMX6 which can also have
> highmem:
> 
> static int
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
>                              struct sk_buff *skb,
>                              struct net_device *ndev)
> {
>                 bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> ...
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
> 

And mvneta seems to have that pattern too: see mvneta_tx_frag_process().
David Miller Dec. 11, 2014, 8:27 p.m. UTC | #5
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Thu, 11 Dec 2014 20:25:07 +0000

> Would other drivers need fixing for this as well?  Eg, fec_main.c
> does the following, and this driver is used on iMX6 which can also have
> highmem:
> 
> static int
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
>                              struct sk_buff *skb,
>                              struct net_device *ndev)
> {
>                 bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> ...
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);

Probably, yes.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Fugang Duan Dec. 12, 2014, 5:34 a.m. UTC | #6
From: David Miller <davem@davemloft.net> Sent: Friday, December 12, 2014 4:28 AM
> To: linux@arm.linux.org.uk
> Cc: Duan Fugang-B38611; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 20:25:07 +0000
> 
> > Would other drivers need fixing for this as well?  Eg, fec_main.c does
> > the following, and this driver is used on iMX6 which can also have
> > highmem:
> >
> > static int
> > fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
> >                              struct sk_buff *skb,
> >                              struct net_device *ndev) {
> >                 bufaddr = page_address(this_frag->page.p) +
> > this_frag->page_offset; ...
> >                 addr = dma_map_single(&fep->pdev->dev, bufaddr,
> frag_len,
> >                                       DMA_TO_DEVICE);
> 
> Probably, yes.

Hi, all,

I will submit one patch to fix the issue.

Regards,
Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Dec. 15, 2014, 6:04 p.m. UTC | #7
On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> I will submit one patch to fix the issue.

There's more bugs in the FEC driver... here's the relevant bits:

static void
fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
{
        bdp = txq->dirty_tx;

        bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);

        while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
                /* current queue is empty */
                if (bdp == txq->cur_tx)
                        break;

                skb = txq->tx_skbuff[index];
                txq->tx_skbuff[index] = NULL;
                if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
                        dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
                                        bdp->cbd_datlen, DMA_TO_DEVICE);
                bdp->cbd_bufaddr = 0;
                if (!skb) {
                        bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
                        continue;
                }
...
                txq->dirty_tx = bdp;
                bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
        }

Consider the following code path:
- we enter this function
- get the dirty_tx pointer
- move to the next descriptor (which we'll call descriptor A)
- next descriptor indicates that TX_READY = 0
- bdp != txq->cur_tx
- we unmap if needed
- we set bdp->cmdbufaddr = 0
- assume skb is NULL, so we move to the next descriptor (we'll call this B)
- next descriptor _may_ have TX_READY = 1
- we break out of the loop, and return

Some time later, we re-enter:
- get the dirty_tx pointer
- move to the next descriptor (which is descriptor A above)
- next descriptor indicates that TX_READY = 0
- bdp != txq->cur_tx
- we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously zeroed
  - the DMA API debugging complains that FEC is unmapping memory which it
    doesn't own

Unfortunately, this does appear to happen - from a paste from Jon
Nettleton from iMX6Q:

 32. [   45.033001] unmapping this address 0x0 size 66
 33. [   45.037470] ------------[ cut here ]------------
 34. [   45.042127] WARNING: CPU: 0 PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not a]

(where the printk at line 32 is something that was added to debug this.)

The sad thing is that the remainder of my FEC patches did go a long way
to clean up these kinds of issues in the driver (and there's /many/ of
them), but unfortunately other conflicting changes got merged before I
could finish rebasing them, I decided to move on to other things and
discard the remainder of my patch set.  Marek showed some interest in
taking the patch set over, but I've not heard anything more - and I'm
not about to resurect my efforts only to get into the same situation
where I'm carrying 50 odd patches which I can't merge back into mainline
without spending weeks endlessly rebasing them.
Fugang Duan Dec. 16, 2014, 2:19 a.m. UTC | #8
From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 2:05 AM
> To: Duan Fugang-B38611
> Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> > I will submit one patch to fix the issue.
> 
> There's more bugs in the FEC driver... here's the relevant bits:
> 
> static void
> fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
>         bdp = txq->dirty_tx;
> 
>         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> 
>         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
>                 /* current queue is empty */
>                 if (bdp == txq->cur_tx)
>                         break;
> 
>                 skb = txq->tx_skbuff[index];
>                 txq->tx_skbuff[index] = NULL;
>                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
>                         dma_unmap_single(&fep->pdev->dev, bdp-
> >cbd_bufaddr,
>                                         bdp->cbd_datlen, DMA_TO_DEVICE);
>                 bdp->cbd_bufaddr = 0;
>                 if (!skb) {
>                         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
>                         continue;
>                 }
> ...
>                 txq->dirty_tx = bdp;
>                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
>         }
> 
> Consider the following code path:
> - we enter this function
> - get the dirty_tx pointer
> - move to the next descriptor (which we'll call descriptor A)
> - next descriptor indicates that TX_READY = 0
> - bdp != txq->cur_tx
> - we unmap if needed
> - we set bdp->cmdbufaddr = 0
> - assume skb is NULL, so we move to the next descriptor (we'll call this
> B)
> - next descriptor _may_ have TX_READY = 1
> - we break out of the loop, and return
> 
> Some time later, we re-enter:
> - get the dirty_tx pointer
> - move to the next descriptor (which is descriptor A above)
> - next descriptor indicates that TX_READY = 0
> - bdp != txq->cur_tx
> - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> zeroed
>   - the DMA API debugging complains that FEC is unmapping memory which it
>     doesn't own
> 
> Unfortunately, this does appear to happen - from a paste from Jon
> Nettleton from iMX6Q:
> 
>  32. [   45.033001] unmapping this address 0x0 size 66  33. [   45.037470]
> ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU: 0
> PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to
> free DMA memory it has not a]
> 
> (where the printk at line 32 is something that was added to debug this.)
> 
> The sad thing is that the remainder of my FEC patches did go a long way
> to clean up these kinds of issues in the driver (and there's /many/ of
> them), but unfortunately other conflicting changes got merged before I
> could finish rebasing them, I decided to move on to other things and
> discard the remainder of my patch set.  Marek showed some interest in
> taking the patch set over, but I've not heard anything more - and I'm not
> about to resurect my efforts only to get into the same situation where
> I'm carrying 50 odd patches which I can't merge back into mainline
> without spending weeks endlessly rebasing them.
> 
Russell, many thanks for your effort and thanks for your pointing out the bug.
I will think one method to fix it.

And I have one question for highmem dma mapping issue as below:
fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, struct net_device *ndev)
{
	...
		    bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;

                index = fec_enet_get_bd_index(txq->tx_bd_base, bdp, fep);
                if (((unsigned long) bufaddr) & fep->tx_align ||
                        fep->quirks & FEC_QUIRK_SWAP_FRAME) {
                        memcpy(txq->tx_bounce[index], bufaddr, frag_len);
                        bufaddr = txq->tx_bounce[index];

                        if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
                                swap_buffer(bufaddr, frag_len);
                }

                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);
                if (dma_mapping_error(&fep->pdev->dev, addr)) {
                        dev_kfree_skb_any(skb);
                        if (net_ratelimit())
                                netdev_err(ndev, "Tx DMA memory map failed\n");
                        goto dma_mapping_error;
                }
	...
}
If the frag page is located at high memory, use dma_map_single() is not right, must use skb_frag_dma_map() or dma_map_page().
But before mapping, if tx has buffer alignment limitation (tx_align is not zero), there need to do memcpy for buffer alignment.
So, there we need to check whether the page is in highmem, if so, we need to call kmap_atomic() or kmap_high_get() to get cpu address,
And then do memcpy or swap buffer operation.

Do you think the above solution is right ? or maybe there have better method to fix it ?

Regards,
Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Russell King - ARM Linux Dec. 16, 2014, 10:57 a.m. UTC | #9
On Tue, Dec 16, 2014 at 02:19:38AM +0000, fugang.duan@freescale.com wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 2:05 AM
> > To: Duan Fugang-B38611
> > Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> > electrons.com; netdev@vger.kernel.org
> > Subject: Re: Bug: mv643xxx fails with highmem
> > 
> > On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> > > I will submit one patch to fix the issue.
> > 
> > There's more bugs in the FEC driver... here's the relevant bits:
> > 
> > static void
> > fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
> >         bdp = txq->dirty_tx;
> > 
> >         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > 
> >         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> >                 /* current queue is empty */
> >                 if (bdp == txq->cur_tx)
> >                         break;
> > 
> >                 skb = txq->tx_skbuff[index];
> >                 txq->tx_skbuff[index] = NULL;
> >                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
> >                         dma_unmap_single(&fep->pdev->dev, bdp-
> > >cbd_bufaddr,
> >                                         bdp->cbd_datlen, DMA_TO_DEVICE);
> >                 bdp->cbd_bufaddr = 0;
> >                 if (!skb) {
> >                         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> >                         continue;
> >                 }
> > ...
> >                 txq->dirty_tx = bdp;
> >                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> >         }
> > 
> > Consider the following code path:
> > - we enter this function
> > - get the dirty_tx pointer
> > - move to the next descriptor (which we'll call descriptor A)
> > - next descriptor indicates that TX_READY = 0
> > - bdp != txq->cur_tx
> > - we unmap if needed
> > - we set bdp->cmdbufaddr = 0
> > - assume skb is NULL, so we move to the next descriptor (we'll call this
> > B)
> > - next descriptor _may_ have TX_READY = 1
> > - we break out of the loop, and return
> > 
> > Some time later, we re-enter:
> > - get the dirty_tx pointer
> > - move to the next descriptor (which is descriptor A above)
> > - next descriptor indicates that TX_READY = 0
> > - bdp != txq->cur_tx
> > - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> > zeroed
> >   - the DMA API debugging complains that FEC is unmapping memory which it
> >     doesn't own
> > 
> > Unfortunately, this does appear to happen - from a paste from Jon
> > Nettleton from iMX6Q:
> > 
> >  32. [   45.033001] unmapping this address 0x0 size 66  33. [   45.037470]
> > ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU: 0
> > PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> > 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to
> > free DMA memory it has not a]
> > 
> > (where the printk at line 32 is something that was added to debug this.)
> > 
> > The sad thing is that the remainder of my FEC patches did go a long way
> > to clean up these kinds of issues in the driver (and there's /many/ of
> > them), but unfortunately other conflicting changes got merged before I
> > could finish rebasing them, I decided to move on to other things and
> > discard the remainder of my patch set.  Marek showed some interest in
> > taking the patch set over, but I've not heard anything more - and I'm not
> > about to resurect my efforts only to get into the same situation where
> > I'm carrying 50 odd patches which I can't merge back into mainline
> > without spending weeks endlessly rebasing them.
> > 
> Russell, many thanks for your effort and thanks for your pointing out the bug.
> I will think one method to fix it.
> 
> And I have one question for highmem dma mapping issue as below:
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, struct net_device *ndev)
> {
> 	...
> 		    bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> 
>                 index = fec_enet_get_bd_index(txq->tx_bd_base, bdp, fep);
>                 if (((unsigned long) bufaddr) & fep->tx_align ||
>                         fep->quirks & FEC_QUIRK_SWAP_FRAME) {
>                         memcpy(txq->tx_bounce[index], bufaddr, frag_len);
>                         bufaddr = txq->tx_bounce[index];
> 
>                         if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
>                                 swap_buffer(bufaddr, frag_len);
>                 }
> 
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
>                 if (dma_mapping_error(&fep->pdev->dev, addr)) {
>                         dev_kfree_skb_any(skb);
>                         if (net_ratelimit())
>                                 netdev_err(ndev, "Tx DMA memory map failed\n");
>                         goto dma_mapping_error;
>                 }
> 	...
> }
>
> If the frag page is located at high memory, use dma_map_single() is not
> right, must use skb_frag_dma_map() or dma_map_page().

Correct.

> But before mapping, if tx has buffer alignment limitation (tx_align is
> not zero), there need to do memcpy for buffer alignment.

Right, and that can be detected by simply checking this_frag->page_offset
as we know that a page address is always aligned to a page.

> So, there we need to check whether the page is in highmem, if so, we
> need to call kmap_atomic() or kmap_high_get() to get cpu address,
> And then do memcpy or swap buffer operation.

Yes - you'd need to do something like this:

	if (this_frag->page_offset & fep->tx_align ||
	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
		bufaddr = kmap_atomic(this_frag->page.p) + this_frag->page_offset;
		memcpy(txq->tx_bounce[index], bufaddr, frag_len);
		kunmap_atomic(bufaddr);

		bufaddr = txq->tx_bounce[index];
		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
			swap_buffer(bufaddr, frag_len);
                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);
	} else {
		addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0,
					frag_len, DMA_TO_DEVICE);
	}

	if (dma_mapping_error(&fep->pdev->dev, addr)) {
		dev_kfree_skb_any(skb);
		if (net_ratelimit())
			netdev_err(ndev, "Tx DMA memory map failed\n");
		goto dma_mapping_error;
	}

You'll also need to record whether you should use dma_unmap_page() or
dma_unmap_single().
Ezequiel Garcia Dec. 17, 2014, 9:18 p.m. UTC | #10
Russell, David:

On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single().  This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.

I've been trying to find a fix for this issue, and also trying to
reproduce the bug.

As for the fix, we need to fix the non-TSO and TSO paths independently.
The former is fairly straightforward, but the latter might be a bit more
involved.

The problem is that the tso_t struct holds a pointer to the skb linear
and non-linear data.

struct tso_t {
        int next_frag_idx;
        void *data;
        size_t size;
        u16 ip_id;
        u32 tcp_seq;
};

Instead, we should deal with pages, and only map the non-linear skb with
skb_frag_dma_map().

On the other side, I haven't been able to reproduce this on my boards. I
did try to put a hack to hold most lowmem pages, but it didn't make a
difference. (In fact, I haven't been able to clearly see how the pages
for the skbuff are allocated from high memory.)

Russell, would you share any hints about your setup? I don't have access
to any Dove boards at the moment, but I do have Kirkwoods, Armadas and
i.MX6.

Thanks a lot for your report and help!
Fugang Duan Dec. 18, 2014, 2:29 a.m. UTC | #11
From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 6:57 PM
> To: Duan Fugang-B38611
> Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> On Tue, Dec 16, 2014 at 02:19:38AM +0000, fugang.duan@freescale.com wrote:
> > From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday,
> December 16, 2014 2:05 AM
> > > To: Duan Fugang-B38611
> > > Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> > > electrons.com; netdev@vger.kernel.org
> > > Subject: Re: Bug: mv643xxx fails with highmem
> > >
> > > On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com
> wrote:
> > > > I will submit one patch to fix the issue.
> > >
> > > There's more bugs in the FEC driver... here's the relevant bits:
> > >
> > > static void
> > > fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
> > >         bdp = txq->dirty_tx;
> > >
> > >         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > >
> > >         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> > >                 /* current queue is empty */
> > >                 if (bdp == txq->cur_tx)
> > >                         break;
> > >
> > >                 skb = txq->tx_skbuff[index];
> > >                 txq->tx_skbuff[index] = NULL;
> > >                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
> > >                         dma_unmap_single(&fep->pdev->dev, bdp-
> > > >cbd_bufaddr,
> > >                                         bdp->cbd_datlen,
> DMA_TO_DEVICE);
> > >                 bdp->cbd_bufaddr = 0;
> > >                 if (!skb) {
> > >                         bdp = fec_enet_get_nextdesc(bdp, fep,
> queue_id);
> > >                         continue;
> > >                 }
> > > ...
> > >                 txq->dirty_tx = bdp;
> > >                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > >         }
> > >
> > > Consider the following code path:
> > > - we enter this function
> > > - get the dirty_tx pointer
> > > - move to the next descriptor (which we'll call descriptor A)
> > > - next descriptor indicates that TX_READY = 0
> > > - bdp != txq->cur_tx
> > > - we unmap if needed
> > > - we set bdp->cmdbufaddr = 0
> > > - assume skb is NULL, so we move to the next descriptor (we'll call
> this
> > > B)
> > > - next descriptor _may_ have TX_READY = 1
> > > - we break out of the loop, and return
> > >
> > > Some time later, we re-enter:
> > > - get the dirty_tx pointer
> > > - move to the next descriptor (which is descriptor A above)
> > > - next descriptor indicates that TX_READY = 0
> > > - bdp != txq->cur_tx
> > > - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> > > zeroed
> > >   - the DMA API debugging complains that FEC is unmapping memory
> which it
> > >     doesn't own
> > >
> > > Unfortunately, this does appear to happen - from a paste from Jon
> > > Nettleton from iMX6Q:
> > >
> > >  32. [   45.033001] unmapping this address 0x0 size 66
> 33. [   45.037470]
> > > ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU:
> 0
> > > PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> > > 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries
> to
> > > free DMA memory it has not a]
> > >
> > > (where the printk at line 32 is something that was added to debug
> this.)
> > >
> > > The sad thing is that the remainder of my FEC patches did go a long
> way
> > > to clean up these kinds of issues in the driver (and there's /many/
> of
> > > them), but unfortunately other conflicting changes got merged before
> I
> > > could finish rebasing them, I decided to move on to other things and
> > > discard the remainder of my patch set.  Marek showed some interest in
> > > taking the patch set over, but I've not heard anything more - and I'm
> not
> > > about to resurect my efforts only to get into the same situation
> where
> > > I'm carrying 50 odd patches which I can't merge back into mainline
> > > without spending weeks endlessly rebasing them.
> > >
> > Russell, many thanks for your effort and thanks for your pointing out
> the bug.
> > I will think one method to fix it.
> >
> > And I have one question for highmem dma mapping issue as below:
> > fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct
> sk_buff *skb, struct net_device *ndev)
> > {
> > 	...
> > 		    bufaddr = page_address(this_frag->page.p) + this_frag-
> >page_offset;
> >
> >                 index = fec_enet_get_bd_index(txq->tx_bd_base, bdp,
> fep);
> >                 if (((unsigned long) bufaddr) & fep->tx_align ||
> >                         fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> >                         memcpy(txq->tx_bounce[index], bufaddr,
> frag_len);
> >                         bufaddr = txq->tx_bounce[index];
> >
> >                         if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> >                                 swap_buffer(bufaddr, frag_len);
> >                 }
> >
> >                 addr = dma_map_single(&fep->pdev->dev, bufaddr,
> frag_len,
> >                                       DMA_TO_DEVICE);
> >                 if (dma_mapping_error(&fep->pdev->dev, addr)) {
> >                         dev_kfree_skb_any(skb);
> >                         if (net_ratelimit())
> >                                 netdev_err(ndev, "Tx DMA memory map
> failed\n");
> >                         goto dma_mapping_error;
> >                 }
> > 	...
> > }
> >
> > If the frag page is located at high memory, use dma_map_single() is not
> > right, must use skb_frag_dma_map() or dma_map_page().
> 
> Correct.
> 
> > But before mapping, if tx has buffer alignment limitation (tx_align is
> > not zero), there need to do memcpy for buffer alignment.
> 
> Right, and that can be detected by simply checking this_frag->page_offset
> as we know that a page address is always aligned to a page.
> 
> > So, there we need to check whether the page is in highmem, if so, we
> > need to call kmap_atomic() or kmap_high_get() to get cpu address,
> > And then do memcpy or swap buffer operation.
> 
> Yes - you'd need to do something like this:
> 
> 	if (this_frag->page_offset & fep->tx_align ||
> 	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> 		bufaddr = kmap_atomic(this_frag->page.p) + this_frag-
> >page_offset;
> 		memcpy(txq->tx_bounce[index], bufaddr, frag_len);
> 		kunmap_atomic(bufaddr);
> 
> 		bufaddr = txq->tx_bounce[index];
> 		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> 			swap_buffer(bufaddr, frag_len);
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
> 	} else {
> 		addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0,
> 					frag_len, DMA_TO_DEVICE);
> 	}
> 
> 	if (dma_mapping_error(&fep->pdev->dev, addr)) {
> 		dev_kfree_skb_any(skb);
> 		if (net_ratelimit())
> 			netdev_err(ndev, "Tx DMA memory map failed\n");
> 		goto dma_mapping_error;
> 	}
> 
> You'll also need to record whether you should use dma_unmap_page() or
> dma_unmap_single().
> 
Russell, thanks for your suggestion and comments. I will generate one patch to fix it like your suggestion.
(Sorry for late to reply your mail since I missed to read this mail)

Regards,
Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index d44560d1d268..14d1fc9ff485 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -882,7 +882,9 @@  static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
 		void *addr;
  
 		this_frag = &skb_shinfo(skb)->frags[frag];
+		BUG_ON(PageHighMem(this_frag->page.p));
 		addr = page_address(this_frag->page.p) + this_frag->page_offset;
+		BUG_ON(!addr);
 		tx_index = txq->tx_curr_desc++;
 		if (txq->tx_curr_desc == txq->tx_ring_size)
 			txq->tx_curr_desc = 0;