diff mbox

mlx4_en: fix transmit of packages when blue frame is enabled

Message ID 1317388995-411-1-git-send-email-cascardo@linux.vnet.ibm.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Thadeu Lima de Souza Cascardo Sept. 30, 2011, 1:23 p.m. UTC
With the addition of Blue Frame support in the network driver for mlx4,
the doorbell is not written in the path where blue frame is enabled and
the package follows some characteristics.

The consequence of that is that ICMP ECHO requests, for example, were
not transmitted by the device. A ping flood, for example, would make the
watchdog dispatch, because the ring was full and transmissions have
timed out.

After this fix, I could ping two systems using mlx4_en (both with the
fix).

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

Comments

Yevgeny Petrilin Oct. 2, 2011, 8:58 a.m. UTC | #1
> With the addition of Blue Frame support in the network driver for mlx4, the doorbell is not written in the path where blue frame is enabled and the package follows some characteristics.
> 
> The consequence of that is that ICMP ECHO requests, for example, were not transmitted by the device. A ping flood, for example, would make the
> watchdog dispatch, because the ring was full and transmissions have timed out.
> 
> After this fix, I could ping two systems using mlx4_en (both with the fix).
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo
> <cascardo@linux.vnet.ibm.com>
> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> index 6e03de0..270da80 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> @@ -811,10 +811,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb,
> struct net_device *dev)
>  		* before setting ownership of this descriptor to HW */
>  		wmb();
>  		tx_desc->ctrl.owner_opcode = op_own;
> -		wmb();
> -		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
>  	}
> 
> +	wmb();
> +	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> +
>  	/* Poll CQ here */
>  	mlx4_en_xmit_poll(priv, tx_ind);
> 
> --
> 1.7.4.4

What this patch does is ringing the doorbell unconditionally, whether blue flame is used or not.
The whole idea, is to copy the data to the blueflame register and have the HW take the data from there and save the memory access to the work queue entry.
In this patch you do both, copy the data to the register, and then still the HW accesses the memory to take it.
For some reason Blue flame is not working on your system and we need to understand why, I'll be glad to work with you to debug it.
Anyway, this patch is not the solution (even that it works on your systems) for the problem.

Yevgeny
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thadeu Lima de Souza Cascardo Oct. 3, 2011, 2:37 p.m. UTC | #2
On Sun, Oct 02, 2011 at 08:58:21AM +0000, Yevgeny Petrilin wrote:
> 
> > With the addition of Blue Frame support in the network driver for mlx4, the doorbell is not written in the path where blue frame is enabled and the package follows some characteristics.
> > 
> > The consequence of that is that ICMP ECHO requests, for example, were not transmitted by the device. A ping flood, for example, would make the
> > watchdog dispatch, because the ring was full and transmissions have timed out.
> > 
> > After this fix, I could ping two systems using mlx4_en (both with the fix).
> > 
> > Signed-off-by: Thadeu Lima de Souza Cascardo
> > <cascardo@linux.vnet.ibm.com>
> > Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
> > ---
> >  drivers/net/ethernet/mellanox/mlx4/en_tx.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > index 6e03de0..270da80 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
> > @@ -811,10 +811,11 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb,
> > struct net_device *dev)
> >  		* before setting ownership of this descriptor to HW */
> >  		wmb();
> >  		tx_desc->ctrl.owner_opcode = op_own;
> > -		wmb();
> > -		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> >  	}
> > 
> > +	wmb();
> > +	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
> > +
> >  	/* Poll CQ here */
> >  	mlx4_en_xmit_poll(priv, tx_ind);
> > 
> > --
> > 1.7.4.4
> 
> What this patch does is ringing the doorbell unconditionally, whether blue flame is used or not.
> The whole idea, is to copy the data to the blueflame register and have the HW take the data from there and save the memory access to the work queue entry.
> In this patch you do both, copy the data to the register, and then still the HW accesses the memory to take it.
> For some reason Blue flame is not working on your system and we need to understand why, I'll be glad to work with you to debug it.
> Anyway, this patch is not the solution (even that it works on your systems) for the problem.
> 
> Yevgeny

Hello, Yevgeny.

We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
information, tests or debug patches you want me to try, just tell me.

I expected this was really not the proper fix, but thought it would be
better to send it than just guess. Any clues what the problem might be?
Perhaps, we would have to disable blue frame for this particular device?

Regards,
Cascardo.

---
lspci output
0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
---
lspci -v output
0006:01:00.0 0200: 15b3:6750 (rev b0)
        Subsystem: 1014:0416
        Flags: bus master, fast devsel, latency 0, IRQ 31
        Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
        Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
        Expansion ROM at 3da47bf00000 [disabled] [size=1M]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] Vital Product Data
        Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
        Capabilities: [60] Express Endpoint, MSI 00
        Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
        Kernel driver in use: mlx4_core
        Kernel modules: mlx4_en, mlx4_core
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yevgeny Petrilin Oct. 3, 2011, 2:56 p.m. UTC | #3
> Hello, Yevgeny.
> 
> We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
> information, tests or debug patches you want me to try, just tell me.
> 
> I expected this was really not the proper fix, but thought it would be
> better to send it than just guess. Any clues what the problem might be?
> Perhaps, we would have to disable blue frame for this particular device?
> 
> Regards,
> Cascardo.
> 
> ---
> lspci output
> 0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
> ---
> lspci -v output
> 0006:01:00.0 0200: 15b3:6750 (rev b0)
>         Subsystem: 1014:0416
>         Flags: bus master, fast devsel, latency 0, IRQ 31
>         Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
>         Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
>         Expansion ROM at 3da47bf00000 [disabled] [size=1M]
>         Capabilities: [40] Power Management version 3
>         Capabilities: [48] Vital Product Data
>         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
>         Capabilities: [60] Express Endpoint, MSI 00
>         Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
>         Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
>         Kernel driver in use: mlx4_core
>         Kernel modules: mlx4_en, mlx4_core
> ---

Cascardo,

Can you also send me the output of ethtool -i?
It seems that there is a problem with write combining on Power processors, we will check this issue.

Yevgeny
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thadeu Lima de Souza Cascardo Oct. 3, 2011, 8:53 p.m. UTC | #4
On Mon, Oct 03, 2011 at 02:56:08PM +0000, Yevgeny Petrilin wrote:
> > Hello, Yevgeny.
> > 
> > We use a MT26448 (lspci -v output bellow) on a POWER7. Any other
> > information, tests or debug patches you want me to try, just tell me.
> > 
> > I expected this was really not the proper fix, but thought it would be
> > better to send it than just guess. Any clues what the problem might be?
> > Perhaps, we would have to disable blue frame for this particular device?
> > 
> > Regards,
> > Cascardo.
> > 
> > ---
> > lspci output
> > 0006:01:00.0 Ethernet controller: Mellanox Technologies MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] (rev b0)
> > ---
> > lspci -v output
> > 0006:01:00.0 0200: 15b3:6750 (rev b0)
> >         Subsystem: 1014:0416
> >         Flags: bus master, fast devsel, latency 0, IRQ 31
> >         Memory at 3da47be00000 (64-bit, non-prefetchable) [size=1M]
> >         Memory at 3da47c000000 (64-bit, prefetchable) [size=32M]
> >         Expansion ROM at 3da47bf00000 [disabled] [size=1M]
> >         Capabilities: [40] Power Management version 3
> >         Capabilities: [48] Vital Product Data
> >         Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
> >         Capabilities: [60] Express Endpoint, MSI 00
> >         Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
> >         Capabilities: [148] Device Serial Number 00-02-c9-03-00-0f-50-be
> >         Kernel driver in use: mlx4_core
> >         Kernel modules: mlx4_en, mlx4_core
> > ---
> 
> Cascardo,
> 
> Can you also send me the output of ethtool -i?
> It seems that there is a problem with write combining on Power processors, we will check this issue.
> 
> Yevgeny

Hello, Yevgeny.

You will find the output of ethtool -i below.

I am copying Ben and powerpc list, in case this is an issue with Power
processors. They can provide us some more insight into this.

Thanks,
Cascardo.

---
# ethtool -i eth10
driver: mlx4_en
version: 1.5.4.1 (March 2011)
firmware-version: 2.9.4130
bus-info: 0006:01:00.0
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Oct. 4, 2011, 6:02 a.m. UTC | #5
On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:

 .../...

> > Can you also send me the output of ethtool -i?
> > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > 
> > Yevgeny
> 
> Hello, Yevgeny.
> 
> You will find the output of ethtool -i below.
> 
> I am copying Ben and powerpc list, in case this is an issue with Power
> processors. They can provide us some more insight into this.

May I get some background please ? :-)

I'm not aware of a specific issue with write combining but I'd need to
know more about what you are doing and the code to do it to comment on
whether it should work or not.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thadeu Lima de Souza Cascardo Oct. 4, 2011, 8:26 p.m. UTC | #6
On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> 
>  .../...
> 
> > > Can you also send me the output of ethtool -i?
> > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > 
> > > Yevgeny
> > 
> > Hello, Yevgeny.
> > 
> > You will find the output of ethtool -i below.
> > 
> > I am copying Ben and powerpc list, in case this is an issue with Power
> > processors. They can provide us some more insight into this.
> 
> May I get some background please ? :-)
> 
> I'm not aware of a specific issue with write combining but I'd need to
> know more about what you are doing and the code to do it to comment on
> whether it should work or not.
> 
> Cheers,
> Ben.
> 
> 

Hello, Ben.

Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
added blue frame support, that does not require writing to the device
memory to indicate a new packet (the doorbell register as it is called).

Well, the ring is getting full with no interrupt or packets transmitted.
I simply added a write to the doorbell register and it works for me.
Yevgeny says this is not the right fix, claiming there is a problem with
write combining on POWER. The code uses memory barriers, so I don't know
why there is any problem.

I am posting the code here to show better what the situation is.
Yevgeny can tell more about the device and the driver.

The code below is the driver as of now, including a diff with what I
changed and had resulted OK for me. Before the blue frame support, the
only code executed was the else part. I can't tell much what the device
should be seeing and doing after the blue frame part of the code is
executed. But it does send the packet if I write to the doorbell
register.

Yevgeny, can you tell us what the device should be doing and why you
think this is a problem on POWER? Is it possible that this is simply a
problem with the firmware version?

Thanks,
Cascardo.

---
        if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
!vlan_tag) {
                *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
ring->doorbell_qpn;
                op_own |= htonl((bf_index & 0xffff) << 8);
                /* Ensure new descirptor hits memory
                * before setting ownership of this descriptor to HW */
                wmb();
                tx_desc->ctrl.owner_opcode = op_own;

                wmb();

                mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
long *) &tx_desc->ctrl,
                     desc_size);

                wmb();

                ring->bf.offset ^= ring->bf.buf_size;
        } else {
                /* Ensure new descirptor hits memory
                * before setting ownership of this descriptor to HW */
                wmb();
                tx_desc->ctrl.owner_opcode = op_own;
-               wmb();
-               writel(ring->doorbell_qpn, ring->bf.uar->map +
MLX4_SEND_DOORBELL);
        }

+       wmb();
+       writel(ring->doorbell_qpn, ring->bf.uar->map +
MLX4_SEND_DOORBELL);
+
---

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eli Cohen Oct. 5, 2011, 8:15 a.m. UTC | #7
On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:

I believe we have an endianess problem here. The source buffer is in
big endian - in x86 archs, it will rich the pci device unswapped since
both x86 and pci are little endian. In ppc, it wil be swapped by the
chipset so it will reach the device in little endian which is wrong.
So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
the dwords before the call to __iowrite64_copy. Of course which should
fix this in an arch independent manner. Let me know this works for
you.

> On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > 
> >  .../...
> > 
> > > > Can you also send me the output of ethtool -i?
> > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > 
> > > > Yevgeny
> > > 
> > > Hello, Yevgeny.
> > > 
> > > You will find the output of ethtool -i below.
> > > 
> > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > processors. They can provide us some more insight into this.
> > 
> > May I get some background please ? :-)
> > 
> > I'm not aware of a specific issue with write combining but I'd need to
> > know more about what you are doing and the code to do it to comment on
> > whether it should work or not.
> > 
> > Cheers,
> > Ben.
> > 
> > 
> 
> Hello, Ben.
> 
> Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> added blue frame support, that does not require writing to the device
> memory to indicate a new packet (the doorbell register as it is called).
> 
> Well, the ring is getting full with no interrupt or packets transmitted.
> I simply added a write to the doorbell register and it works for me.
> Yevgeny says this is not the right fix, claiming there is a problem with
> write combining on POWER. The code uses memory barriers, so I don't know
> why there is any problem.
> 
> I am posting the code here to show better what the situation is.
> Yevgeny can tell more about the device and the driver.
> 
> The code below is the driver as of now, including a diff with what I
> changed and had resulted OK for me. Before the blue frame support, the
> only code executed was the else part. I can't tell much what the device
> should be seeing and doing after the blue frame part of the code is
> executed. But it does send the packet if I write to the doorbell
> register.
> 
> Yevgeny, can you tell us what the device should be doing and why you
> think this is a problem on POWER? Is it possible that this is simply a
> problem with the firmware version?
> 
> Thanks,
> Cascardo.
> 
> ---
>         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> !vlan_tag) {
>                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> ring->doorbell_qpn;
>                 op_own |= htonl((bf_index & 0xffff) << 8);
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> 
>                 wmb();
> 
>                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> long *) &tx_desc->ctrl,
>                      desc_size);
> 
>                 wmb();
> 
>                 ring->bf.offset ^= ring->bf.buf_size;
>         } else {
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> -               wmb();
> -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
>         }
> 
> +       wmb();
> +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
> +
> ---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Oct. 9, 2011, 7:19 a.m. UTC | #8
On Tue, 2011-10-04 at 17:26 -0300, Thadeu Lima de Souza Cascardo wrote:

>         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> !vlan_tag) {
>                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> ring->doorbell_qpn;

Could this have endianness problems ?

>                 op_own |= htonl((bf_index & 0xffff) << 8);

Probably better to use cpu_to_be/lexx() here. Also make sure your
descriptors are defined with the appropriate __beXX or __leXX types
so sparse can tell you when you get this wrong

>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;

>                 wmb();
> 
>                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> long *) &tx_desc->ctrl,
>                      desc_size);

What does the above do ?

>                 wmb();
> 
>                 ring->bf.offset ^= ring->bf.buf_size;

Is this HW visible ? It's a bit odd, are you doing double bufferring and
trying to flip a bit ? If it's HW visible, is endian correct ?
 
        } else {
>                 /* Ensure new descirptor hits memory
>                 * before setting ownership of this descriptor to HW */
>                 wmb();
>                 tx_desc->ctrl.owner_opcode = op_own;
> -               wmb();
> -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
>         }
> 
> +       wmb();
> +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> MLX4_SEND_DOORBELL);
> +
> ---

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt Oct. 9, 2011, 7:21 a.m. UTC | #9
On Wed, 2011-10-05 at 10:15 +0200, Eli Cohen wrote:
> On Tue, Oct 04, 2011 at 05:26:20PM -0300, Thadeu Lima de Souza Cascardo wrote:
> 
> I believe we have an endianess problem here. The source buffer is in
> big endian - in x86 archs, it will rich the pci device unswapped since
> both x86 and pci are little endian. In ppc, it wil be swapped by the
> chipset 
  ^^^^^^^ ugh ?

> so it will reach the device in little endian which is wrong.
> So, in mlx4_bf_copy, you could loop over the buffer and swap32 the all
> the dwords before the call to __iowrite64_copy. Of course which should
> fix this in an arch independent manner. Let me know this works for
> you.

That's generally the wrong way to handle endian (to swap32 a whole
buffer).

But I need to know more about the structure of those descriptors etc...
to judge properly.

Cheers,
Ben.

> > On Tue, Oct 04, 2011 at 08:02:12AM +0200, Benjamin Herrenschmidt wrote:
> > > On Mon, 2011-10-03 at 17:53 -0300, Thadeu Lima de Souza Cascardo wrote:
> > > 
> > >  .../...
> > > 
> > > > > Can you also send me the output of ethtool -i?
> > > > > It seems that there is a problem with write combining on Power processors, we will check this issue.
> > > > > 
> > > > > Yevgeny
> > > > 
> > > > Hello, Yevgeny.
> > > > 
> > > > You will find the output of ethtool -i below.
> > > > 
> > > > I am copying Ben and powerpc list, in case this is an issue with Power
> > > > processors. They can provide us some more insight into this.
> > > 
> > > May I get some background please ? :-)
> > > 
> > > I'm not aware of a specific issue with write combining but I'd need to
> > > know more about what you are doing and the code to do it to comment on
> > > whether it should work or not.
> > > 
> > > Cheers,
> > > Ben.
> > > 
> > > 
> > 
> > Hello, Ben.
> > 
> > Sorry for that. I am testing mlx4_en driver on a POWER. Yevgeny has
> > added blue frame support, that does not require writing to the device
> > memory to indicate a new packet (the doorbell register as it is called).
> > 
> > Well, the ring is getting full with no interrupt or packets transmitted.
> > I simply added a write to the doorbell register and it works for me.
> > Yevgeny says this is not the right fix, claiming there is a problem with
> > write combining on POWER. The code uses memory barriers, so I don't know
> > why there is any problem.
> > 
> > I am posting the code here to show better what the situation is.
> > Yevgeny can tell more about the device and the driver.
> > 
> > The code below is the driver as of now, including a diff with what I
> > changed and had resulted OK for me. Before the blue frame support, the
> > only code executed was the else part. I can't tell much what the device
> > should be seeing and doing after the blue frame part of the code is
> > executed. But it does send the packet if I write to the doorbell
> > register.
> > 
> > Yevgeny, can you tell us what the device should be doing and why you
> > think this is a problem on POWER? Is it possible that this is simply a
> > problem with the firmware version?
> > 
> > Thanks,
> > Cascardo.
> > 
> > ---
> >         if (ring->bf_enabled && desc_size <= MAX_BF && !bounce &&
> > !vlan_tag) {
> >                 *(u32 *) (&tx_desc->ctrl.vlan_tag) |=
> > ring->doorbell_qpn;
> >                 op_own |= htonl((bf_index & 0xffff) << 8);
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > 
> >                 wmb();
> > 
> >                 mlx4_bf_copy(ring->bf.reg + ring->bf.offset, (unsigned
> > long *) &tx_desc->ctrl,
> >                      desc_size);
> > 
> >                 wmb();
> > 
> >                 ring->bf.offset ^= ring->bf.buf_size;
> >         } else {
> >                 /* Ensure new descirptor hits memory
> >                 * before setting ownership of this descriptor to HW */
> >                 wmb();
> >                 tx_desc->ctrl.owner_opcode = op_own;
> > -               wmb();
> > -               writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> >         }
> > 
> > +       wmb();
> > +       writel(ring->doorbell_qpn, ring->bf.uar->map +
> > MLX4_SEND_DOORBELL);
> > +
> > ---


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 6e03de0..270da80 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -811,10 +811,11 @@  netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		* before setting ownership of this descriptor to HW */
 		wmb();
 		tx_desc->ctrl.owner_opcode = op_own;
-		wmb();
-		writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
 	}
 
+	wmb();
+	writel(ring->doorbell_qpn, ring->bf.uar->map + MLX4_SEND_DOORBELL);
+
 	/* Poll CQ here */
 	mlx4_en_xmit_poll(priv, tx_ind);