VLAN I/F's and TX queue.

Message ID 1273222403.2261.26.camel@edumazet-laptop
State RFC, archived
Delegated to: David Miller

Commit Message

Eric Dumazet May 7, 2010, 8:53 a.m. UTC
On Friday, 7 May 2010 at 10:04 +0200, Joakim Tjernlund wrote:
> Joakim Tjernlund/Transmode wrote on 2010/05/03 13:34:28:
> >
> > We noted dropped pkgs on our VLAN interfaces and I started to look
> > for a cause. Here is an ifconfig example:
> >
> > eth0      Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:8886910 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:8880219 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:100
> >           RX bytes:1626842951 (1.5 GiB)  TX bytes:1555540810 (1.4 GiB)
> >
> > eth0.1    Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:2467090557 (2.2 GiB)  TX bytes:2480246455 (2.3 GiB)
> >
> > eth0.1.1  Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
> >           collisions:0 txqueuelen:0
> >           RX bytes:2458437901 (2.2 GiB)  TX bytes:2471598683 (2.3 GiB)
> >
> > Here I note that txqueuelen is 0 for eth0.1/eth0.1.1 and 100 for eth0 and
> > that it is only eth0.1 and eth0.1.1 that drops pkgs. It feels as if eth0.1
> > bypasses eth0's tx queue and passes pkgs directly to the HW driver. Is that so?
> > If so, that feels a bit strange and I am not sure how to best
> > fix this. Any ideas?
> >
> > Using kernel 2.6.33
> 
> So I did some more testing
> two nodes A and B connected over a slow link.
> Create two VLAN's as above and start pinging from A to B
> with pkg size 9600, start a few(4-10) parallel ping processes.
> 
> Now I see dropped packets on B, the receiver of pings, and no
> pkg loss on A.
> 

Dropped on the RX path or the TX path?

> 1) since the link is symmetrical, why do I only see pkg loss
>    at B?
> 
> 2) pkg loss in B only manifests on the VLAN's interfaces and
>    always in pair as if one lost pkg is counted twice?
> 

Congestion notifications can be stacked since commit cbbef5e183079
(vlan/macvlan: propagate transmission state to upper layers)

> 3) I would expect lost pkgs to be accounted on eth0 instead of
>    the VLAN interface(s) since that is where the pkg is lost, why
>    isn't it so?

You try to send packets on eth0.XXX, some are dropped, and they are
accounted for in the eth0.XXX stats. What is wrong with this?

If you want to avoid this, just add queues to your vlans

ip link add link eth0 eth0.103 txqueuelen 100 type vlan id 103

Patrick, what do you think of special-casing NET_XMIT_CN?




--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Joakim Tjernlund May 7, 2010, 9:29 a.m. UTC | #1
Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
>
> On Friday, 7 May 2010 at 10:04 +0200, Joakim Tjernlund wrote:
> > Joakim Tjernlund/Transmode wrote on 2010/05/03 13:34:28:
> > >
> > > We noted dropped pkgs on our VLAN interfaces and I started to look
> > > for a cause. Here is an ifconfig example:
> > >
> > > eth0      Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:8886910 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:8880219 errors:0 dropped:0 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:100
> > >           RX bytes:1626842951 (1.5 GiB)  TX bytes:1555540810 (1.4 GiB)
> > >
> > > eth0.1    Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:2467090557 (2.2 GiB)  TX bytes:2480246455 (2.3 GiB)
> > >
> > > eth0.1.1  Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
> > >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> > >           RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
> > >           TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
> > >           collisions:0 txqueuelen:0
> > >           RX bytes:2458437901 (2.2 GiB)  TX bytes:2471598683 (2.3 GiB)
> > >
> > > Here I note that txqueuelen is 0 for eth0.1/eth0.1.1 and 100 for eth0 and
> > > that it is only eth0.1 and eth0.1.1 that drops pkgs. It feels as if eth0.1
> > > bypasses eth0's tx queue and passes pkgs directly to the HW driver. Is that so?
> > > If so, that feels a bit strange and I am not sure how to best
> > > fix this. Any ideas?
> > >
> > > Using kernel 2.6.33
> >
> > So I did some more testing
> > two nodes A and B connected over a slow link.
> > Create two VLAN's as above and start pinging from A to B
> > with pkg size 9600, start a few(4-10) parallel ping processes.
> >
> > Now I see dropped packets on B, the receiver of pings, and no
> > pkg loss on A.
> >
>
> dropped on RX path or TX path ?

On the TX path (see the ifconfig listing above).

>
> > 1) since the link is symmetrical, why do I only see pkg loss
> >    at B?
> >
> > 2) pkg loss in B only manifests on the VLAN's interfaces and
> >    always in pair as if one lost pkg is counted twice?
> >
>
> Congestion notifications can be stacked since commit cbbef5e183079
> (vlan/macvlan: propagate transmission state to upper layers)

I see.

>
> > 3) I would expect lost pkgs to be accounted on eth0 instead of
> >    the VLAN interface(s) since that is where the pkg is lost, why
> >    isn't it so?
>
> You try to send packets on eth0.XXX, some are dropped, and accounted for
> on eth0.XXX stats. What is wrong with this ?

In this case one lost pkg is accounted for twice, once on eth0.1 and
once more on eth0.1.1. Note that eth0.1.1 is stacked on
top of eth0.1

I would at least expect eth0 to account for lost pkgs too.
I was confused by the current accounting as I knew that
the underlying HW I/F should be the only I/F that could
drop pkgs.

>
> If you want to avoid this, just add queues to your vlans
>
> ip link add link eth0 eth0.103 txqueuelen 100 type vlan id 103

From memory, that didn't help: pkgs are still accounted as
described. Why would the presence or absence of a queue affect
where dropped pkgs are accounted?

  Jocke

Patrick McHardy May 10, 2010, 2:26 p.m. UTC | #2
Eric Dumazet wrote:
> On Friday, 7 May 2010 at 10:04 +0200, Joakim Tjernlund wrote:
>> So I did some more testing
>> two nodes A and B connected over a slow link.
>> Create two VLAN's as above and start pinging from A to B
>> with pkg size 9600, start a few(4-10) parallel ping processes.
>>
>> Now I see dropped packets on B, the receiver of pings, and no
>> pkg loss on A.
>>
> 
> dropped on RX path or TX path ?
> 
>> 1) since the link is symmetrical, why do I only see pkg loss
>>    at B?
>>
>> 2) pkg loss in B only manifests on the VLAN's interfaces and
>>    always in pair as if one lost pkg is counted twice?
>>
> 
> Congestion notifications can be stacked since commit cbbef5e183079
> (vlan/macvlan: propagate transmission state to upper layers)
> 
>> 3) I would expect lost pkgs to be accounted on eth0 instead of
>>    the VLAN interface(s) since that is where the pkg is lost, why
>>    isn't it so?
> 
> You try to send packets on eth0.XXX, some are dropped, and accounted for
> on eth0.XXX stats. What is wrong with this ?
> 
> If you want to avoid this, just add queues to your vlans
> 
> ip link add link eth0 eth0.103 txqueuelen 100 type vlan id 103
> 
> Patrick what do you think of special casing NET_XMIT_CN ?

Is the intention just to avoid accounting the packet as dropped?
That seems fine to me, since in the case of NET_XMIT_CN it's actually
not the currently transmitted packet that was dropped.

But part of the intention of the above-mentioned patch was actually
to inform higher layers of congestion so they can take action if
desired, which would be defeated by this patch.

> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
> index b5249c5..c671b1a 100644
> --- a/net/8021q/vlan_dev.c
> +++ b/net/8021q/vlan_dev.c
> @@ -327,6 +327,8 @@ static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
>  	len = skb->len;
>  	ret = dev_queue_xmit(skb);
>  
> +	ret = net_xmit_eval(ret);
> +
>  	if (likely(ret == NET_XMIT_SUCCESS)) {
>  		txq->tx_packets++;
>  		txq->tx_bytes += len;
> @@ -353,6 +355,8 @@ static netdev_tx_t vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb,
>  	len = skb->len;
>  	ret = dev_queue_xmit(skb);
>  
> +	ret = net_xmit_eval(ret);
> +
>  	if (likely(ret == NET_XMIT_SUCCESS)) {
>  		txq->tx_packets++;
>  		txq->tx_bytes += len;
> 
> 
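
As a rough illustration of the trade-off discussed in this reply, the
following user-space sketch mimics the accounting branch the patch touches.
The NET_XMIT_* values and the net_xmit_eval() definition follow
include/linux/netdevice.h; struct toy_txq and toy_vlan_account() are
hypothetical names made up for this example and are not kernel code.

```c
/* Return codes as defined in include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01  /* skb dropped */
#define NET_XMIT_CN      0x02  /* congestion notification */

/* Same definition as the kernel macro: CN is folded into success. */
#define net_xmit_eval(e) ((e) == NET_XMIT_CN ? 0 : (e))

/* Hypothetical stand-in for the per-queue counters used by the vlan code. */
struct toy_txq {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/*
 * Hypothetical helper modelled on the accounting in
 * vlan_dev_hard_start_xmit(): 'ret' plays the role of the value returned
 * by dev_queue_xmit(), 'use_eval' selects the behaviour with the patch.
 */
static int toy_vlan_account(struct toy_txq *txq, int ret,
                            unsigned int len, int use_eval)
{
    if (use_eval)
        ret = net_xmit_eval(ret);   /* the line added by the patch */

    if (ret == NET_XMIT_SUCCESS) {
        txq->tx_packets++;
        txq->tx_bytes += len;
    } else {
        txq->tx_dropped++;
    }
    /* With the patch, the caller no longer sees NET_XMIT_CN. */
    return ret;
}
```

Calling toy_vlan_account() with NET_XMIT_CN and use_eval set to 0 bumps
tx_dropped and returns NET_XMIT_CN; with use_eval set to 1 the packet is
counted as sent and the caller gets NET_XMIT_SUCCESS, which is the loss of
congestion feedback raised above.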

Patrick McHardy May 10, 2010, 2:33 p.m. UTC | #3
Joakim Tjernlund wrote:
> Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
>>> 3) I would expect lost pkgs to be accounted on eth0 instead of
>>>    the VLAN interface(s) since that is where the pkg is lost, why
>>>    isn't it so?
>> You try to send packets on eth0.XXX, some are dropped, and accounted for
>> on eth0.XXX stats. What is wrong with this ?
> 
> In this case one lost pkg is accounted for twice, once on eth0.1 and
> once more on eth0.1.1. Note that eth0.1.1 is stacked on
> top of eth0.1
> 
> I would at least expect eth0 to also account lost pkgs too.
> I was confused by the current accounting as I knew that
> the underlying HW I/F should be the only I/F that could
> drop pkgs.

In case of NET_XMIT_CN, the packet is dropped by the qdisc before
it reaches eth0, so it's only accounted on the upper devices.
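
A toy model of this, assuming a stack like eth0 <- eth0.1 <- eth0.1.1, may
make the paired counters easier to see. struct toy_dev and toy_xmit() are
hypothetical names for this sketch; the real path goes through
dev_queue_xmit() and the root qdisc of each device, which is only
approximated here by a single "congested" flag on the physical device.

```c
#include <stddef.h>

#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_CN      0x02

/* Hypothetical, heavily simplified device: just a link to the lower
 * device plus the two counters that ifconfig shows. */
struct toy_dev {
    const char *name;
    struct toy_dev *lower;        /* NULL for the physical device */
    int qdisc_congested;          /* only meaningful on the physical device */
    unsigned long tx_packets;
    unsigned long tx_dropped;
};

static int toy_xmit(struct toy_dev *dev)
{
    int ret;

    if (dev->lower == NULL) {
        /* Physical device: its qdisc reports congestion before the driver
         * is involved, so the device counters are left untouched. */
        if (dev->qdisc_congested)
            return NET_XMIT_CN;
        dev->tx_packets++;
        return NET_XMIT_SUCCESS;
    }

    /* VLAN device: hand the packet down, then account on the result and
     * propagate it upwards, so every stacked level counts the same event. */
    ret = toy_xmit(dev->lower);
    if (ret == NET_XMIT_SUCCESS)
        dev->tx_packets++;
    else
        dev->tx_dropped++;
    return ret;
}
```

In this model each congested transmit on eth0.1.1 increments tx_dropped on
both eth0.1.1 and eth0.1 while eth0 stays at zero, matching the shape of
the ifconfig listing (98 on both VLAN interfaces, 0 on eth0).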

>> If you want to avoid this, just add queues to your vlans
>>
>> ip link add link eth0 eth0.103 txqueuelen 100 type vlan id 103
> 
> From memory now, but that didn't help. Still accounts pgks
> as described. Why would where to account pkgs be affected by
> queue or no queue?

If a queue is used on the vlan device, it will queue the packet
until the lower device is able to transmit it (unless its own
queue overflows).
Joakim Tjernlund May 10, 2010, 2:50 p.m. UTC | #4
Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
>
> Joakim Tjernlund wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
> >>>    the VLAN interface(s) since that is where the pkg is lost, why
> >>>    isn't it so?
> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
> >> on eth0.XXX stats. What is wrong with this ?
> >
> > In this case one lost pkg is accounted for twice, once on eth0.1 and
> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
> > top of eth0.1
> >
> > I would at least expect eth0 to also account lost pkgs too.
> > I was confused by the current accounting as I knew that
> > the underlying HW I/F should be the only I/F that could
> > drop pkgs.
>
> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
> it reaches eth0, so it's only accounted on the upper devices.

hmm, I am afraid I don't follow this. Why would a pkg be dropped before
it reaches eth0?

>
> >> If you want to avoid this, just add queues to your vlans
> >>
> >> ip link add link eth0 eth0.103 txqueuelen 100 type vlan id 103
> >
> > From memory now, but that didn't help. Still accounts pgks
> > as described. Why would where to account pkgs be affected by
> > queue or no queue?
>
> If a queue is used on the vlan device, it will queue the packet
> until the lower device is able to transmit it (unless its own
> queue overflows).

And if a pkg is lost, this also changes where the drop is accounted?
I don't understand this. The queue may prevent pkg loss to some degree,
but I don't get why a queue != 0 would change which interface
accounts for lost pkgs.

      Jocke

David Miller May 16, 2010, 7:40 a.m. UTC | #5
From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Date: Mon, 10 May 2010 16:50:20 +0200

> Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
>>
>> Joakim Tjernlund wrote:
>> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
>> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
>> >>>    the VLAN interface(s) since that is where the pkg is lost, why
>> >>>    isn't it so?
>> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
>> >> on eth0.XXX stats. What is wrong with this ?
>> >
>> > In this case one lost pkg is accounted for twice, once on eth0.1 and
>> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
>> > top of eth0.1
>> >
>> > I would at least expect eth0 to also account lost pkgs too.
>> > I was confused by the current accounting as I knew that
>> > the underlying HW I/F should be the only I/F that could
>> > drop pkgs.
>>
>> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
>> it reaches eth0, so it's only accounted on the upper devices.
> 
> hmm, I am afraid I don't follow this. Why would a pkg be dropped before
> it reaches eth0?

Because we have packet schedulers that sit before the device transmit
happens, and those packet schedulers enforce limits based upon
classification results or other criteria, and if those limits are
exceeded packets are dropped and NET_XMIT_CN is returned back up into
the transmit path of the networking stack.

The device never sees that packet get submitted to its ->ndo_start_xmit()
routine, and this is entirely intentional.  And it is entirely intentional
that NET_XMIT_CN gets passed up into the caller, where protocols such as
TCP can key off this information to make congestion control decisions.
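
The idea of a scheduler that drops before the driver is ever called can be
sketched with a tiny bounded FIFO. This is a user-space toy under stated
assumptions, not a real qdisc: struct toy_qdisc, struct toy_nic, TOY_QLEN
and the head_drop policy flag are all invented for the example. It assumes,
as noted earlier in the thread, that NET_XMIT_CN means some packet other
than the one being enqueued was dropped, while NET_XMIT_DROP means the new
packet itself was rejected.

```c
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01
#define NET_XMIT_CN      0x02

#define TOY_QLEN 4   /* arbitrary queue limit for the example */

/* Hypothetical bounded FIFO standing in for a packet scheduler. */
struct toy_qdisc {
    int pkts[TOY_QLEN];   /* queued packet ids, oldest first */
    int len;
    int head_drop;        /* 1: evict the oldest packet when full,
                             0: reject the new packet when full */
    unsigned long drops;
};

/* Hypothetical driver: only counts how often its xmit routine runs. */
struct toy_nic {
    unsigned long start_xmit_calls;
};

/* Remove the oldest packet from the queue (caller ensures len > 0). */
static void toy_pop_head(struct toy_qdisc *q)
{
    int i;

    for (i = 1; i < q->len; i++)
        q->pkts[i - 1] = q->pkts[i];
    q->len--;
}

/* Enqueue runs entirely in the scheduler; the driver is not involved. */
static int toy_enqueue(struct toy_qdisc *q, int id)
{
    int ret = NET_XMIT_SUCCESS;

    if (q->len == TOY_QLEN) {
        q->drops++;
        if (!q->head_drop)
            return NET_XMIT_DROP;   /* the new packet is the one lost */
        toy_pop_head(q);            /* an older packet is the one lost */
        ret = NET_XMIT_CN;
    }
    q->pkts[q->len++] = id;
    return ret;
}

/* Called when the device has room: only now does the driver see a packet.
 * Returns the packet id handed to the driver, or -1 if the queue is empty. */
static int toy_dequeue_to_nic(struct toy_qdisc *q, struct toy_nic *nic)
{
    int id;

    if (q->len == 0)
        return -1;
    id = q->pkts[0];
    toy_pop_head(q);
    nic->start_xmit_calls++;
    return id;
}
```

Filling the queue and enqueuing one more packet produces a drop and a
non-success return code while start_xmit_calls is still zero, which is the
point made above: the device never sees the dropped packet.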

Joakim Tjernlund May 16, 2010, 2:22 p.m. UTC | #6
David Miller <davem@davemloft.net> wrote on 2010/05/16 09:40:41:
>
> From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> Date: Mon, 10 May 2010 16:50:20 +0200
>
> > Patrick McHardy <kaber@trash.net> wrote on 2010/05/10 16:33:00:
> >>
> >> Joakim Tjernlund wrote:
> >> > Eric Dumazet <eric.dumazet@gmail.com> wrote on 2010/05/07 10:53:23:
> >> >>> 3) I would expect lost pkgs to be accounted on eth0 instead of
> >> >>>    the VLAN interface(s) since that is where the pkg is lost, why
> >> >>>    isn't it so?
> >> >> You try to send packets on eth0.XXX, some are dropped, and accounted for
> >> >> on eth0.XXX stats. What is wrong with this ?
> >> >
> >> > In this case one lost pkg is accounted for twice, once on eth0.1 and
> >> > once more on eth0.1.1. Note that eth0.1.1 is stacked on
> >> > top of eth0.1
> >> >
> >> > I would at least expect eth0 to also account lost pkgs too.
> >> > I was confused by the current accounting as I knew that
> >> > the underlying HW I/F should be the only I/F that could
> >> > drop pkgs.
> >>
> >> In case of NET_XMIT_CN, the packet is dropped by the qdisc before
> >> it reaches eth0, so it's only accounted on the upper devices.
> >
> > hmm, I am afraid I don't follow this. Why would a pkg be dropped before
> > it reaches eth0?
>
> Because we have packet schedulers that sit before the device transmit
> happens, and those packet schedulers enforce limits based upon
> classification results or other criteria, and if those limits are
> exceeded packets are dropped and NET_XMIT_CN is returned back up into
> the transmit path of the networking stack.

OK, but what I don't get is whether pkgs are dropped as soon as the underlying
device cannot handle the pkg directly (returns !NETDEV_TX_OK or stops the queue).
Are !NETDEV_TX_OK and stopping the queue handled differently by upper layers?
I would have expected the pkg to be added to the TX queue and transmitted somewhat later.
If not, what is the TX queue for?

>
> The device never sees that packet get submitted to its ->ndo_start_xmit()
> routine, and this is entirely intentional.  And it is entirely intentional
> that NET_XMIT_CN gets passed up into the caller, where protocols such as
> TCP can key off this information to make congestion control decisions.

In this case it gets passed up to the VLAN driver, should the VLAN driver
do something else to use the TX queue?

      Jocke

Patch

diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index b5249c5..c671b1a 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -327,6 +327,8 @@  static netdev_tx_t vlan_dev_hard_start_xmit(struct sk_buff *skb,
 	len = skb->len;
 	ret = dev_queue_xmit(skb);
 
+	ret = net_xmit_eval(ret);
+
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		txq->tx_packets++;
 		txq->tx_bytes += len;
@@ -353,6 +355,8 @@  static netdev_tx_t vlan_dev_hwaccel_hard_start_xmit(struct sk_buff *skb,
 	len = skb->len;
 	ret = dev_queue_xmit(skb);
 
+	ret = net_xmit_eval(ret);
+
 	if (likely(ret == NET_XMIT_SUCCESS)) {
 		txq->tx_packets++;
 		txq->tx_bytes += len;