[1/4] veth: move loopback logic to common location

Message ID	200911241002.20904.arnd@arndb.de
State	RFC, archived
Delegated to:	David Miller

Arnd Bergmann Nov. 24, 2009, 10:02 a.m. UTC

On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
> > +     skb_dst_drop(skb);
> > +     skb->tstamp.tv64 = 0;
> > +     skb->pkt_type = PACKET_HOST;
> > +     skb->protocol = eth_type_trans(skb, dev);
> > +     skb->mark = 0;
> 
> skb->mark clearing should stay private to veth since it's usually
> supposed to stay intact. The only exception is packets crossing
> namespaces, where they should appear like freshly received skbs.

But isn't that what we want in macvlan as well when we're
forwarding from one downstream interface to another?

I did all my testing with macvlan interfaces in separate namespaces
communicating with each other, so I'd assume that we should always
clear skb->mark and skb->dst in this function. Maybe I should make
the documentation clearer?
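Below is a minimal userspace sketch of the scrubbing step quoted above. It assumes a simplified stand-in for struct sk_buff. The type `model_skb`, its fields and `model_forward_scrub()` are hypothetical names used only for illustration. The real helper operates on struct sk_buff, and the eth_type_trans() call and the hand-off to the receive path are not modelled. Following the proposal in this message, the sketch clears the mark unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct sk_buff fields touched in the hunk. */
enum model_pkt_type { MODEL_PACKET_HOST, MODEL_PACKET_OTHERHOST };

struct model_skb {
	bool                has_dst;  /* models an attached route (skb_dst()) */
	int64_t             tstamp;   /* models skb->tstamp.tv64 */
	enum model_pkt_type pkt_type; /* models skb->pkt_type */
	uint32_t            mark;     /* models skb->mark */
};

/*
 * Models the quoted hunk: drop the route, clear the timestamp, treat the
 * packet as addressed to this host and, as proposed here, always clear
 * the mark so the receiver sees what looks like a freshly received packet.
 */
static void model_forward_scrub(struct model_skb *skb)
{
	assert(skb != NULL);
	skb->has_dst = false;              /* skb_dst_drop(skb) */
	skb->tstamp = 0;                   /* skb->tstamp.tv64 = 0 */
	skb->pkt_type = MODEL_PACKET_HOST; /* skb->pkt_type = PACKET_HOST */
	skb->mark = 0;                     /* skb->mark = 0 */
}
```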

---
net: clarify documentation of dev_forward_skb

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patrick McHardy Nov. 24, 2009, 10:17 a.m. UTC | #1

Arnd Bergmann wrote:
> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>> +     skb_dst_drop(skb);
>>> +     skb->tstamp.tv64 = 0;
>>> +     skb->pkt_type = PACKET_HOST;
>>> +     skb->protocol = eth_type_trans(skb, dev);
>>> +     skb->mark = 0;
>> skb->mark clearing should stay private to veth since it's usually
>> supposed to stay intact. The only exception is packets crossing
>> namespaces, where they should appear like freshly received skbs.
> 
> But isn't that what we want in macvlan as well when we're
> forwarding from one downstream interface to another?

In the TX direction you can use the mark for TC classification
on the underlying device.

> I did all my testing with macvlan interfaces in separate namespaces
> communicating with each other, so I'd assume that we should always
> clear skb->mark and skb->dst in this function.

Good point, in that case we probably should clear it as well. But
in the non-namespace case the TC classification currently works and
this is consistent with any other virtual device driver, so it
should continue to work.

Arnd Bergmann Nov. 24, 2009, 10:34 a.m. UTC | #2

On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
> Arnd Bergmann wrote:
> > On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
> >>> +     skb_dst_drop(skb);
> >>> +     skb->tstamp.tv64 = 0;
> >>> +     skb->pkt_type = PACKET_HOST;
> >>> +     skb->protocol = eth_type_trans(skb, dev);
> >>> +     skb->mark = 0;
> >> skb->mark clearing should stay private to veth since it's usually
> >> supposed to stay intact. The only exception is packets crossing
> >> namespaces, where they should appear like freshly received skbs.
> > 
> > But isn't that what we want in macvlan as well when we're
> > forwarding from one downstream interface to another?
> 
> In the TX direction you can use the mark for TC classification
> on the underlying device.

I don't use dev_forward_skb for the case where the data is sent
to the underlying device, so the TC classification should stay
intact.
 
> > I did all my testing with macvlan interfaces in separate namespaces
> > communicating with each other, so I'd assume that we should always
> > clear skb->mark and skb->dst in this function.
> 
> Good point, in that case we probably should clear it as well. But
> in the non-namespace case the TC classification currently works and
> this is consistent with any other virtual device driver, so it
> should continue to work.

Do you think we should be able to use TC to direct traffic between
macvlans on the same underlying device in bridge mode? It does sound
useful, but I'm not sure how to implement that or if you'd expect
it to work with the current code. If we support that, it should probably
also work with namespaces, by consuming the mark in the macvlan
and veth drivers.

	Arnd <><

Patrick McHardy Nov. 24, 2009, 10:40 a.m. UTC | #3

Arnd Bergmann wrote:
> On Tuesday 24 November 2009 10:17:11 Patrick McHardy wrote:
>> Arnd Bergmann wrote:
>>> On Tuesday 24 November 2009 09:51:19 Patrick McHardy wrote:
>>>>> +     skb_dst_drop(skb);
>>>>> +     skb->tstamp.tv64 = 0;
>>>>> +     skb->pkt_type = PACKET_HOST;
>>>>> +     skb->protocol = eth_type_trans(skb, dev);
>>>>> +     skb->mark = 0;
>>>> skb->mark clearing should stay private to veth since it's usually
>>>> supposed to stay intact. The only exception is packets crossing
>>>> namespaces, where they should appear like freshly received skbs.
>>> But isn't that what we want in macvlan as well when we're
>>> forwarding from one downstream interface to another?
>> In the TX direction you can use the mark for TC classification
>> on the underlying device.
> 
> I don't use dev_forward_skb for the case where the data is sent
> to the underlying device, so the TC classification should stay
> intact.

Right, I see. This looks fine.

>>> I did all my testing with macvlan interfaces in separate namespaces
>>> communicating with each other, so I'd assume that we should always
>>> clear skb->mark and skb->dst in this function.
>> Good point, in that case we probably should clear it as well. But
>> in the non-namespace case the TC classification currently works and
>> this is consistent with any other virtual device driver, so it
>> should continue to work.
> 
> Do you think we should be able to use TC to direct traffic between
> macvlans on the same underlying device in bridge mode? It does sound
> useful, but I'm not sure how to implement that or if you'd expect
> it to work with the current code. If we support that, it should probably
> also work with namespaces, by consuming the mark in the macvlan
> and veth drivers.

I don't think it's necessary, we bypass outgoing queuing anyway.
But if you'd want to add it, just keeping the skb->mark clearing
in veth should work from what I can tell.

Arnd Bergmann Nov. 24, 2009, 1:13 p.m. UTC | #4

On Tuesday 24 November 2009, Patrick McHardy wrote:
> I don't think it's necessary, we bypass outgoing queuing anyway.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

Ok, I won't bother with it for now then.

	Arnd <><

Eric W. Biederman Nov. 24, 2009, 4:42 p.m. UTC | #5

Patrick McHardy <kaber@trash.net> writes:

>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>> communicating with each other, so I'd assume that we should always
>>>> clear skb->mark and skb->dst in this function.
>>> Good point, in that case we probably should clear it as well. But
>>> in the non-namespace case the TC classification currently works and
>>> this is consistent with any other virtual device driver, so it
>>> should continue to work.
>> 
>> Do you think we should be able to use TC to direct traffic between
>> macvlans on the same underlying device in bridge mode? It does sound
>> useful, but I'm not sure how to implement that or if you'd expect
>> it to work with the current code. If we support that, it should probably
>> also work with namespaces, by consuming the mark in the macvlan
>> and veth drivers.
>
> I don't think it's necessary, we bypass outgoing queuing anyway.
> But if you'd want to add it, just keeping the skb->mark clearing
> in veth should work from what I can tell.

veth doesn't have an outgoing queue.  The reason we clear skb->mark
in veth is that a packet reentering the networking stack needs to
be reclassified.  At the point of loopback we are talking about a
packet that has, at least logically, gone out of the machine on a
wire and come back into the machine on another physical interface.

So it seems to me we should have consistent handling in macvlan and
veth for the cases where we are looping packets back around.  In
practice I expect all of those cases to be cross-namespace, as
otherwise we would have intercepted the packet before it went
out a physical interface.

Eric
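The sketch below is one way to picture the reclassification argument. It assumes a hypothetical receive-side classifier that keeps any mark already present and otherwise derives one from a header field. `model_pkt`, `model_rx_classify()` and `model_loopback_reenter()` are invented for illustration and are not kernel interfaces. Clearing the mark on loopback forces the receiving side to apply its own policy instead of inheriting the sender's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet model: only the fields relevant to classification. */
struct model_pkt {
	uint32_t mark;   /* may have been set by the sending namespace */
	uint16_t dport;  /* example header field a classifier can match on */
};

/*
 * Hypothetical receive-side classifier: an existing mark is kept as-is
 * (the "usually stays intact" rule); otherwise the receiver assigns its
 * own mark from the header contents.
 */
static uint32_t model_rx_classify(const struct model_pkt *p)
{
	assert(p != NULL);
	if (p->mark != 0)
		return p->mark;               /* a stale sender mark would win */
	return p->dport == 80 ? 1u : 2u;  /* receiver's own policy */
}

/* Loopback into another namespace: behave like a fresh receive from a wire. */
static void model_loopback_reenter(struct model_pkt *p)
{
	assert(p != NULL);
	p->mark = 0;  /* force reclassification by the receiving stack */
}
```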


Patrick McHardy Nov. 24, 2009, 4:56 p.m. UTC | #6

Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
> 
>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>> communicating with each other, so I'd assume that we should always
>>>>> clear skb->mark and skb->dst in this function.
>>>> Good point, in that case we probably should clear it as well. But
>>>> in the non-namespace case the TC classification currently works and
>>>> this is consistent with any other virtual device driver, so it
>>>> should continue to work.
>>> Do you think we should be able to use TC to direct traffic between
>>> macvlans on the same underlying device in bridge mode? It does sound
>>> useful, but I'm not sure how to implement that or if you'd expect
>>> it to work with the current code. If we support that, it should probably
>>> also work with namespaces, by consuming the mark in the macvlan
>>> and veth drivers.
>> I don't think it's necessary, we bypass outgoing queuing anyway.
>> But if you'd want to add it, just keeping the skb->mark clearing
>> in veth should work from what I can tell.
> 
> veth doesn't have an outgoing queue.  The reason we clear skb->mark
> in veth is that a packet reentering the networking stack needs to
> be reclassified.  At the point of loopback we are talking about a
> packet that has, at least logically, gone out of the machine on a
> wire and come back into the machine on another physical interface.
>
> So it seems to me we should have consistent handling in macvlan and
> veth for the cases where we are looping packets back around.  In
> practice I expect all of those cases to be cross-namespace, as
> otherwise we would have intercepted the packet before it went
> out a physical interface.

Agreed on the looping case, that's what we're doing now.

In the layered case (macvlan -> eth0) it's common behaviour to
keep the mark however. But in case of different namespaces,
I think macvlan should also clear the mark on the dev_queue_xmit()
path since this is just a shortcut to looping the packets
through veth. In fact probably both of them should also clear
skb->priority so other namespaces don't accidentally misclassify
packets.
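The rule proposed above for the macvlan transmit path can be sketched as follows. This models the proposal only, not existing kernel behaviour. Namespaces are reduced to hypothetical integer ids, and `model_skb` and `model_xmit_to_lower()` are illustrative names. When the macvlan and the lower device share a namespace, the sketch keeps mark and priority so TC classification on the lower device still works. Otherwise it clears both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields discussed above. */
struct model_skb {
	uint32_t mark;      /* models skb->mark */
	uint32_t priority;  /* models skb->priority */
};

/*
 * Models the macvlan -> lower device (dev_queue_xmit) path under the
 * proposal: namespaces are represented by hypothetical integer ids.
 */
static void model_xmit_to_lower(struct model_skb *skb, int upper_ns, int lower_ns)
{
	assert(skb != NULL);
	if (upper_ns == lower_ns)
		return;         /* layered case: keep mark/priority for TC */
	skb->mark = 0;      /* don't let another namespace steer classification */
	skb->priority = 0;  /* likewise for priority */
}
```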

Eric W. Biederman Nov. 24, 2009, 6:10 p.m. UTC | #7

Patrick McHardy <kaber@trash.net> writes:

> Eric W. Biederman wrote:
>> Patrick McHardy <kaber@trash.net> writes:
>> 
>>>>>> I did all my testing with macvlan interfaces in separate namespaces
>>>>>> communicating with each other, so I'd assume that we should always
>>>>>> clear skb->mark and skb->dst in this function.
>>>>> Good point, in that case we probably should clear it as well. But
>>>>> in the non-namespace case the TC classification currently works and
>>>>> this is consistent with any other virtual device driver, so it
>>>>> should continue to work.
>>>> Do you think we should be able to use TC to direct traffic between
>>>> macvlans on the same underlying device in bridge mode? It does sound
>>>> useful, but I'm not sure how to implement that or if you'd expect
>>>> it to work with the current code. If we support that, it should probably
>>>> also work with namespaces, by consuming the mark in the macvlan
>>>> and veth drivers.
>>> I don't think it's necessary, we bypass outgoing queuing anyway.
>>> But if you'd want to add it, just keeping the skb->mark clearing
>>> in veth should work from what I can tell.
>> 
>> veth doesn't have an outgoing queue.  The reason we clear skb->mark
>> in veth is that a packet reentering the networking stack needs to
>> be reclassified.  At the point of loopback we are talking about a
>> packet that has, at least logically, gone out of the machine on a
>> wire and come back into the machine on another physical interface.
>>
>> So it seems to me we should have consistent handling in macvlan and
>> veth for the cases where we are looping packets back around.  In
>> practice I expect all of those cases to be cross-namespace, as
>> otherwise we would have intercepted the packet before it went
>> out a physical interface.
>
> Agreed on the looping case, that's what we're doing now.
>
> In the layered case (macvlan -> eth0) it's common behaviour to
> keep the mark however. But in case of different namespaces,
> I think macvlan should also clear the mark on the dev_queue_xmit()
> path since this is just a shortcut to looping the packets
> through veth. In fact probably both of them should also clear
> skb->priority so other namespaces don't accidentally misclassify
> packets.

That is why I pushed for what is becoming dev_forward_skb.  So that
we have one place where we can make all of those tweaks.  It seems
like in every review we find another field that should be cleared/handled
specially.

I don't quite follow what you intend with dev_queue_xmit when the macvlan
is in one namespace and the real physical device is in another.  Are
you saying that the packet classifier runs in the namespace where
the primary device lives with packets from a different namespace?

Eric


Arnd Bergmann Nov. 24, 2009, 6:28 p.m. UTC | #8

On Tuesday 24 November 2009, Eric W. Biederman wrote:
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another.  Are
> you saying that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

I treat internal and external delivery very differently, the three
cases are:

1. skb from real device to macvlan (macvlan_handle_frame): basically
unchanged from before, except avoiding duplicate broadcasts. All
skbs end up in netif_rx(vlan->dev) without clearing any data.
We catch the frame in netif_receive_skb before it interacts with the
namespace of the real device.

2. skb to external device (macvlan_start_xmit): if the destination
is external, we just end up in dev_queue_xmit, with skb->dev set to
the external device but no other changes. The data is already on the
way out at this stage, so the namespace should not matter any more.

3. internal delivery: an skb from one macvlan to another always gets
sent through dev_forward_skb, which is supposed to clear anything
that must not leave the namespace.
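The three paths can be summarised in a small model. The enum values, `model_skb`, `model_pick_path()` and `model_deliver()` are hypothetical names. Only the kernel function names in the comments come from the description above. In this sketch only the internal path scrubs state, and the other two leave the skb fields untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the three macvlan delivery paths listed above. */
enum model_path {
	MODEL_RX_FROM_LOWER,  /* 1. macvlan_handle_frame -> netif_rx(vlan->dev) */
	MODEL_TX_EXTERNAL,    /* 2. macvlan_start_xmit -> dev_queue_xmit */
	MODEL_TX_INTERNAL,    /* 3. macvlan -> macvlan via dev_forward_skb */
};

struct model_skb {
	bool     has_dst;  /* models an attached route */
	uint32_t mark;     /* models skb->mark */
};

static enum model_path model_pick_path(bool from_lower, bool dest_is_local_macvlan)
{
	if (from_lower)
		return MODEL_RX_FROM_LOWER;
	return dest_is_local_macvlan ? MODEL_TX_INTERNAL : MODEL_TX_EXTERNAL;
}

/* Only the internal path clears state that must not leave the namespace. */
static void model_deliver(struct model_skb *skb, enum model_path path)
{
	assert(skb != NULL);
	switch (path) {
	case MODEL_RX_FROM_LOWER:  /* delivered without clearing any data */
	case MODEL_TX_EXTERNAL:    /* already on its way out; mark kept for TC */
		break;
	case MODEL_TX_INTERNAL:    /* dev_forward_skb-style scrubbing */
		skb->has_dst = false;
		skb->mark = 0;
		break;
	}
}
```

Leaving the external path untouched is what keeps TC classification on the underlying device working, as discussed earlier in the thread.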

Does this make sense?

	Arnd <><

Patrick McHardy Nov. 24, 2009, 6:38 p.m. UTC | #9

Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
> 
>> In the layered case (macvlan -> eth0) it's common behaviour to
>> keep the mark however. But in case of different namespaces,
>> I think macvlan should also clear the mark on the dev_queue_xmit()
>> path since this is just a shortcut to looping the packets
>> through veth. In fact probably both of them should also clear
>> skb->priority so other namespaces don't accidentally misclassify
>> packets.
> 
> That is why I pushed for what is becoming dev_forward_skb.  So that
> we have one place where we can make all of those tweaks.  It seems
> like in every review we find another field that should be cleared/handled
> specially.
> 
> I don't quite follow what you intend with dev_queue_xmit when the macvlan
> is in one namespace and the real physical device is in another.  Are
> you saying that the packet classifier runs in the namespace where
> the primary device lives with packets from a different namespace?

Exactly. And I think we should make sure that the namespace of
the macvlan device can't (deliberately or accidentally) cause
misclassification.