mbox series

[0/3] Check gso_size of packets when forwarding

Message ID 20180116020920.20232-1-dja@axtens.net
Headers show
Series Check gso_size of packets when forwarding | expand

Message

Daniel Axtens Jan. 16, 2018, 2:09 a.m. UTC
When regular packets are forwarded, we validate their size against the
MTU of the destination device. However, when GSO packets are
forwarded, we do not validate their size against the MTU. We
implicitly assume that when they are segmented, the resultant packets
will be correctly sized.

This is not always the case.

We observed a case where a packet received on an ibmveth device had a
GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
device, where it caused a firmware assert. This is described in detail
at [0] and was the genesis of this series. Rather than fixing it in
the driver, this series fixes the forwarding path.

To fix this:

 - Move a helper in patch 1.

 - Validate GSO segment lengths in is_skb_forwardable() in the GSO
   case, rather than assuming all will be well. This fixes bridges.
   This is patch 2.

 - Open vSwitch uses its own slightly specialised algorithm for
   checking lengths. Wire up checking for that in patch 3.

[0]: https://patchwork.ozlabs.org/patch/859410/

Cc: Manish.Chopra@cavium.com
Cc: dev@openvswitch.org

Daniel Axtens (3):
  net: move skb_gso_mac_seglen to skbuff.h
  net: is_skb_forwardable: validate length of GSO packet segments
  openvswitch: drop GSO packets that are too large

 include/linux/skbuff.h  | 16 ++++++++++++++++
 net/core/dev.c          |  7 ++++---
 net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
 net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
 net/sched/sch_tbf.c     | 10 ----------
 5 files changed, 84 insertions(+), 20 deletions(-)

Comments

David Miller Jan. 17, 2018, 8:20 p.m. UTC | #1
From: Daniel Axtens <dja@axtens.net>
Date: Tue, 16 Jan 2018 13:09:17 +1100

> When regular packets are forwarded, we validate their size against the
> MTU of the destination device. However, when GSO packets are
> forwarded, we do not validate their size against the MTU. We
> implicitly assume that when they are segmented, the resultant packets
> will be correctly sized.
> 
> This is not always the case.
> 
> We observed a case where a packet received on an ibmveth device had a
> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
> device, where it caused a firmware assert. This is described in detail
> at [0] and was the genesis of this series. Rather than fixing it in
> the driver, this series fixes the forwarding path.
> 
> To fix this:
> 
>  - Move a helper in patch 1.
> 
>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>    case, rather than assuming all will be well. This fixes bridges.
>    This is patch 2.
> 
>  - Open vSwitch uses its own slightly specialised algorithm for
>    checking lengths. Wire up checking for that in patch 3.
> 
> [0]: https://patchwork.ozlabs.org/patch/859410/

This looks good to me, could the OVS folks please review this patch
series?

Thank you.
Pravin Shelar Jan. 18, 2018, 8:28 a.m. UTC | #2
On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
> When regular packets are forwarded, we validate their size against the
> MTU of the destination device. However, when GSO packets are
> forwarded, we do not validate their size against the MTU. We
> implicitly assume that when they are segmented, the resultant packets
> will be correctly sized.
>
> This is not always the case.
>
> We observed a case where a packet received on an ibmveth device had a
> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
> device, where it caused a firmware assert. This is described in detail
> at [0] and was the genesis of this series. Rather than fixing it in
> the driver, this series fixes the forwarding path.
>
Are there any other possible forwarding path in networking stack? TC
is one subsystem that could forward such a packet to the bnx2x device,
how is that handled ?

> To fix this:
>
>  - Move a helper in patch 1.
>
>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>    case, rather than assuming all will be well. This fixes bridges.
>    This is patch 2.
>
>  - Open vSwitch uses its own slightly specialised algorithm for
>    checking lengths. Wire up checking for that in patch 3.
>
> [0]: https://patchwork.ozlabs.org/patch/859410/
>
> Cc: Manish.Chopra@cavium.com
> Cc: dev@openvswitch.org
>
> Daniel Axtens (3):
>   net: move skb_gso_mac_seglen to skbuff.h
>   net: is_skb_forwardable: validate length of GSO packet segments
>   openvswitch: drop GSO packets that are too large
>
>  include/linux/skbuff.h  | 16 ++++++++++++++++
>  net/core/dev.c          |  7 ++++---
>  net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
>  net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
>  net/sched/sch_tbf.c     | 10 ----------
>  5 files changed, 84 insertions(+), 20 deletions(-)
>
> --
> 2.14.1
>
Jason Wang Jan. 18, 2018, 9:49 a.m. UTC | #3
On 2018年01月18日 16:28, Pravin Shelar wrote:
> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>> When regular packets are forwarded, we validate their size against the
>> MTU of the destination device. However, when GSO packets are
>> forwarded, we do not validate their size against the MTU. We
>> implicitly assume that when they are segmented, the resultant packets
>> will be correctly sized.
>>
>> This is not always the case.
>>
>> We observed a case where a packet received on an ibmveth device had a
>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>> device, where it caused a firmware assert. This is described in detail
>> at [0] and was the genesis of this series. Rather than fixing it in
>> the driver, this series fixes the forwarding path.
>>
> Are there any other possible forwarding path in networking stack? TC
> is one subsystem that could forward such a packet to the bnx2x device,
> how is that handled ?

Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
before passing it to hardware. And bnx2x needs to set its gso_max_size 
correctly.

Btw, looks like this could be triggered from a guest which is a DOS. We 
need request a CVE for this.

Thanks

>
>> To fix this:
>>
>>   - Move a helper in patch 1.
>>
>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>     case, rather than assuming all will be well. This fixes bridges.
>>     This is patch 2.
>>
>>   - Open vSwitch uses its own slightly specialised algorithm for
>>     checking lengths. Wire up checking for that in patch 3.
>>
>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>
>> Cc: Manish.Chopra@cavium.com
>> Cc: dev@openvswitch.org
>>
>> Daniel Axtens (3):
>>    net: move skb_gso_mac_seglen to skbuff.h
>>    net: is_skb_forwardable: validate length of GSO packet segments
>>    openvswitch: drop GSO packets that are too large
>>
>>   include/linux/skbuff.h  | 16 ++++++++++++++++
>>   net/core/dev.c          |  7 ++++---
>>   net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
>>   net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
>>   net/sched/sch_tbf.c     | 10 ----------
>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>
>> --
>> 2.14.1
>>
Daniel Axtens Jan. 18, 2018, 1:08 p.m. UTC | #4
Pravin Shelar <pshelar@ovn.org> writes:

> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>> When regular packets are forwarded, we validate their size against the
>> MTU of the destination device. However, when GSO packets are
>> forwarded, we do not validate their size against the MTU. We
>> implicitly assume that when they are segmented, the resultant packets
>> will be correctly sized.
>>
>> This is not always the case.
>>
>> We observed a case where a packet received on an ibmveth device had a
>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>> device, where it caused a firmware assert. This is described in detail
>> at [0] and was the genesis of this series. Rather than fixing it in
>> the driver, this series fixes the forwarding path.
>>
> Are there any other possible forwarding path in networking stack? TC
> is one subsystem that could forward such a packet to the bnx2x device,
> how is that handled ?

So far I have only looked at bridges, openvswitch and macvlan. In
general, if the code uses dev_forward_skb() it should automatically be
fine as that invokes is_skb_forwardable(), which we patch.

There is potentially a forwarding path in macvlan that doesn't pass
through dev_forward_skb() (which calls is_skb_forwardable). I did post a
patch to this earlier, but it got somewhat derailed by how you would get
such a packet in to a macvlan device and whether that should be fixed
instead. Once we have some consensus on Jason's questions and the
overall approach is solid, I'm happy to work on that again.

To answer your specific question, I'm not aware of how tc can cause a
packet to be forwarded - if you can point me to the right general area I
can have a look.

Regards,
Daniel

>
>> To fix this:
>>
>>  - Move a helper in patch 1.
>>
>>  - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>    case, rather than assuming all will be well. This fixes bridges.
>>    This is patch 2.
>>
>>  - Open vSwitch uses its own slightly specialised algorithm for
>>    checking lengths. Wire up checking for that in patch 3.
>>
>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>
>> Cc: Manish.Chopra@cavium.com
>> Cc: dev@openvswitch.org
>>
>> Daniel Axtens (3):
>>   net: move skb_gso_mac_seglen to skbuff.h
>>   net: is_skb_forwardable: validate length of GSO packet segments
>>   openvswitch: drop GSO packets that are too large
>>
>>  include/linux/skbuff.h  | 16 ++++++++++++++++
>>  net/core/dev.c          |  7 ++++---
>>  net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
>>  net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
>>  net/sched/sch_tbf.c     | 10 ----------
>>  5 files changed, 84 insertions(+), 20 deletions(-)
>>
>> --
>> 2.14.1
>>
Daniel Axtens Jan. 18, 2018, 1:17 p.m. UTC | #5
Jason Wang <jasowang@redhat.com> writes:

> On 2018年01月18日 16:28, Pravin Shelar wrote:
>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>> When regular packets are forwarded, we validate their size against the
>>> MTU of the destination device. However, when GSO packets are
>>> forwarded, we do not validate their size against the MTU. We
>>> implicitly assume that when they are segmented, the resultant packets
>>> will be correctly sized.
>>>
>>> This is not always the case.
>>>
>>> We observed a case where a packet received on an ibmveth device had a
>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>> device, where it caused a firmware assert. This is described in detail
>>> at [0] and was the genesis of this series. Rather than fixing it in
>>> the driver, this series fixes the forwarding path.
>>>
>> Are there any other possible forwarding path in networking stack? TC
>> is one subsystem that could forward such a packet to the bnx2x device,
>> how is that handled ?
>
> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
> before passing it to hardware. And bnx2x needs to set its gso_max_size 
> correctly.

I don't think gso_max_size is quite the same. If I understand
net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
length of the skb, not the length of each segmented packet. The problem
for the bnx2x card is the length of the segment, not the overall length.

>
> Btw, looks like this could be triggered from a guest which is a DOS. We 
> need request a CVE for this.
>

You are correct about how this can be triggered: in fact it came to my
attention because a network packet from one LPAR (PowerVM virtual
machine) brought down the card attached to a different LPAR. It didn't
occur to me that it was potentially a security issue. I am talking with
the security team at Canonical regarding this.

Regards,
Daniel

> Thanks
>
>>
>>> To fix this:
>>>
>>>   - Move a helper in patch 1.
>>>
>>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>>     case, rather than assuming all will be well. This fixes bridges.
>>>     This is patch 2.
>>>
>>>   - Open vSwitch uses its own slightly specialised algorithm for
>>>     checking lengths. Wire up checking for that in patch 3.
>>>
>>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>>
>>> Cc: Manish.Chopra@cavium.com
>>> Cc: dev@openvswitch.org
>>>
>>> Daniel Axtens (3):
>>>    net: move skb_gso_mac_seglen to skbuff.h
>>>    net: is_skb_forwardable: validate length of GSO packet segments
>>>    openvswitch: drop GSO packets that are too large
>>>
>>>   include/linux/skbuff.h  | 16 ++++++++++++++++
>>>   net/core/dev.c          |  7 ++++---
>>>   net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
>>>   net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
>>>   net/sched/sch_tbf.c     | 10 ----------
>>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>>
>>> --
>>> 2.14.1
>>>
Daniel Axtens Jan. 18, 2018, 2:05 p.m. UTC | #6
Daniel Axtens <dja@axtens.net> writes:

> Jason Wang <jasowang@redhat.com> writes:
>
>> On 2018年01月18日 16:28, Pravin Shelar wrote:
>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>> When regular packets are forwarded, we validate their size against the
>>>> MTU of the destination device. However, when GSO packets are
>>>> forwarded, we do not validate their size against the MTU. We
>>>> implicitly assume that when they are segmented, the resultant packets
>>>> will be correctly sized.
>>>>
>>>> This is not always the case.
>>>>
>>>> We observed a case where a packet received on an ibmveth device had a
>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>> device, where it caused a firmware assert. This is described in detail
>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>> the driver, this series fixes the forwarding path.
>>>>
>>> Are there any other possible forwarding path in networking stack? TC
>>> is one subsystem that could forward such a packet to the bnx2x device,
>>> how is that handled ?
>>
>> Yes, so it looks to me we should do the check in e.g netif_needs_gso() 
>> before passing it to hardware. And bnx2x needs to set its gso_max_size 
>> correctly.
>
> I don't think gso_max_size is quite the same. If I understand
> net/ipv4/tcp.c correctly, gso_max_size is supposed to cap the total
> length of the skb, not the length of each segmented packet. The problem
> for the bnx2x card is the length of the segment, not the overall length.
>
>>
>> Btw, looks like this could be triggered from a guest which is a DOS. We 
>> need request a CVE for this.
>>
>
> You are correct about how this can be triggered: in fact it came to my
> attention because a network packet from one LPAR (PowerVM virtual
> machine) brought down the card attached to a different LPAR. It didn't
> occur to me that it was potentially a security issue. I am talking with
> the security team at Canonical regarding this.

I have requested a CVE from the Distributed Weakness Filing.

Regards,
Daniel

>
> Regards,
> Daniel
>
>> Thanks
>>
>>>
>>>> To fix this:
>>>>
>>>>   - Move a helper in patch 1.
>>>>
>>>>   - Validate GSO segment lengths in is_skb_forwardable() in the GSO
>>>>     case, rather than assuming all will be well. This fixes bridges.
>>>>     This is patch 2.
>>>>
>>>>   - Open vSwitch uses its own slightly specialised algorithm for
>>>>     checking lengths. Wire up checking for that in patch 3.
>>>>
>>>> [0]: https://patchwork.ozlabs.org/patch/859410/
>>>>
>>>> Cc: Manish.Chopra@cavium.com
>>>> Cc: dev@openvswitch.org
>>>>
>>>> Daniel Axtens (3):
>>>>    net: move skb_gso_mac_seglen to skbuff.h
>>>>    net: is_skb_forwardable: validate length of GSO packet segments
>>>>    openvswitch: drop GSO packets that are too large
>>>>
>>>>   include/linux/skbuff.h  | 16 ++++++++++++++++
>>>>   net/core/dev.c          |  7 ++++---
>>>>   net/core/skbuff.c       | 34 ++++++++++++++++++++++++++++++++++
>>>>   net/openvswitch/vport.c | 37 ++++++++++++++++++++++++++++++-------
>>>>   net/sched/sch_tbf.c     | 10 ----------
>>>>   5 files changed, 84 insertions(+), 20 deletions(-)
>>>>
>>>> --
>>>> 2.14.1
>>>>
Pravin Shelar Jan. 18, 2018, 9:57 p.m. UTC | #7
On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
> Pravin Shelar <pshelar@ovn.org> writes:
>
>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>> When regular packets are forwarded, we validate their size against the
>>> MTU of the destination device. However, when GSO packets are
>>> forwarded, we do not validate their size against the MTU. We
>>> implicitly assume that when they are segmented, the resultant packets
>>> will be correctly sized.
>>>
>>> This is not always the case.
>>>
>>> We observed a case where a packet received on an ibmveth device had a
>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>> device, where it caused a firmware assert. This is described in detail
>>> at [0] and was the genesis of this series. Rather than fixing it in
>>> the driver, this series fixes the forwarding path.
>>>
>> Are there any other possible forwarding path in networking stack? TC
>> is one subsystem that could forward such a packet to the bnx2x device,
>> how is that handled ?
>
> So far I have only looked at bridges, openvswitch and macvlan. In
> general, if the code uses dev_forward_skb() it should automatically be
> fine as that invokes is_skb_forwardable(), which we patch.
>
But there are other ways to forward packets, e.g tc-mirred or bpf
redirect. We need to handle all these cases rather than fixing one at
a time. As Jason suggested netif_needs_gso() looks like good function
to validate if a device is capable of handling given GSO packet.
Daniel Axtens Jan. 19, 2018, 1:28 a.m. UTC | #8
Pravin Shelar <pshelar@ovn.org> writes:

> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
>> Pravin Shelar <pshelar@ovn.org> writes:
>>
>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>> When regular packets are forwarded, we validate their size against the
>>>> MTU of the destination device. However, when GSO packets are
>>>> forwarded, we do not validate their size against the MTU. We
>>>> implicitly assume that when they are segmented, the resultant packets
>>>> will be correctly sized.
>>>>
>>>> This is not always the case.
>>>>
>>>> We observed a case where a packet received on an ibmveth device had a
>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>> device, where it caused a firmware assert. This is described in detail
>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>> the driver, this series fixes the forwarding path.
>>>>
>>> Are there any other possible forwarding path in networking stack? TC
>>> is one subsystem that could forward such a packet to the bnx2x device,
>>> how is that handled ?
>>
>> So far I have only looked at bridges, openvswitch and macvlan. In
>> general, if the code uses dev_forward_skb() it should automatically be
>> fine as that invokes is_skb_forwardable(), which we patch.
>>
> But there are other ways to forward packets, e.g tc-mirred or bpf
> redirect. We need to handle all these cases rather than fixing one at
> a time. As Jason suggested netif_needs_gso() looks like good function
> to validate if a device is capable of handling given GSO packet.

I am not entirely sure this is a better solution.

The biggest reason I am uncomfortable with this is that if
netif_needs_gso() returns true, the skb will be segmented. The segment
sizes will be based on gso_size. Since gso_size is greater than the MTU,
the resulting segments will themselves be over-MTU. Those over-MTU
segments will then be passed to the network card. I think we should not
be creating over-MTU segments; we should instead be dropping the packet
and logging.

I do take the point that you and Jason are making: a more generic
fix would be good. I'm just not sure where to put it.


Regards,
Daniel
Daniel Axtens Jan. 19, 2018, 6:11 a.m. UTC | #9
Daniel Axtens <dja@axtens.net> writes:

> Pravin Shelar <pshelar@ovn.org> writes:
>
>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
>>> Pravin Shelar <pshelar@ovn.org> writes:
>>>
>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>>> When regular packets are forwarded, we validate their size against the
>>>>> MTU of the destination device. However, when GSO packets are
>>>>> forwarded, we do not validate their size against the MTU. We
>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>> will be correctly sized.
>>>>>
>>>>> This is not always the case.
>>>>>
>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>> device, where it caused a firmware assert. This is described in detail
>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>> the driver, this series fixes the forwarding path.
>>>>>
>>>> Are there any other possible forwarding path in networking stack? TC
>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>> how is that handled ?
>>>
>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>> general, if the code uses dev_forward_skb() it should automatically be
>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>
>> But there are other ways to forward packets, e.g tc-mirred or bpf
>> redirect. We need to handle all these cases rather than fixing one at
>> a time. As Jason suggested netif_needs_gso() looks like good function
>> to validate if a device is capable of handling given GSO packet.
>
> I am not entirely sure this is a better solution.
>
> The biggest reason I am uncomfortable with this is that if
> netif_needs_gso() returns true, the skb will be segmented. The segment
> sizes will be based on gso_size. Since gso_size is greater than the MTU,
> the resulting segments will themselves be over-MTU. Those over-MTU
> segments will then be passed to the network card. I think we should not
> be creating over-MTU segments; we should instead be dropping the packet
> and logging.
>
> I do take the point that you and Jason are making: a more generic
> fix would  be good. I'm just not sure where to put it.

How about this? It puts the check in validate_xmit_skb(). (completely untested)

Ultimately I think we need to make a broader policy decision about who
is responsible for ensuring that packets are correctly sized - is it the
caller of dev_queue_xmit (as in bridges and openswitch) or is it lower
down the stack, such as validate_xmit_skb()? Making the caller check is
more efficient - packets created on the system are always going to fit
due to the checks in the protocol code, so rechecking is inefficient and
unnecessary. But checking later is more reliable as we don't have to
identify all forwarding paths.

Regards,
Daniel

diff --git a/net/core/dev.c b/net/core/dev.c
index 5af04be128c1..b8c3f33ece19 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1830,13 +1830,11 @@ static inline void net_timestamp_set(struct sk_buff *skb)
                        __net_timestamp(SKB);           \
        }                                               \
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
+static inline bool does_skb_fit_mtu(const struct net_device *dev,
+                                   const struct sk_buff *skb)
 {
        unsigned int len;
 
-       if (!(dev->flags & IFF_UP))
-               return false;
-
        len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
        if (skb->len <= len)
                return true;
@@ -1850,6 +1848,14 @@ bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
 
        return false;
 }
+
+bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
+{
+       if (!(dev->flags & IFF_UP))
+               return false;
+
+       return does_skb_fit_mtu(dev, skb);
+}
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
@@ -3081,6 +3087,9 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
        if (unlikely(!skb))
                goto out_null;
 
+       if (unlikely(!does_skb_fit_mtu(dev, skb)))
+               goto out_kfree_skb;
+
        if (netif_needs_gso(skb, features)) {
                struct sk_buff *segs;
 



>
>
> Regards,
> Daniel
Pravin Shelar Jan. 19, 2018, 7:08 a.m. UTC | #10
On Thu, Jan 18, 2018 at 5:28 PM, Daniel Axtens <dja@axtens.net> wrote:
> Pravin Shelar <pshelar@ovn.org> writes:
>
>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
>>> Pravin Shelar <pshelar@ovn.org> writes:
>>>
>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>>> When regular packets are forwarded, we validate their size against the
>>>>> MTU of the destination device. However, when GSO packets are
>>>>> forwarded, we do not validate their size against the MTU. We
>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>> will be correctly sized.
>>>>>
>>>>> This is not always the case.
>>>>>
>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>> device, where it caused a firmware assert. This is described in detail
>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>> the driver, this series fixes the forwarding path.
>>>>>
>>>> Are there any other possible forwarding path in networking stack? TC
>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>> how is that handled ?
>>>
>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>> general, if the code uses dev_forward_skb() it should automatically be
>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>
>> But there are other ways to forward packets, e.g tc-mirred or bpf
>> redirect. We need to handle all these cases rather than fixing one at
>> a time. As Jason suggested netif_needs_gso() looks like good function
>> to validate if a device is capable of handling given GSO packet.
>
> I am not entirely sure this is a better solution.
>
> The biggest reason I am uncomfortable with this is that if
> netif_needs_gso() returns true, the skb will be segmented. The segment
> sizes will be based on gso_size. Since gso_size is greater than the MTU,
> the resulting segments will themselves be over-MTU. Those over-MTU
> segments will then be passed to the network card. I think we should not
> be creating over-MTU segments; we should instead be dropping the packet
> and logging.
>

Would this oversized segment cause the firmware assert?
If this solves the assert issue then I do not see much value in adding
checks in fast-path just for logging.
Daniel Axtens Jan. 19, 2018, 11:58 a.m. UTC | #11
Pravin Shelar <pshelar@ovn.org> writes:

> On Thu, Jan 18, 2018 at 5:28 PM, Daniel Axtens <dja@axtens.net> wrote:
>> Pravin Shelar <pshelar@ovn.org> writes:
>>
>>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
>>>> Pravin Shelar <pshelar@ovn.org> writes:
>>>>
>>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>>>> When regular packets are forwarded, we validate their size against the
>>>>>> MTU of the destination device. However, when GSO packets are
>>>>>> forwarded, we do not validate their size against the MTU. We
>>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>>> will be correctly sized.
>>>>>>
>>>>>> This is not always the case.
>>>>>>
>>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>>> device, where it caused a firmware assert. This is described in detail
>>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>>> the driver, this series fixes the forwarding path.
>>>>>>
>>>>> Are there any other possible forwarding path in networking stack? TC
>>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>>> how is that handled ?
>>>>
>>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>>> general, if the code uses dev_forward_skb() it should automatically be
>>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>>
>>> But there are other ways to forward packets, e.g tc-mirred or bpf
>>> redirect. We need to handle all these cases rather than fixing one at
>>> a time. As Jason suggested netif_needs_gso() looks like good function
>>> to validate if a device is capable of handling given GSO packet.
>>
>> I am not entirely sure this is a better solution.
>>
>> The biggest reason I am uncomfortable with this is that if
>> netif_needs_gso() returns true, the skb will be segmented. The segment
>> sizes will be based on gso_size. Since gso_size is greater than the MTU,
>> the resulting segments will themselves be over-MTU. Those over-MTU
>> segments will then be passed to the network card. I think we should not
>> be creating over-MTU segments; we should instead be dropping the packet
>> and logging.
>>
>
> Would this oversized segment cause the firmware assert?
> If this solves the assert issue then I do not see much value in adding
> checks in fast-path just for logging.

No - I tested this (or rather, as I don't have direct access to a bnx2x
card, this was tested on my behalf): as long as the packet is not a GSO
packet, it doesn't cause the crash. So we *could* segment them, I just
think that knowingly creating invalid segments is not particularly
pleasant.

Do you think my approach in a later email which checks and drops without
logging is sufficient? It is the same GSO check, and also checks non-GSO
packets against the MTU.

Regards,
Daniel
Pravin Shelar Jan. 19, 2018, 9:54 p.m. UTC | #12
On Fri, Jan 19, 2018 at 3:58 AM, Daniel Axtens <dja@axtens.net> wrote:
> Pravin Shelar <pshelar@ovn.org> writes:
>
>> On Thu, Jan 18, 2018 at 5:28 PM, Daniel Axtens <dja@axtens.net> wrote:
>>> Pravin Shelar <pshelar@ovn.org> writes:
>>>
>>>> On Thu, Jan 18, 2018 at 5:08 AM, Daniel Axtens <dja@axtens.net> wrote:
>>>>> Pravin Shelar <pshelar@ovn.org> writes:
>>>>>
>>>>>> On Mon, Jan 15, 2018 at 6:09 PM, Daniel Axtens <dja@axtens.net> wrote:
>>>>>>> When regular packets are forwarded, we validate their size against the
>>>>>>> MTU of the destination device. However, when GSO packets are
>>>>>>> forwarded, we do not validate their size against the MTU. We
>>>>>>> implicitly assume that when they are segmented, the resultant packets
>>>>>>> will be correctly sized.
>>>>>>>
>>>>>>> This is not always the case.
>>>>>>>
>>>>>>> We observed a case where a packet received on an ibmveth device had a
>>>>>>> GSO size of around 10kB. This was forwarded by Open vSwitch to a bnx2x
>>>>>>> device, where it caused a firmware assert. This is described in detail
>>>>>>> at [0] and was the genesis of this series. Rather than fixing it in
>>>>>>> the driver, this series fixes the forwarding path.
>>>>>>>
>>>>>> Are there any other possible forwarding path in networking stack? TC
>>>>>> is one subsystem that could forward such a packet to the bnx2x device,
>>>>>> how is that handled ?
>>>>>
>>>>> So far I have only looked at bridges, openvswitch and macvlan. In
>>>>> general, if the code uses dev_forward_skb() it should automatically be
>>>>> fine as that invokes is_skb_forwardable(), which we patch.
>>>>>
>>>> But there are other ways to forward packets, e.g tc-mirred or bpf
>>>> redirect. We need to handle all these cases rather than fixing one at
>>>> a time. As Jason suggested netif_needs_gso() looks like good function
>>>> to validate if a device is capable of handling given GSO packet.
>>>
>>> I am not entirely sure this is a better solution.
>>>
>>> The biggest reason I am uncomfortable with this is that if
>>> netif_needs_gso() returns true, the skb will be segmented. The segment
>>> sizes will be based on gso_size. Since gso_size is greater than the MTU,
>>> the resulting segments will themselves be over-MTU. Those over-MTU
>>> segments will then be passed to the network card. I think we should not
>>> be creating over-MTU segments; we should instead be dropping the packet
>>> and logging.
>>>
>>
>> Would this oversized segment cause the firmware assert?
>> If this solves the assert issue then I do not see much value in adding
>> checks in fast-path just for logging.
>
> No - I tested this (or rather, as I don't have direct access to a bnx2x
> card, this was tested on my behalf): as long as the packet is not a GSO
> packet, it doesn't cause the crash. So we *could* segment them, I just
> think that knowingly creating invalid segments is not particularly
> pleasant.
>
I agree it is not perfect. But the other proposed patch does not fix
the connectivity issue. It only adds log msg in such cases at cost of
extra checks/code. Therefore I prefer the easier fix for the issue
which also fixes for all cases of packet forwarding rather than just
OVS and Bridge.
David Miller Jan. 22, 2018, 8:14 p.m. UTC | #13
From: Pravin Shelar <pshelar@ovn.org>
Date: Fri, 19 Jan 2018 13:54:15 -0800

> I agree it is not perfect. But the other proposed patch does not fix
> the connectivity issue. It only adds log msg in such cases at cost
> of extra checks/code. Therefore I prefer the easier fix for the
> issue which also fixes for all cases of packet forwarding rather
> than just OVS and Bridge.

I really think that something needs to guarantee that device drivers
will never be given either over-MTU or over-max-GSO-seg-size SKBs.

Otherwise drivers need to add completely stupid checks like making
sure that SKB lengths do not exceed the maxmimu value that can be
encoded into descriptors.

What's probably happening often now in such situations is that the
driver ends up masking the length blindly and ends up sending out a
truncated packet.

Which frankly is quite bad too.

It doesn't scale to add these checks into every driver, or trying to
"figure out" which drivers will behave adversely and only add checks
to those.

The kernel shouldn't pass objects with out-of-range attributes
to the driver, period.
Stephen Hemminger Jan. 22, 2018, 9:31 p.m. UTC | #14
On Mon, 22 Jan 2018 15:14:53 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: Pravin Shelar <pshelar@ovn.org>
> Date: Fri, 19 Jan 2018 13:54:15 -0800
> 
> > I agree it is not perfect. But the other proposed patch does not fix
> > the connectivity issue. It only adds log msg in such cases at cost
> > of extra checks/code. Therefore I prefer the easier fix for the
> > issue which also fixes for all cases of packet forwarding rather
> > than just OVS and Bridge.  
> 
> I really think that something needs to guarantee that device drivers
> will never be given either over-MTU or over-max-GSO-seg-size SKBs.
> 
> Otherwise drivers need to add completely stupid checks like making
> sure that SKB lengths do not exceed the maxmimu value that can be
> encoded into descriptors.
> 
> What's probably happening often now in such situations is that the
> driver ends up masking the length blindly and ends up sending out a
> truncated packet.
> 
> Which frankly is quite bad too.
> 
> It doesn't scale to add these checks into every driver, or trying to
> "figure out" which drivers will behave adversely and only add checks
> to those.
> 
> The kernel shouldn't pass objects with out-of-range attributes
> to the driver, period.

Agreed. We should make it easier to write non-buggy drivers.
Pravin Shelar Jan. 23, 2018, 5:47 a.m. UTC | #15
On Mon, Jan 22, 2018 at 12:14 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin Shelar <pshelar@ovn.org>
> Date: Fri, 19 Jan 2018 13:54:15 -0800
>
>> I agree it is not perfect. But the other proposed patch does not fix
>> the connectivity issue. It only adds log msg in such cases at cost
>> of extra checks/code. Therefore I prefer the easier fix for the
>> issue which also fixes for all cases of packet forwarding rather
>> than just OVS and Bridge.
>
> I really think that something needs to guarantee that device drivers
> will never be given either over-MTU or over-max-GSO-seg-size SKBs.
>
> Otherwise drivers need to add completely stupid checks like making
> sure that SKB lengths do not exceed the maxmimu value that can be
> encoded into descriptors.
>
> What's probably happening often now in such situations is that the
> driver ends up masking the length blindly and ends up sending out a
> truncated packet.
>
> Which frankly is quite bad too.
>
> It doesn't scale to add these checks into every driver, or trying to
> "figure out" which drivers will behave adversely and only add checks
> to those.
>
> The kernel shouldn't pass objects with out-of-range attributes
> to the driver, period.

OK, in that case we could add a check in validate_xmit_skb() as done
by Daniel in a patch posted earlier.