[ovs-dev] dpif-netlink-rtnl: Work around MTU bug in kernel GRE driver.

Message ID 20180112204459.2311-1-blp@ovn.org
State Superseded
Headers show
Series
  • [ovs-dev] dpif-netlink-rtnl: Work around MTU bug in kernel GRE driver.
Related show

Commit Message

Ben Pfaff Jan. 12, 2018, 8:44 p.m.
The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
overrides the MTU to 1472 bytes.  This commit works around the problem by
following up a request to create a GRE device with a second request to set
the MTU.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
Reported-by: Eric Garver <e@erig.me>
Reported-by: James Page <james.page@ubuntu.com>
Signed-off-by: Ben Pfaff <blp@ovn.org>
---
This is not properly tested.  It needs to be tested before it makes sense
to commit it.

 lib/dpif-netlink-rtnl.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Eric Garver Jan. 12, 2018, 9:35 p.m. | #1
On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> overrides the MTU to 1472 bytes.  This commit works around the problem by
> following up a request to create a GRE device with a second request to set
> the MTU.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> Reported-by: Eric Garver <e@erig.me>
> Reported-by: James Page <james.page@ubuntu.com>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
> This is not properly tested.  It needs to be tested before it makes sense
> to commit it.
> 
>  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> index fe9c8ed7104f..0bf965d29f41 100644
> --- a/lib/dpif-netlink-rtnl.c
> +++ b/lib/dpif-netlink-rtnl.c
> @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
>      nl_msg_end_nested(&request, linkinfo_off);
>  
>      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> +         * RTM_NEWLINK, by setting the MTU again.  See
> +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> +        ofpbuf_clear(&request);
> +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> +                            NLM_F_REQUEST | NLM_F_ACK);
> +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> +
> +        err = nl_transact(NETLINK_ROUTE, &request, NULL);
> +    }
>  
>  exit:
>      ofpbuf_uninit(&request);
> -- 
> 2.10.2
> 

This looks sane to me. It would be nice if James could test it though.

Acked-by: Eric Garver <e@erig.me>
Eric Garver Jan. 12, 2018, 9:51 p.m. | #2
On Fri, Jan 12, 2018 at 04:35:06PM -0500, Eric Garver wrote:
> On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > following up a request to create a GRE device with a second request to set
> > the MTU.
> > 
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > Reported-by: Eric Garver <e@erig.me>
> > Reported-by: James Page <james.page@ubuntu.com>
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> > This is not properly tested.  It needs to be tested before it makes sense
> > to commit it.
> > 
> >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > index fe9c8ed7104f..0bf965d29f41 100644
> > --- a/lib/dpif-netlink-rtnl.c
> > +++ b/lib/dpif-netlink-rtnl.c
> > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> >      nl_msg_end_nested(&request, linkinfo_off);
> >  
> >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > +         * RTM_NEWLINK, by setting the MTU again.  See
> > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > +        ofpbuf_clear(&request);
> > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > +                            NLM_F_REQUEST | NLM_F_ACK);
> > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > +
> > +        err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > +    }
> >  
> >  exit:
> >      ofpbuf_uninit(&request);
> > -- 
> > 2.10.2
> > 
> 
> This looks sane to me. It would be nice if James could test it though.

On second thought.. if SETLINK fails we'll report "err" that the tunnel
didn't get created, but it actually did. We can either ignore the err
from SETLINK or destroy the interface. Depends on if you prefer low MTU
tunnel or no tunnel. :)
Ben Pfaff Jan. 12, 2018, 10:04 p.m. | #3
On Fri, Jan 12, 2018 at 04:51:00PM -0500, Eric Garver wrote:
> On Fri, Jan 12, 2018 at 04:35:06PM -0500, Eric Garver wrote:
> > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > > following up a request to create a GRE device with a second request to set
> > > the MTU.
> > > 
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > Reported-by: Eric Garver <e@erig.me>
> > > Reported-by: James Page <james.page@ubuntu.com>
> > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > ---
> > > This is not properly tested.  It needs to be tested before it makes sense
> > > to commit it.
> > > 
> > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > index fe9c8ed7104f..0bf965d29f41 100644
> > > --- a/lib/dpif-netlink-rtnl.c
> > > +++ b/lib/dpif-netlink-rtnl.c
> > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> > >      nl_msg_end_nested(&request, linkinfo_off);
> > >  
> > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > > +        ofpbuf_clear(&request);
> > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > > +
> > > +        err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > +    }
> > >  
> > >  exit:
> > >      ofpbuf_uninit(&request);
> > > -- 
> > > 2.10.2
> > > 
> > 
> > This looks sane to me. It would be nice if James could test it though.
> 
> On second thought.. if SETLINK fails we'll report "err" that the tunnel
> didn't get created, but it actually did. We can either ignore the err
> from SETLINK or destroy the interface. Depends on if you prefer low MTU
> tunnel or no tunnel. :)

I guess it would be slightly better to report no error (and log the real
error), because there's no guarantee that DELLINK would succeed, and
what do we do if it doesn't?
Eric Garver Jan. 12, 2018, 11:02 p.m. | #4
On Fri, Jan 12, 2018 at 02:04:11PM -0800, Ben Pfaff wrote:
> On Fri, Jan 12, 2018 at 04:51:00PM -0500, Eric Garver wrote:
> > On Fri, Jan 12, 2018 at 04:35:06PM -0500, Eric Garver wrote:
> > > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > > > following up a request to create a GRE device with a second request to set
> > > > the MTU.
> > > > 
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > > Reported-by: Eric Garver <e@erig.me>
> > > > Reported-by: James Page <james.page@ubuntu.com>
> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > ---
> > > > This is not properly tested.  It needs to be tested before it makes sense
> > > > to commit it.
> > > > 
> > > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > > index fe9c8ed7104f..0bf965d29f41 100644
> > > > --- a/lib/dpif-netlink-rtnl.c
> > > > +++ b/lib/dpif-netlink-rtnl.c
> > > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> > > >      nl_msg_end_nested(&request, linkinfo_off);
> > > >  
> > > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > > > +        ofpbuf_clear(&request);
> > > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > > > +
> > > > +        err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > > +    }
> > > >  
> > > >  exit:
> > > >      ofpbuf_uninit(&request);
> > > > -- 
> > > > 2.10.2
> > > > 
> > > 
> > > This looks sane to me. It would be nice if James could test it though.
> > 
> > On second thought.. if SETLINK fails we'll report "err" that the tunnel
> > didn't get created, but it actually did. We can either ignore the err
> > from SETLINK or destroy the interface. Depends on if you prefer low MTU
> > tunnel or no tunnel. :)
> 
> I guess it would be slightly better to report no error (and log the real
> error), because there's no guarantee that DELLINK would succeed, and
> what do we do if it doesn't?

Right. I'm good with ignoring the SETLINK error.
Eric Garver Jan. 16, 2018, 3:42 p.m. | #5
On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> overrides the MTU to 1472 bytes.  This commit works around the problem by
> following up a request to create a GRE device with a second request to set
> the MTU.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> Reported-by: Eric Garver <e@erig.me>
> Reported-by: James Page <james.page@ubuntu.com>
> Signed-off-by: Ben Pfaff <blp@ovn.org>
> ---
> This is not properly tested.  It needs to be tested before it makes sense
> to commit it.
> 
>  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> index fe9c8ed7104f..0bf965d29f41 100644
> --- a/lib/dpif-netlink-rtnl.c
> +++ b/lib/dpif-netlink-rtnl.c
> @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
>      nl_msg_end_nested(&request, linkinfo_off);
>  
>      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> +         * RTM_NEWLINK, by setting the MTU again.  See
> +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> +        ofpbuf_clear(&request);
> +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> +                            NLM_F_REQUEST | NLM_F_ACK);
> +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);

James' testing found that this value will be rejected by the kernel. GRE
driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
will return an error instead of clamping the value. GENEVE/VXLAN always
clamp the value.

I think 65478 is the max_mtu for gretap accounting for optional csum,
key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
configures gretap.

  0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478

> +
> +        err = nl_transact(NETLINK_ROUTE, &request, NULL);
> +    }
>  
>  exit:
>      ofpbuf_uninit(&request);
> -- 
> 2.10.2
Ben Pfaff Jan. 16, 2018, 5:16 p.m. | #6
On Tue, Jan 16, 2018 at 10:42:16AM -0500, Eric Garver wrote:
> On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > following up a request to create a GRE device with a second request to set
> > the MTU.
> > 
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > Reported-by: Eric Garver <e@erig.me>
> > Reported-by: James Page <james.page@ubuntu.com>
> > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > ---
> > This is not properly tested.  It needs to be tested before it makes sense
> > to commit it.
> > 
> >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > index fe9c8ed7104f..0bf965d29f41 100644
> > --- a/lib/dpif-netlink-rtnl.c
> > +++ b/lib/dpif-netlink-rtnl.c
> > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> >      nl_msg_end_nested(&request, linkinfo_off);
> >  
> >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > +         * RTM_NEWLINK, by setting the MTU again.  See
> > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > +        ofpbuf_clear(&request);
> > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > +                            NLM_F_REQUEST | NLM_F_ACK);
> > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> 
> James' testing found that this value will be rejected by the kernel. GRE
> driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
> will return an error instead of clamping the value. GENEVE/VXLAN always
> clamp the value.
> 
> I think 65478 is the max_mtu for gretap accounting for optional csum,
> key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
> configures gretap.
> 
>   0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478

Do you think that it's worthwhile using the maximum possible MTU, or
should we be a little conservative and use 65400 or 65000?  Either of
those values would be future-proof, I think.

Thanks,

Ben.
Eric Garver Jan. 16, 2018, 7:08 p.m. | #7
On Tue, Jan 16, 2018 at 09:16:47AM -0800, Ben Pfaff wrote:
> On Tue, Jan 16, 2018 at 10:42:16AM -0500, Eric Garver wrote:
> > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > > following up a request to create a GRE device with a second request to set
> > > the MTU.
> > > 
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > Reported-by: Eric Garver <e@erig.me>
> > > Reported-by: James Page <james.page@ubuntu.com>
> > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > ---
> > > This is not properly tested.  It needs to be tested before it makes sense
> > > to commit it.
> > > 
> > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > index fe9c8ed7104f..0bf965d29f41 100644
> > > --- a/lib/dpif-netlink-rtnl.c
> > > +++ b/lib/dpif-netlink-rtnl.c
> > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> > >      nl_msg_end_nested(&request, linkinfo_off);
> > >  
> > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > > +        ofpbuf_clear(&request);
> > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > 
> > James' testing found that this value will be rejected by the kernel. GRE
> > driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
> > will return an error instead of clamping the value. GENEVE/VXLAN always
> > clamp the value.
> > 
> > I think 65478 is the max_mtu for gretap accounting for optional csum,
> > key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
> > configures gretap.
> > 
> >   0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478
> 
> Do you think that it's worthwhile using the maximum possible MTU, or

I don't think it's worth possibly hitting another one of these issues.

> should we be a little conservative and use 65400 or 65000?  Either of
> those values would be future-proof, I think.

I agree 65000 would be okay, but if we're going to do it then lets do it
for everything (GENEVE/VXLAN/GRE) instead of assuming the kernel will
clamp the value for us on newlink/setlink.
Ben Pfaff Jan. 17, 2018, 6:04 p.m. | #8
On Tue, Jan 16, 2018 at 02:08:45PM -0500, Eric Garver wrote:
> On Tue, Jan 16, 2018 at 09:16:47AM -0800, Ben Pfaff wrote:
> > On Tue, Jan 16, 2018 at 10:42:16AM -0500, Eric Garver wrote:
> > > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > > overrides the MTU to 1472 bytes.  This commit works around the problem by
> > > > following up a request to create a GRE device with a second request to set
> > > > the MTU.
> > > > 
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > > Reported-by: Eric Garver <e@erig.me>
> > > > Reported-by: James Page <james.page@ubuntu.com>
> > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > ---
> > > > This is not properly tested.  It needs to be tested before it makes sense
> > > > to commit it.
> > > > 
> > > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > > index fe9c8ed7104f..0bf965d29f41 100644
> > > > --- a/lib/dpif-netlink-rtnl.c
> > > > +++ b/lib/dpif-netlink-rtnl.c
> > > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
> > > >      nl_msg_end_nested(&request, linkinfo_off);
> > > >  
> > > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > > +        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
> > > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
> > > > +        ofpbuf_clear(&request);
> > > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > > 
> > > James' testing found that this value will be rejected by the kernel. GRE
> > > driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
> > > will return an error instead of clamping the value. GENEVE/VXLAN always
> > > clamp the value.
> > > 
> > > I think 65478 is the max_mtu for gretap accounting for optional csum,
> > > key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
> > > configures gretap.
> > > 
> > >   0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478
> > 
> > Do you think that it's worthwhile using the maximum possible MTU, or
> 
> I don't think it's worth possibly hitting another one of these issues.
> 
> > should we be a little conservative and use 65400 or 65000?  Either of
> > those values would be future-proof, I think.
> 
> I agree 65000 would be okay, but if we're going to do it then lets do it
> for everything (GENEVE/VXLAN/GRE) instead of assuming the kernel will
> clamp the value for us on newlink/setlink.

OK, great.

I posted v3 that does this and makes the error handling fix we agreed
on:
https://patchwork.ozlabs.org/project/openvswitch/list/?series=23965
James Page Jan. 18, 2018, 8:19 a.m. | #9
Hi Ben

On Wed, 17 Jan 2018 at 20:05 Ben Pfaff <blp@ovn.org> wrote:

> On Tue, Jan 16, 2018 at 02:08:45PM -0500, Eric Garver wrote:
> > On Tue, Jan 16, 2018 at 09:16:47AM -0800, Ben Pfaff wrote:
> > > On Tue, Jan 16, 2018 at 10:42:16AM -0500, Eric Garver wrote:
> > > > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > > > overrides the MTU to 1472 bytes.  This commit works around the
> problem by
> > > > > following up a request to create a GRE device with a second
> request to set
> > > > > the MTU.
> > > > >
> > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > > > Reported-by: Eric Garver <e@erig.me>
> > > > > Reported-by: James Page <james.page@ubuntu.com>
> > > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > > ---
> > > > > This is not properly tested.  It needs to be tested before it
> makes sense
> > > > > to commit it.
> > > > >
> > > > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > > > >  1 file changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > > > index fe9c8ed7104f..0bf965d29f41 100644
> > > > > --- a/lib/dpif-netlink-rtnl.c
> > > > > +++ b/lib/dpif-netlink-rtnl.c
> > > > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct
> netdev_tunnel_config *tnl_cfg,
> > > > >      nl_msg_end_nested(&request, linkinfo_off);
> > > > >
> > > > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > > > +        /* Work around a bug in kernel GRE driver, which ignores
> IFLA_MTU in
> > > > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484.
> */
> > > > > +        ofpbuf_clear(&request);
> > > > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > > >
> > > > James' testing found that this value will be rejected by the kernel.
> GRE
> > > > driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
> > > > will return an error instead of clamping the value. GENEVE/VXLAN
> always
> > > > clamp the value.
> > > >
> > > > I think 65478 is the max_mtu for gretap accounting for optional csum,
> > > > key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
> > > > configures gretap.
> > > >
> > > >   0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478
> > >
> > > Do you think that it's worthwhile using the maximum possible MTU, or
> >
> > I don't think it's worth possibly hitting another one of these issues.
> >
> > > should we be a little conservative and use 65400 or 65000?  Either of
> > > those values would be future-proof, I think.
> >
> > I agree 65000 would be okay, but if we're going to do it then lets do it
> > for everything (GENEVE/VXLAN/GRE) instead of assuming the kernel will
> > clamp the value for us on newlink/setlink.
>
> OK, great.
>
> I posted v3 that does this and makes the error handling fix we agreed
> on:
> https://patchwork.ozlabs.org/project/openvswitch/list/?series=23965


retested with your new patches; confirmed that MTU for GRE and VXLAN ports
are set consistently to 65000.

Cheers

James
Ben Pfaff Jan. 18, 2018, 10:55 p.m. | #10
On Thu, Jan 18, 2018 at 08:19:06AM +0000, James Page wrote:
> Hi Ben
> 
> On Wed, 17 Jan 2018 at 20:05 Ben Pfaff <blp@ovn.org> wrote:
> 
> > On Tue, Jan 16, 2018 at 02:08:45PM -0500, Eric Garver wrote:
> > > On Tue, Jan 16, 2018 at 09:16:47AM -0800, Ben Pfaff wrote:
> > > > On Tue, Jan 16, 2018 at 10:42:16AM -0500, Eric Garver wrote:
> > > > > On Fri, Jan 12, 2018 at 12:44:59PM -0800, Ben Pfaff wrote:
> > > > > > The kernel GRE driver ignores IFLA_MTU in RTM_NEWLINK requests and
> > > > > > overrides the MTU to 1472 bytes.  This commit works around the
> > problem by
> > > > > > following up a request to create a GRE device with a second
> > request to set
> > > > > > the MTU.
> > > > > >
> > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1488484
> > > > > > Reported-by: Eric Garver <e@erig.me>
> > > > > > Reported-by: James Page <james.page@ubuntu.com>
> > > > > > Signed-off-by: Ben Pfaff <blp@ovn.org>
> > > > > > ---
> > > > > > This is not properly tested.  It needs to be tested before it
> > makes sense
> > > > > > to commit it.
> > > > > >
> > > > > >  lib/dpif-netlink-rtnl.c | 13 +++++++++++++
> > > > > >  1 file changed, 13 insertions(+)
> > > > > >
> > > > > > diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
> > > > > > index fe9c8ed7104f..0bf965d29f41 100644
> > > > > > --- a/lib/dpif-netlink-rtnl.c
> > > > > > +++ b/lib/dpif-netlink-rtnl.c
> > > > > > @@ -329,6 +329,19 @@ dpif_netlink_rtnl_create(const struct
> > netdev_tunnel_config *tnl_cfg,
> > > > > >      nl_msg_end_nested(&request, linkinfo_off);
> > > > > >
> > > > > >      err = nl_transact(NETLINK_ROUTE, &request, NULL);
> > > > > > +    if (!err && type == OVS_VPORT_TYPE_GRE) {
> > > > > > +        /* Work around a bug in kernel GRE driver, which ignores
> > IFLA_MTU in
> > > > > > +         * RTM_NEWLINK, by setting the MTU again.  See
> > > > > > +         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484.
> > */
> > > > > > +        ofpbuf_clear(&request);
> > > > > > +        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
> > > > > > +                            NLM_F_REQUEST | NLM_F_ACK);
> > > > > > +        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
> > > > > > +        nl_msg_put_string(&request, IFLA_IFNAME, name);
> > > > > > +        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
> > > > >
> > > > > James' testing found that this value will be rejected by the kernel.
> > GRE
> > > > > driver uses ip_tunnel_change_mtu() with the "strict" parameter, so it
> > > > > will return an error instead of clamping the value. GENEVE/VXLAN
> > always
> > > > > clamp the value.
> > > > >
> > > > > I think 65478 is the max_mtu for gretap accounting for optional csum,
> > > > > key, seq in gre_calc_hlen(). It may be slightly higher for how OVS
> > > > > configures gretap.
> > > > >
> > > > >   0xFFF8 - eth(14) - ipv4(20) - gre(16) = 65478
> > > >
> > > > Do you think that it's worthwhile using the maximum possible MTU, or
> > >
> > > I don't think it's worth possibly hitting another one of these issues.
> > >
> > > > should we be a little conservative and use 65400 or 65000?  Either of
> > > > those values would be future-proof, I think.
> > >
> > > I agree 65000 would be okay, but if we're going to do it then lets do it
> > > for everything (GENEVE/VXLAN/GRE) instead of assuming the kernel will
> > > clamp the value for us on newlink/setlink.
> >
> > OK, great.
> >
> > I posted v3 that does this and makes the error handling fix we agreed
> > on:
> > https://patchwork.ozlabs.org/project/openvswitch/list/?series=23965
> 
> 
> retested with your new patches; confirmed that MTU for GRE and VXLAN ports
> are set consistently to 65000.
> 
> Cheers
> 
> James

Thanks for the testing!  I applied these patches to master and
branch-2.9.

Patch

diff --git a/lib/dpif-netlink-rtnl.c b/lib/dpif-netlink-rtnl.c
index fe9c8ed7104f..0bf965d29f41 100644
--- a/lib/dpif-netlink-rtnl.c
+++ b/lib/dpif-netlink-rtnl.c
@@ -329,6 +329,19 @@  dpif_netlink_rtnl_create(const struct netdev_tunnel_config *tnl_cfg,
     nl_msg_end_nested(&request, linkinfo_off);
 
     err = nl_transact(NETLINK_ROUTE, &request, NULL);
+    if (!err && type == OVS_VPORT_TYPE_GRE) {
+        /* Work around a bug in kernel GRE driver, which ignores IFLA_MTU in
+         * RTM_NEWLINK, by setting the MTU again.  See
+         * https://bugzilla.redhat.com/show_bug.cgi?id=1488484. */
+        ofpbuf_clear(&request);
+        nl_msg_put_nlmsghdr(&request, 0, RTM_SETLINK,
+                            NLM_F_REQUEST | NLM_F_ACK);
+        ofpbuf_put_zeros(&request, sizeof(struct ifinfomsg));
+        nl_msg_put_string(&request, IFLA_IFNAME, name);
+        nl_msg_put_u32(&request, IFLA_MTU, UINT16_MAX);
+
+        err = nl_transact(NETLINK_ROUTE, &request, NULL);
+    }
 
 exit:
     ofpbuf_uninit(&request);