Message ID | 20230912032131.678266-2-khalid.elmously@canonical.com |
---|---|
State | New |
Headers | show |
Series | net: Avoid address overwrite in kernel_connect | expand |
On Mon, Sep 11, 2023 at 11:21:31PM -0400, Khalid Elmously wrote: > From: Jordan Rife <jrife@google.com> > > BugLink: https://bugs.launchpad.net/bugs/2035163 > > BPF programs that run on connect can rewrite the connect address. For > the connect system call this isn't a problem, because a copy of the address > is made when it is moved into kernel space. However, kernel_connect > simply passes through the address it is given, so the caller may observe > its address value unexpectedly change. > > A practical example where this is problematic is where NFS is combined > with a system such as Cilium which implements BPF-based load balancing. > A common pattern in software-defined storage systems is to have an NFS > mount that connects to a persistent virtual IP which in turn maps to an > ephemeral server IP. This is usually done to achieve high availability: > if your server goes down you can quickly spin up a replacement and remap > the virtual IP to that endpoint. With BPF-based load balancing, mounts > will forget the virtual IP address when the address rewrite occurs > because a pointer to the only copy of that address is passed down the > stack. Server failover then breaks, because clients have forgotten the > virtual IP address. Reconnects fail and mounts remain broken. This patch > was tested by setting up a scenario like this and ensuring that NFS > reconnects worked after applying the patch. > > Signed-off-by: Jordan Rife <jrife@google.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > (backported from commit 0bdf399342c5acbd817c9098b6c7ed21f1974312) > [ kmously: adjusted for lack of READ_ONCE() ] > Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> This should really be sent to generic, such that is propagates to all kernels. The fact that this has been sent for gcp/gke only now is likely due to their request to get it merged sooner. This will require a respin. What is the respin here? What cycle is it? What kernels are being requested to be respun sooner rather than wait until the next cycle? Otherwise, Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> > --- > net/socket.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/net/socket.c b/net/socket.c > index 5c49074ef7f2ae..7344dcc7cb1ccb 100644 > --- a/net/socket.c > +++ b/net/socket.c > @@ -3453,7 +3453,12 @@ EXPORT_SYMBOL(kernel_accept); > int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, > int flags) > { > - return sock->ops->connect(sock, addr, addrlen, flags); > + struct sockaddr_storage address; > + > + memcpy(&address, addr, addrlen); > + > + return sock->ops->connect(sock, (struct sockaddr *)&address, > + addrlen, flags); > } > EXPORT_SYMBOL(kernel_connect); > > -- > 2.34.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team
On 2023-09-12 09:36:41 , Thadeu Lima de Souza Cascardo wrote: > On Mon, Sep 11, 2023 at 11:21:31PM -0400, Khalid Elmously wrote: > > From: Jordan Rife <jrife@google.com> > > > > BugLink: https://bugs.launchpad.net/bugs/2035163 > > > > BPF programs that run on connect can rewrite the connect address. For > > the connect system call this isn't a problem, because a copy of the address > > is made when it is moved into kernel space. However, kernel_connect > > simply passes through the address it is given, so the caller may observe > > its address value unexpectedly change. > > > > A practical example where this is problematic is where NFS is combined > > with a system such as Cilium which implements BPF-based load balancing. > > A common pattern in software-defined storage systems is to have an NFS > > mount that connects to a persistent virtual IP which in turn maps to an > > ephemeral server IP. This is usually done to achieve high availability: > > if your server goes down you can quickly spin up a replacement and remap > > the virtual IP to that endpoint. With BPF-based load balancing, mounts > > will forget the virtual IP address when the address rewrite occurs > > because a pointer to the only copy of that address is passed down the > > stack. Server failover then breaks, because clients have forgotten the > > virtual IP address. Reconnects fail and mounts remain broken. This patch > > was tested by setting up a scenario like this and ensuring that NFS > > reconnects worked after applying the patch. > > > > Signed-off-by: Jordan Rife <jrife@google.com> > > Signed-off-by: David S. Miller <davem@davemloft.net> > > (backported from commit 0bdf399342c5acbd817c9098b6c7ed21f1974312) > > [ kmously: adjusted for lack of READ_ONCE() ] > > Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> > > This should really be sent to generic, such that is propagates to all kernels. > > The fact that this has been sent for gcp/gke only now is likely due to their > request to get it merged sooner. This will require a respin. What is the respin > here? What cycle is it? What kernels are being requested to be respun sooner > rather than wait until the next cycle? They did not indicate that they need the issue fixed that urgently. They were OK with an early October release. There is no respin for this issue afaik. I agree this should be applied to generic. I just didn't want to slow down the review by nominating it for generic (and no one else seemed to report this issue). I will resend it for the generic kernels. > > Otherwise, > Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> > > > --- > > net/socket.c | 7 ++++++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/net/socket.c b/net/socket.c > > index 5c49074ef7f2ae..7344dcc7cb1ccb 100644 > > --- a/net/socket.c > > +++ b/net/socket.c > > @@ -3453,7 +3453,12 @@ EXPORT_SYMBOL(kernel_accept); > > int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, > > int flags) > > { > > - return sock->ops->connect(sock, addr, addrlen, flags); > > + struct sockaddr_storage address; > > + > > + memcpy(&address, addr, addrlen); > > + > > + return sock->ops->connect(sock, (struct sockaddr *)&address, > > + addrlen, flags); > > } > > EXPORT_SYMBOL(kernel_connect); > > > > -- > > 2.34.1 > > > > > > -- > > kernel-team mailing list > > kernel-team@lists.ubuntu.com > > https://lists.ubuntu.com/mailman/listinfo/kernel-team
On Tue, Sep 12, 2023 at 03:41:39PM -0400, Khalid Elmously wrote: > On 2023-09-12 09:36:41 , Thadeu Lima de Souza Cascardo wrote: > > On Mon, Sep 11, 2023 at 11:21:31PM -0400, Khalid Elmously wrote: > > > From: Jordan Rife <jrife@google.com> > > > > > > BugLink: https://bugs.launchpad.net/bugs/2035163 > > > > > > BPF programs that run on connect can rewrite the connect address. For > > > the connect system call this isn't a problem, because a copy of the address > > > is made when it is moved into kernel space. However, kernel_connect > > > simply passes through the address it is given, so the caller may observe > > > its address value unexpectedly change. > > > > > > A practical example where this is problematic is where NFS is combined > > > with a system such as Cilium which implements BPF-based load balancing. > > > A common pattern in software-defined storage systems is to have an NFS > > > mount that connects to a persistent virtual IP which in turn maps to an > > > ephemeral server IP. This is usually done to achieve high availability: > > > if your server goes down you can quickly spin up a replacement and remap > > > the virtual IP to that endpoint. With BPF-based load balancing, mounts > > > will forget the virtual IP address when the address rewrite occurs > > > because a pointer to the only copy of that address is passed down the > > > stack. Server failover then breaks, because clients have forgotten the > > > virtual IP address. Reconnects fail and mounts remain broken. This patch > > > was tested by setting up a scenario like this and ensuring that NFS > > > reconnects worked after applying the patch. > > > > > > Signed-off-by: Jordan Rife <jrife@google.com> > > > Signed-off-by: David S. Miller <davem@davemloft.net> > > > (backported from commit 0bdf399342c5acbd817c9098b6c7ed21f1974312) > > > [ kmously: adjusted for lack of READ_ONCE() ] > > > Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> > > > > This should really be sent to generic, such that is propagates to all kernels. > > > > The fact that this has been sent for gcp/gke only now is likely due to their > > request to get it merged sooner. This will require a respin. What is the respin > > here? What cycle is it? What kernels are being requested to be respun sooner > > rather than wait until the next cycle? > > They did not indicate that they need the issue fixed that urgently. They were OK with an early October release. There is no respin for this issue afaik. Early October release would mean a respin of 2023.09.04 cycle. Can you work on getting that started? Or check if a November release is fine, because this is the release this would get into without respins or other mechanisms being involved. Piggybacking on the following security release is another possibility. We would be talking about releasing around mid October. > > I agree this should be applied to generic. I just didn't want to slow down the review by nominating it for generic (and no one else seemed to report this issue). > > I will resend it for the generic kernels. > Thanks. Cascardo. > > > > > Otherwise, > > Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> > > > > > --- > > > net/socket.c | 7 ++++++- > > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/socket.c b/net/socket.c > > > index 5c49074ef7f2ae..7344dcc7cb1ccb 100644 > > > --- a/net/socket.c > > > +++ b/net/socket.c > > > @@ -3453,7 +3453,12 @@ EXPORT_SYMBOL(kernel_accept); > > > int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, > > > int flags) > > > { > > > - return sock->ops->connect(sock, addr, addrlen, flags); > > > + struct sockaddr_storage address; > > > + > > > + memcpy(&address, addr, addrlen); > > > + > > > + return sock->ops->connect(sock, (struct sockaddr *)&address, > > > + addrlen, flags); > > > } > > > EXPORT_SYMBOL(kernel_connect); > > > > > > -- > > > 2.34.1 > > > > > > > > > -- > > > kernel-team mailing list > > > kernel-team@lists.ubuntu.com > > > https://lists.ubuntu.com/mailman/listinfo/kernel-team
diff --git a/net/socket.c b/net/socket.c index 5c49074ef7f2ae..7344dcc7cb1ccb 100644 --- a/net/socket.c +++ b/net/socket.c @@ -3453,7 +3453,12 @@ EXPORT_SYMBOL(kernel_accept); int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen, int flags) { - return sock->ops->connect(sock, addr, addrlen, flags); + struct sockaddr_storage address; + + memcpy(&address, addr, addrlen); + + return sock->ops->connect(sock, (struct sockaddr *)&address, + addrlen, flags); } EXPORT_SYMBOL(kernel_connect);