
[ovs-dev] netlink-socket: Check for null sock in nl_sock_recv__()

Message ID: 20211025174509.198991-1-drc@linux.vnet.ibm.com
State: Changes Requested
Series: [ovs-dev] netlink-socket: Check for null sock in nl_sock_recv__()

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test fail github build: failed

Commit Message

David Christensen Oct. 25, 2021, 5:45 p.m. UTC
In certain high load situations, such as when creating a large number of
ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
as null, resulting in a segmentation fault when 'sock' is later
dereferenced, such as when calling recvmsg().

The ovs-vswitchd.log will display something like this:

    fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)

Tested this change under the same circumstances that originally generated
the segmentation fault and it ran successfully for four days without any
issue.

Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
IBM-BZ: #193057
---
 lib/netlink-socket.c | 4 ++++
 1 file changed, 4 insertions(+)
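
For context, the sketch below illustrates the failure mode the patch guards
against: the receive path reads sock->fd in order to call recvmsg(), so a
NULL 'sock' is dereferenced before any error handling can run.  This is a
simplified, self-contained illustration, not the upstream nl_sock_recv__();
the struct layout and the toy_sock_recv() name are placeholders.

    /* Simplified model of the crash path and of the proposed guard.
     * Not the upstream function: fields and names are placeholders. */
    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    struct nl_sock {
        int fd;                     /* Netlink socket file descriptor. */
    };

    /* Receive one message; returns 0 or a positive errno value. */
    static int
    toy_sock_recv(struct nl_sock *sock, void *buf, size_t len)
    {
        if (sock == NULL) {
            /* The patch adds an equivalent early return so a missing
             * socket is reported as an error instead of a segfault. */
            return ECONNRESET;
        }

        struct iovec iov = { .iov_base = buf, .iov_len = len };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };

        /* Without the check above, a NULL 'sock' would be dereferenced
         * right here when reading sock->fd. */
        ssize_t retval = recvmsg(sock->fd, &msg, MSG_DONTWAIT);
        return retval < 0 ? errno : 0;
    }

    int
    main(void)
    {
        char buf[128];
        int error = toy_sock_recv(NULL, buf, sizeof buf);
        printf("recv on NULL sock -> %s\n", strerror(error));
        return 0;
    }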

Comments

0-day Robot Oct. 25, 2021, 6:18 p.m. UTC | #1
Bleep bloop.  Greetings David Christensen, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Lines checked: 43, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Ilya Maximets Nov. 16, 2021, 10:31 p.m. UTC | #2
On 10/25/21 19:45, David Christensen wrote:
> In certain high load situations, such as when creating a large number of
> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
> as null, resulting in a segmentation fault when 'sock' is later
> dereferenced, such as when calling recvmsg().

Hi, David.  Thanks for the patch.

It's OK to check for a NULL pointer there, I guess.  However,
do you know from where it was actually called?  This function,
in general, should not be called without the actual socket,
so we, probably, should fix the caller instead.

Best regards, Ilya Maximets.

> 
> The ovs-vswitchd.log will display something like this:
> 
>     fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)
> 
> Tested this change under the same circumstances that originally generated
> the segmentation fault and it ran successfully for four days without any
> issue.
> 
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
> IBM-BZ: #193057
> ---
>  lib/netlink-socket.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
> index 5867de564..3ab4d8485 100644
> --- a/lib/netlink-socket.c
> +++ b/lib/netlink-socket.c
> @@ -653,6 +653,10 @@ nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, int *nsid, bool wait)
>      int *ptr;
>      int error;
>  
> +    if (sock == NULL) {
> +        return ECONNRESET;
> +    }
> +
>      ovs_assert(buf->allocated >= sizeof *nlmsghdr);
>      ofpbuf_clear(buf);
>  
>
Murilo Opsfelder Araújo Nov. 18, 2021, 7:56 p.m. UTC | #3
On 11/16/21 19:31, Ilya Maximets wrote:
> On 10/25/21 19:45, David Christensen wrote:
>> In certain high load situations, such as when creating a large number of
>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>> as null, resulting in a segmentation fault when 'sock' is later
>> dereferenced, such as when calling recvmsg().
> 
> Hi, David.  Thanks for the patch.
> 
> It's OK to check for a NULL pointer there, I guess.  However,
> do you know from where it was actually called?  This function,
> in general, should not be called without the actual socket,
> so we, probably, should fix the caller instead.
> 
> Best regards, Ilya Maximets.

Hi, Ilya Maximets.

When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():

(gdb) l
2701
2702        while (handler->event_offset < handler->n_events) {
2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
2704            struct dpif_channel *ch = &dpif->channels[idx];

(gdb) p idx
$26 = 4
(gdb) p *dpif->channels@5
$27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
     last_poll = 61340749}}


The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().

The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.

I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
The fix seems worthwhile, though.
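
To make the caller-side option concrete, below is a small self-contained
model of that dispatch loop with a guard that skips a channel whose socket
is missing, rather than passing NULL down into the receive routine.  It is
purely illustrative: the structures, names, and the way events are indexed
are simplified stand-ins, not the actual lib/dpif-netlink.c code.

    /* Toy model of the epoll dispatch loop from the gdb listing above,
     * with a caller-side NULL-socket guard.  All names are placeholders. */
    #include <stdint.h>
    #include <stdio.h>

    struct nl_sock {
        int fd;
    };

    struct dpif_channel {
        struct nl_sock *sock;       /* NULL here is what triggered the crash. */
        long long last_poll;
    };

    struct toy_handler {
        const uint32_t *event_idxs; /* Stand-in for epoll_events[].data.u32. */
        unsigned int event_offset;
        unsigned int n_events;
    };

    static void
    dispatch(struct toy_handler *handler, struct dpif_channel *channels)
    {
        while (handler->event_offset < handler->n_events) {
            uint32_t idx = handler->event_idxs[handler->event_offset++];
            struct dpif_channel *ch = &channels[idx];

            if (!ch->sock) {
                /* Caller-side guard: a channel without a socket cannot be
                 * read, so skip it (and ideally log or repair it) instead
                 * of handing NULL to the receive routine. */
                fprintf(stderr, "channel %u has no socket, skipping\n",
                        (unsigned int) idx);
                continue;
            }
            printf("would receive on fd %d\n", ch->sock->fd);
        }
    }

    int
    main(void)
    {
        struct nl_sock s0 = { .fd = 10 }, s1 = { .fd = 11 };
        struct dpif_channel channels[] = {
            { .sock = &s0 }, { .sock = &s1 }, { .sock = NULL },
        };
        uint32_t events[] = { 0, 2, 1 };   /* Index 2 has a NULL socket. */
        struct toy_handler handler = { .event_idxs = events, .n_events = 3 };

        dispatch(&handler, channels);
        return 0;
    }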

Cheers!

> 
>>
>> The ovs-vswitchd.log will display something like this:
>>
>>      fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)
>>
>> Tested this change under the same circumstances that originally generated
>> the segmentation fault and it ran successfully for four days without any
>> issue.
>>
>> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
>> Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
>> IBM-BZ: #193057
>> ---
>>   lib/netlink-socket.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
>> index 5867de564..3ab4d8485 100644
>> --- a/lib/netlink-socket.c
>> +++ b/lib/netlink-socket.c
>> @@ -653,6 +653,10 @@ nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, int *nsid, bool wait)
>>       int *ptr;
>>       int error;
>>   
>> +    if (sock == NULL) {
>> +        return ECONNRESET;
>> +    }
>> +
>>       ovs_assert(buf->allocated >= sizeof *nlmsghdr);
>>       ofpbuf_clear(buf);
>>   
>>
>
David Christensen Nov. 18, 2021, 9:19 p.m. UTC | #4
On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
> On 11/16/21 19:31, Ilya Maximets wrote:
>> On 10/25/21 19:45, David Christensen wrote:
>>> In certain high load situations, such as when creating a large number of
>>> ports on a switch, the parameter 'sock' may be passed to 
>>> nl_sock_recv__()
>>> as null, resulting in a segmentation fault when 'sock' is later
>>> dereferenced, such as when calling recvmsg().
>>
>> Hi, David.  Thanks for the patch.
>>
>> It's OK to check for a NULL pointer there, I guess.  However,
>> do you know from where it was actually called?  This function,
>> in general, should not be called without the actual socket,
>> so we, probably, should fix the caller instead.
>>
>> Best regards, Ilya Maximets.
> 
> Hi, Ilya Maximets.
> 
> When I looked at the coredump file, ch->sock was nil and was passed to 
> nl_sock_recv():
> 
> (gdb) l
> 2701
> 2702        while (handler->event_offset < handler->n_events) {
> 2703            int idx = 
> handler->epoll_events[handler->event_offset].data.u32;
> 2704            struct dpif_channel *ch = &dpif->channels[idx];
> 
> (gdb) p idx
> $26 = 4
> (gdb) p *dpif->channels@5
> $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 
> 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, 
> last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, 
> {sock = 0x0,
>      last_poll = 61340749}}
> 
> 
> The above snippet is from lib/dpif-netlink.c and the caller is 
> dpif_netlink_recv_vport_dispatch().
> 
> The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() 
> via ch->sock parameter.
> In that function, it tried to access sock->fd when calling recvmsg(), 
> causing the segfault.
> 
> I'm not experienced enough in Open vSwitch to explain why sock was nil
> at that given index.
> The fix seems worthwhile, though.

A few other points of note:

- Test system was a very large configuration (2K CPUs, > 1TB RAM)
- OVS Switch was configured with 6K ports as follows:

# b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set 
interface p$i type=internal"; done
# time sudo ovs-vsctl $cmds

- OVS was installed from RHEL RPM.  Build from source did not exhibit 
the same problem.
- Unable to reproduce on a different system (128 CPUs, 256GB RAM), even 
with 10K ports.

Dave
Ilya Maximets Nov. 19, 2021, 4:23 p.m. UTC | #5
On 11/18/21 22:19, David Christensen wrote:
> 
> 
> On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
>> On 11/16/21 19:31, Ilya Maximets wrote:
>>> On 10/25/21 19:45, David Christensen wrote:
>>>> In certain high load situations, such as when creating a large number of
>>>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>>>> as null, resulting in a segmentation fault when 'sock' is later
>>>> dereferenced, such as when calling recvmsg().
>>>
>>> Hi, David.  Thanks for the patch.
>>>
>>> It's OK to check for a NULL pointer there, I guess.  However,
>>> do you know from where it was actually called?  This function,
>>> in general, should not be called without the actual socket,
>>> so we, probably, should fix the caller instead.
>>>
>>> Best regards, Ilya Maximets.
>>
>> Hi, Ilya Maximets.
>>
>> When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
>>
>> (gdb) l
>> 2701
>> 2702        while (handler->event_offset < handler->n_events) {
>> 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
>> 2704            struct dpif_channel *ch = &dpif->channels[idx];
>>
>> (gdb) p idx
>> $26 = 4
>> (gdb) p *dpif->channels@5
>> $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
>>      last_poll = 61340749}}
>>
>>
>> The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
>>
>> The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
>> In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
>>
>> I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
>> The fix seems worthwhile, though.
> 
> A few other points of note:
> 
> - Test system was a very large configuration (2K CPUs, > 1TB RAM)
> - OVS Switch was configured with 6K ports as follows:
> 
> # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
> # time sudo ovs-vsctl $cmds
> 
> - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
> - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
> 
> Dave

Thanks for all the information.  Having a NULL socket in the 'channels'
array doesn't look like a good sign.  This may mean that even if OVS will
not crash, it may not receive upcalls for certain ports, which is not
good either.

Could you provide more info on which RHEL package you used and which
version you built from source?  Maybe we can find the difference.

Were there any suspicious warnings/errors in the OVS log file prior to
the failure?  Maybe we run out of file descriptors somewhere due to
the machine size...

Best regards, Ilya Maximets.
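
On the file-descriptor angle: this is easy to check independently of OVS,
since socket() fails with EMFILE once the per-process descriptor limit is
reached.  The standalone demo below only shows the errno a socket-creation
path would see in that situation; it is not OVS code, and whether this is
what left the NULL entry in dpif->channels[] is only a hypothesis.

    /* Standalone demo of fd exhaustion: lower RLIMIT_NOFILE, then open
     * sockets until socket() fails (expected errno: EMFILE). */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <sys/socket.h>

    int
    main(void)
    {
        /* Artificially lower the fd limit so the failure happens quickly. */
        struct rlimit lim = { .rlim_cur = 32, .rlim_max = 32 };
        if (setrlimit(RLIMIT_NOFILE, &lim)) {
            perror("setrlimit");
            return 1;
        }

        for (int i = 0; ; i++) {
            int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
            if (fd < 0) {
                /* A creation path that ignored this error could leave a
                 * NULL socket behind in a channel array. */
                printf("socket #%d failed: %s\n", i, strerror(errno));
                return 0;
            }
        }
    }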
Murilo Opsfelder Araújo Nov. 22, 2021, 6:46 p.m. UTC | #6
Hi, Ilya Maximets.

On 11/19/21 13:23, Ilya Maximets wrote:
> On 11/18/21 22:19, David Christensen wrote:
>>
>>
>> On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
>>> On 11/16/21 19:31, Ilya Maximets wrote:
>>>> On 10/25/21 19:45, David Christensen wrote:
>>>>> In certain high load situations, such as when creating a large number of
>>>>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>>>>> as null, resulting in a segmentation fault when 'sock' is later
>>>>> dereferenced, such as when calling recvmsg().
>>>>
>>>> Hi, David.  Thanks for the patch.
>>>>
>>>> It's OK to check for a NULL pointer there, I guess.  However,
>>>> do you know from where it was actually called?  This function,
>>>> in general, should not be called without the actual socket,
>>>> so we, probably, should fix the caller instead.
>>>>
>>>> Best regards, Ilya Maximets.
>>>
>>> Hi, Ilya Maximets.
>>>
>>> When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
>>>
>>> (gdb) l
>>> 2701
>>> 2702        while (handler->event_offset < handler->n_events) {
>>> 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
>>> 2704            struct dpif_channel *ch = &dpif->channels[idx];
>>>
>>> (gdb) p idx
>>> $26 = 4
>>> (gdb) p *dpif->channels@5
>>> $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
>>>       last_poll = 61340749}}
>>>
>>>
>>> The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
>>>
>>> The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
>>> In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
>>>
>>> I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
>>> The fix seems worthwhile, though.
>>
>> A few other points of note:
>>
>> - Test system was a very large configuration (2K CPUs, > 1TB RAM)
>> - OVS Switch was configured with 6K ports as follows:
>>
>> # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
>> # time sudo ovs-vsctl $cmds
>>
>> - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
>> - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
>>
>> Dave
> 
> Thanks for all the information.  Having a NULL socket in the 'channels'
> array doesn't look like a good sign.  This may mean that even if OVS will
> not crash, it may not receive upcalls for certain ports, which is not
> good either.
> 
> Could you provide more info on which RHEL package you used and which
> version you built from source?  Maybe we can find the difference.

When the bug was reported internally, the distro package was actually from Fedora 33 ppc64le.

One difference I've found at that time was regarding the configure options.

 From the logs at [0], the distro package was configured with these options:

     + ./configure --build=ppc64le-redhat-linux-gnu --host=ppc64le-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki

And the built-from-source binaries from Dave were built like this:

     git clone https://github.com/openvswitch/ovs.git
     cd ovs/
     git checkout v2.15.0
     ./boot.sh
     ./configure
     make
     sudo make install

At that time, I suspected that one (or a combination) of the configure options was contributing to the issue manifesting.

[0] https://kojipkgs.fedoraproject.org//packages/openvswitch/2.15.0/1.fc33/data/logs/ppc64le/build.log

> 
> Were there any suspicious warnings/errors in the OVS log file prior to
> the failure?  Maybe we run out of file descriptors somewhere due to
> the machine size...

None that I could tell.  Only the following in the ovs-vswitchd.log:

     fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)

> 
> Best regards, Ilya Maximets.
> 

Cheers!
Flavio Leitner Nov. 23, 2021, 6:32 p.m. UTC | #7
On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
> Hi, Ilya Maximets.
> 
> On 11/19/21 13:23, Ilya Maximets wrote:
> > On 11/18/21 22:19, David Christensen wrote:
> > > 
> > > 
> > > On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
> > > > On 11/16/21 19:31, Ilya Maximets wrote:
> > > > > On 10/25/21 19:45, David Christensen wrote:
> > > > > > In certain high load situations, such as when creating a large number of
> > > > > > ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
> > > > > > as null, resulting in a segmentation fault when 'sock' is later
> > > > > > dereferenced, such as when calling recvmsg().
> > > > > 
> > > > > Hi, David.  Thanks for the patch.
> > > > > 
> > > > > It's OK to check for a NULL pointer there, I guess.  However,
> > > > > do you know from where it was actually called?  This function,
> > > > > in general, should not be called without the actual socket,
> > > > > so we, probably, should fix the caller instead.
> > > > > 
> > > > > Best regards, Ilya Maximets.
> > > > 
> > > > Hi, Ilya Maximets.
> > > > 
> > > > When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
> > > > 
> > > > (gdb) l
> > > > 2701
> > > > 2702        while (handler->event_offset < handler->n_events) {
> > > > 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
> > > > 2704            struct dpif_channel *ch = &dpif->channels[idx];
> > > > 
> > > > (gdb) p idx
> > > > $26 = 4
> > > > (gdb) p *dpif->channels@5
> > > > $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
> > > >       last_poll = 61340749}}
> > > > 
> > > > 
> > > > The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
> > > > 
> > > > The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
> > > > In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
> > > > 
> > > > I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
> > > > The fix seems worthwhile, though.
> > > 
> > > A few other points of note:
> > > 
> > > - Test system was a very large configuration (2K CPUs, > 1TB RAM)
> > > - OVS Switch was configured with 6K ports as follows:
> > > 
> > > # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
> > > # time sudo ovs-vsctl $cmds
> > > 
> > > - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
> > > - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
> > > 
> > > Dave
> > 
> > Thanks for all the information.  Having a NULL socket in the 'channels'
> > array doesn't look like a good sign.  This may mean that even if OVS will
> > not crash, it may not receive upcalls for certain ports, which is not
> > good either.
> > 
> > Could you provide more info on which RHEL package you used and which
> > version you built from source?  Maybe we can find the difference.
> 
> When the bug was reported internally, the distro package was actually from Fedora 33 ppc64le.
> 
> One difference I've found at that time was regarding the configure options.
> 
> From the logs at [0], the distro package was configured with these options:
> 
>     + ./configure --build=ppc64le-redhat-linux-gnu --host=ppc64le-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki
> 
> And the built-from-source binaries from Dave were built like this:
> 
>     git clone https://github.com/openvswitch/ovs.git
>     cd ovs/
>     git checkout v2.15.0

Do you mean 2.16.0? Because the dpif_netlink_recv_vport_dispatch()
from the snippet above doesn't exist in 2.15.0.

And the FC33 uses 2.15.0:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1713250

Can you reproduce the issue? 
The reason I am asking is because in 2.16.0 we added another way to
handle upcalls, so I am wondering if the bug was introduced with
that change in 2.16.0 or if it was already there in 2.15.0.

Also, could you provide the full backtrace of all threads?
(gdb) thread apply all bt

It would be nice to have a dump of the struct handler too.

Thanks,
fbl



>     ./boot.sh
>     ./configure
>     make
>     sudo make install
> 
> At that time, I suspected that one (or a combination) of the configure options was contributing to the issue manifesting.
> 
> [0] https://kojipkgs.fedoraproject.org//packages/openvswitch/2.15.0/1.fc33/data/logs/ppc64le/build.log
> 
> > 
> > Were there any suspicious warnings/errors in the OVS log file prior to
> > the failure?  Maybe we run out of file descriptors somewhere due to
> > the machine size...
> 
> None that I could tell.  Only the following in the ovs-vswitchd.log:
> 
>     fatal_signal(revalidator138)|WARN|terminating with signal 11 (signal 11)
> 
> > 
> > Best regards, Ilya Maximets.
> > 
> 
> Cheers!
> 
> -- 
> Murilo
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Murilo Opsfelder Araújo Nov. 23, 2021, 10:58 p.m. UTC | #8
Hi, Flavio.

On 11/23/21 15:32, Flavio Leitner wrote:
> On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
>> Hi, Ilya Maximets.
>>
>> On 11/19/21 13:23, Ilya Maximets wrote:
>>> On 11/18/21 22:19, David Christensen wrote:
>>>>
>>>>
>>>> On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
>>>>> On 11/16/21 19:31, Ilya Maximets wrote:
>>>>>> On 10/25/21 19:45, David Christensen wrote:
>>>>>>> In certain high load situations, such as when creating a large number of
>>>>>>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>>>>>>> as null, resulting in a segmentation fault when 'sock' is later
>>>>>>> dereferenced, such as when calling recvmsg().
>>>>>>
>>>>>> Hi, David.  Thanks for the patch.
>>>>>>
>>>>>> It's OK to check for a NULL pointer there, I guess.  However,
>>>>>> do you know from where it was actually called?  This function,
>>>>>> in general, should not be called without the actual socket,
>>>>>> so we, probably, should fix the caller instead.
>>>>>>
>>>>>> Best regards, Ilya Maximets.
>>>>>
>>>>> Hi, Ilya Maximets.
>>>>>
>>>>> When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
>>>>>
>>>>> (gdb) l
>>>>> 2701
>>>>> 2702        while (handler->event_offset < handler->n_events) {
>>>>> 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
>>>>> 2704            struct dpif_channel *ch = &dpif->channels[idx];
>>>>>
>>>>> (gdb) p idx
>>>>> $26 = 4
>>>>> (gdb) p *dpif->channels@5
>>>>> $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
>>>>>        last_poll = 61340749}}
>>>>>
>>>>>
>>>>> The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
>>>>>
>>>>> The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
>>>>> In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
>>>>>
>>>>> I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
>>>>> The fix seems worthwhile, though.
>>>>
>>>> A few other points of note:
>>>>
>>>> - Test system was a very large configuration (2K CPUs, > 1TB RAM)
>>>> - OVS Switch was configured with 6K ports as follows:
>>>>
>>>> # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
>>>> # time sudo ovs-vsctl $cmds
>>>>
>>>> - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
>>>> - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
>>>>
>>>> Dave
>>>
>>> Thanks for all the information.  Having a NULL socket in the 'channels'
>>> array doesn't look like a good sign.  This may mean that even if OVS will
>>> not crash, it may not receive upcalls for certain ports, which is not
>>> good either.
>>>
>>> Could you provide more info on which RHEL package you used and which
>>> version you built from source?  Maybe we can find the difference.
>>
>> When the bug was reported internally, the distro package was actually from Fedora 33 ppc64le.
>>
>> One difference I've found at that time was regarding the configure options.
>>
>>  From the logs at [0], the distro package was configured with these options:
>>
>>      + ./configure --build=ppc64le-redhat-linux-gnu --host=ppc64le-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki
>>
>> And the built-from-source binaries from Dave were built like this:
>>
>>      git clone https://github.com/openvswitch/ovs.git
>>      cd ovs/
>>      git checkout v2.15.0
> 
> Do you mean 2.16.0? Because the dpif_netlink_recv_vport_dispatch()
> from the snippet above doesn't exist in 2.15.0.
> 
> And the FC33 uses 2.15.0:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1713250
> 

I'm sorry.  I'd looked into the wrong revision.
The correct name of the caller is dpif_netlink_recv__(), in lib/dpif-netlink.c.

The version tag is correct: v2.15.0
It matches the version from Fedora 33.

> Can you reproduce the issue?
> The reason I am asking is because in 2.16.0 we added another way to
> handle upcalls, so I am wondering if the bug was introduced with
> that change in 2.16.0 or if it was already there in 2.15.0.

The system where this issue happened is being used for other purposes now.
I'll see if we can manage to reproduce on another one.

We may also want to use latest bits of OVS and Fedora.

> 
> Also, could you provide the full backtrace of all threads?
> (gdb) thread apply all bt
> 
> It would be nice to have a dump of the struct handler too.

Absolutely!  We'll make sure to capture these data next time.

> 
> Thanks,
> fbl

Cheers!
Flavio Leitner Nov. 24, 2021, 1:16 p.m. UTC | #9
On Tue, Nov 23, 2021 at 07:58:52PM -0300, Murilo Opsfelder Araújo wrote:
> Hi, Flavio.
> 
> On 11/23/21 15:32, Flavio Leitner wrote:
> > On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
> > > Hi, Ilya Maximets.
> > > 
> > > On 11/19/21 13:23, Ilya Maximets wrote:
> > > > On 11/18/21 22:19, David Christensen wrote:
> > > > > 
> > > > > 
> > > > > On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
> > > > > > On 11/16/21 19:31, Ilya Maximets wrote:
> > > > > > > On 10/25/21 19:45, David Christensen wrote:
> > > > > > > > In certain high load situations, such as when creating a large number of
> > > > > > > > ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
> > > > > > > > as null, resulting in a segmentation fault when 'sock' is later
> > > > > > > > dereferenced, such as when calling recvmsg().
> > > > > > > 
> > > > > > > Hi, David.  Thanks for the patch.
> > > > > > > 
> > > > > > > It's OK to check for a NULL pointer there, I guess.  However,
> > > > > > > do you know from where it was actually called?  This function,
> > > > > > > in general, should not be called without the actual socket,
> > > > > > > so we, probably, should fix the caller instead.
> > > > > > > 
> > > > > > > Best regards, Ilya Maximets.
> > > > > > 
> > > > > > Hi, Ilya Maximets.
> > > > > > 
> > > > > > When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
> > > > > > 
> > > > > > (gdb) l
> > > > > > 2701
> > > > > > 2702        while (handler->event_offset < handler->n_events) {
> > > > > > 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
> > > > > > 2704            struct dpif_channel *ch = &dpif->channels[idx];
> > > > > > 
> > > > > > (gdb) p idx
> > > > > > $26 = 4
> > > > > > (gdb) p *dpif->channels@5
> > > > > > $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
> > > > > >        last_poll = 61340749}}
> > > > > > 
> > > > > > 
> > > > > > The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
> > > > > > 
> > > > > > The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
> > > > > > In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
> > > > > > 
> > > > > > I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
> > > > > > The fix seems worthwhile, though.
> > > > > 
> > > > > A few other points of note:
> > > > > 
> > > > > - Test system was a very large configuration (2K CPUs, > 1TB RAM)
> > > > > - OVS Switch was configured with 6K ports as follows:
> > > > > 
> > > > > # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
> > > > > # time sudo ovs-vsctl $cmds
> > > > > 
> > > > > - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
> > > > > - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
> > > > > 
> > > > > Dave
> > > > 
> > > > Thanks for all the information.  Having a NULL socket in the 'channels'
> > > > array doesn't look like a good sign.  This may mean that even if OVS will
> > > > not crash, it may not receive upcalls for certain ports, which is not
> > > > good either.
> > > > 
> > > > Could you provide more info on which RHEL package you used and which
> > > > version you built from source?  Maybe we can find the difference.
> > > 
> > > When the bug was reported internally, the distro package was actually from Fedora 33 ppc64le.
> > > 
> > > One difference I've found at that time was regarding the configure options.
> > > 
> > >  From the logs at [0], the distro package was configured with these options:
> > > 
> > >      + ./configure --build=ppc64le-redhat-linux-gnu --host=ppc64le-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki
> > > 
> > > And the built-from-source binaries from Dave were built like this:
> > > 
> > >      git clone https://github.com/openvswitch/ovs.git
> > >      cd ovs/
> > >      git checkout v2.15.0
> > 
> > Do you mean 2.16.0? Because the dpif_netlink_recv_vport_dispatch()
> > from the snippet above doesn't exist in 2.15.0.
> > 
> > And the FC33 uses 2.15.0:
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=1713250
> > 
> 
> I'm sorry.  I'd looked into the wrong revision.
> The correct name of the caller is dpif_netlink_recv__(), in lib/dpif-netlink.c.
> 
> The version tag is correct: v2.15.0
> It matches the version from Fedora 33.

OK, that makes sense.

> > Can you reproduce the issue?
> > The reason I am asking is because in 2.16.0 we added another way to
> > handle upcalls, so I am wondering if the bug was introduced with
> > that change in 2.16.0 or if it was already there in 2.15.0.
> 
> The system where this issue happened is being used for other purposes now.
> I'll see if we can manage to reproduce on another one.

At least we know if there is a bug, it's prior to the big change
adding the new upcall dispatching. 
>
> We may also want to use latest bits of OVS and Fedora.
> 
> > 
> > Also, could you provide the full backtrace of all threads?
> > (gdb) thread apply all bt
> > 
> > It would be nice to have a dump of the struct handler too.
> 
> Absolutely!  We'll make sure to capture these data next time.

That might be available in the core you already have.
Actually, if you can put the core somewhere maybe I can
take a look here using fedora debuginfo package.

fbl
Murilo Opsfelder Araújo Dec. 20, 2021, 12:53 p.m. UTC | #10
Hi, Flavio.

On 11/24/21 10:16, Flavio Leitner wrote:
> On Tue, Nov 23, 2021 at 07:58:52PM -0300, Murilo Opsfelder Araújo wrote:
>> Hi, Flavio.
>>
>> On 11/23/21 15:32, Flavio Leitner wrote:
>>> On Mon, Nov 22, 2021 at 03:46:28PM -0300, Murilo Opsfelder Araújo wrote:
>>>> Hi, Ilya Maximets.
>>>>
>>>> On 11/19/21 13:23, Ilya Maximets wrote:
>>>>> On 11/18/21 22:19, David Christensen wrote:
>>>>>>
>>>>>>
>>>>>> On 11/18/21 11:56 AM, Murilo Opsfelder Araújo wrote:
>>>>>>> On 11/16/21 19:31, Ilya Maximets wrote:
>>>>>>>> On 10/25/21 19:45, David Christensen wrote:
>>>>>>>>> In certain high load situations, such as when creating a large number of
>>>>>>>>> ports on a switch, the parameter 'sock' may be passed to nl_sock_recv__()
>>>>>>>>> as null, resulting in a segmentation fault when 'sock' is later
>>>>>>>>> dereferenced, such as when calling recvmsg().
>>>>>>>>
>>>>>>>> Hi, David.  Thanks for the patch.
>>>>>>>>
>>>>>>>> It's OK to check for a NULL pointer there, I guess.  However,
>>>>>>>> do you know from where it was actually called?  This function,
>>>>>>>> in general, should not be called without the actual socket,
>>>>>>>> so we, probably, should fix the caller instead.
>>>>>>>>
>>>>>>>> Best regards, Ilya Maximets.
>>>>>>>
>>>>>>> Hi, Ilya Maximets.
>>>>>>>
>>>>>>> When I looked at the coredump file, ch->sock was nil and was passed to nl_sock_recv():
>>>>>>>
>>>>>>> (gdb) l
>>>>>>> 2701
>>>>>>> 2702        while (handler->event_offset < handler->n_events) {
>>>>>>> 2703            int idx = handler->epoll_events[handler->event_offset].data.u32;
>>>>>>> 2704            struct dpif_channel *ch = &dpif->channels[idx];
>>>>>>>
>>>>>>> (gdb) p idx
>>>>>>> $26 = 4
>>>>>>> (gdb) p *dpif->channels@5
>>>>>>> $27 = {{sock = 0x1001ae88240, last_poll = -9223372036854775808}, {sock = 0x1001aa9a8a0, last_poll = -9223372036854775808}, {sock = 0x1001ae09510, last_poll = 60634070}, {sock = 0x1001a9dbb60, last_poll = 60756950}, {sock = 0x0,
>>>>>>>         last_poll = 61340749}}
>>>>>>>
>>>>>>>
>>>>>>> The above snippet is from lib/dpif-netlink.c and the caller is dpif_netlink_recv_vport_dispatch().
>>>>>>>
>>>>>>> The channel at idx=4 had sock=0x0, which was passed to nl_sock_recv() via ch->sock parameter.
>>>>>>> In that function, it tried to access sock->fd when calling recvmsg(), causing the segfault.
>>>>>>>
>>>>>>> I'm not experienced enough in Open vSwitch to explain why sock was nil at that given index.
>>>>>>> The fix seems worthwhile, though.
>>>>>>
>>>>>> A few other points of note:
>>>>>>
>>>>>> - Test system was a very large configuration (2K CPUs, > 1TB RAM)
>>>>>> - OVS Switch was configured with 6K ports as follows:
>>>>>>
>>>>>> # b=br0; cmds=; for i in {1..6000}; do cmds+=" -- add-port $b p$i -- set interface p$i type=internal"; done
>>>>>> # time sudo ovs-vsctl $cmds
>>>>>>
>>>>>> - OVS was installed from RHEL RPM.  Build from source did not exhibit the same problem.
>>>>>> - Unable to reproduce on a different system (128 CPUs, 256GB RAM), even with 10K ports.
>>>>>>
>>>>>> Dave
>>>>>
>>>>> Thanks for all the information.  Having a NULL socket in the 'channels'
>>>>> array doesn't look like a good sign.  This may mean that even if OVS will
>>>>> not crash, it may not receive upcalls for certain ports, which is not
>>>>> good either.
>>>>>
>>>>> Could you provide more info on which RHEL package you used and which
>>>>> version you built from source?  Maybe we can find the difference.
>>>>
>>>> When the bug was reported internally, the distro package was actually from Fedora 33 ppc64le.
>>>>
>>>> One difference I've found at that time was regarding the configure options.
>>>>
>>>>   From the logs at [0], the distro package was configured with these options:
>>>>
>>>>       + ./configure --build=ppc64le-redhat-linux-gnu --host=ppc64le-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-libcapng --disable-static --enable-shared --enable-ssl --with-dpdk=shared --with-pkidir=/var/lib/openvswitch/pki
>>>>
>>>> And the built-from-source binaries from Dave were built like this:
>>>>
>>>>       git clone https://github.com/openvswitch/ovs.git
>>>>       cd ovs/
>>>>       git checkout v2.15.0
>>>
>>> Do you mean 2.16.0? Because the dpif_netlink_recv_vport_dispatch()
>>> from the snippet above doesn't exist in 2.15.0.
>>>
>>> And the FC33 uses 2.15.0:
>>> https://koji.fedoraproject.org/koji/buildinfo?buildID=1713250
>>>
>>
>> I'm sorry.  I'd looked into the wrong revision.
>> The correct name of the caller is dpif_netlink_recv__(), in lib/dpif-netlink.c.
>>
>> The version tag is correct: v2.15.0
>> It matches the version from Fedora 33.
> 
> OK, that makes sense.
> 
>>> Can you reproduce the issue?
>>> The reason I am asking is because in 2.16.0 we added another way to
>>> handle upcalls, so I am wondering if the bug was introduced with
>>> that change in 2.16.0 or if it was already there in 2.15.0.
>>
>> The system where this issue happened is being used for other purposes now.
>> I'll see if we can manage to reproduce on another one.
> 
> At least we know if there is a bug, it's prior to the big change
> adding the new upcall dispatching.
>>
>> We may also want to use latest bits of OVS and Fedora.
>>
>>>
>>> Also, could you provide the full backtrace of all threads?
>>> (gdb) thread apply all bt
>>>
>>> It would be nice to have a dump of the struct handler too.
>>
>> Absolutely!  We'll make sure to capture these data next time.
> 
> That might be available in the core you already have.
> Actually, if you can put the core somewhere maybe I can
> take a look here using fedora debuginfo package.
> 
> fbl

I just wanted to let you know that I tried Open vSwitch at commit

commit 1f5749c790accd98dbcafdaefc40bf5e52d7c672 ("flow: Consider dataofs when parsing TCP packets.")

on a different system (a KVM guest) and I wasn't able to reproduce the issue on that system.
I also tried using the Fedora 35 package openvswitch-2.15.0-7.fc35.ppc64le, but the issue didn't manifest either.

If we happen to hit the problem again on the larger system (the one with 1TB of RAM), I'll let you know and will capture the requested data.

Thanks a lot for your help!  We really appreciate it!

Cheers!

Patch

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 5867de564..3ab4d8485 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -653,6 +653,10 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, int *nsid, bool wait)
     int *ptr;
     int error;
 
+    if (sock == NULL) {
+        return ECONNRESET;
+    }
+
     ovs_assert(buf->allocated >= sizeof *nlmsghdr);
     ofpbuf_clear(buf);