diff mbox series

[PATCH-for-6.0] net: tap: fix crash on hotplug

Message ID 3f6be9c84782a0943ea21a8a6f8a5d055b65f2d5.1619018363.git.crobinso@redhat.com
State New

Commit Message

Cole Robinson April 21, 2021, 3:22 p.m. UTC
Attempting to hotplug a tap nic with libvirt will crash qemu:

$ sudo virsh attach-interface f32 network default
error: Failed to attach interface
error: Unable to read from monitor: Connection reset by peer

0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
206	        if (!s->nc.peer->do_not_pad) {
gdb$ bt

s->nc.peer may not be set at this point. This seems to be an
expected case, as qemu_send_packet_* explicitly checks for NULL
s->nc.peer later.

Fix it by checking for s->nc.peer here too. Padding is applied if
s->nc.peer is not set.

https://bugzilla.redhat.com/show_bug.cgi?id=1949786
Fixes: 969e50b61a2

Signed-off-by: Cole Robinson <crobinso@redhat.com>
---
* Or should we skip padding if nc.peer is unset? I didn't dig into it
* tap-win32.c and slirp.c may need a similar fix, but the slirp case
  didn't crash in a simple test.

 net/tap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
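
The open question in the notes above can be made concrete with a small stand-alone model. The sketch below is not QEMU code: `struct client`, `pad_if_unset`, `skip_if_unset`, and `compare_conditions` are hypothetical names, and only the `peer` pointer and the `do_not_pad` flag mirror fields named in the patch. It compares the condition used in this patch with the alternative that skips padding when no peer is attached. Both avoid the NULL dereference; they differ only in the no-peer case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a net client; only the two fields used here. */
struct client {
    struct client *peer;   /* NULL until a peer is attached */
    bool do_not_pad;       /* set on a peer that does not want padded frames */
};

/* Condition from this patch: pad when there is no peer. */
static bool pad_if_unset(const struct client *nc)
{
    return !nc->peer || !nc->peer->do_not_pad;
}

/* Alternative: pad only when a peer exists and has not opted out. */
static bool skip_if_unset(const struct client *nc)
{
    return nc->peer && !nc->peer->do_not_pad;
}

/* Walks the three possible states and checks both conditions. */
static void compare_conditions(void)
{
    struct client wants_pad = { .peer = NULL, .do_not_pad = false };
    struct client no_pad = { .peer = NULL, .do_not_pad = true };
    struct client nc = { .peer = NULL, .do_not_pad = false };

    /* No peer: neither condition dereferences NULL, but the results differ. */
    assert(pad_if_unset(&nc));
    assert(!skip_if_unset(&nc));

    /* Peer that accepts padding: both conditions request padding. */
    nc.peer = &wants_pad;
    assert(pad_if_unset(&nc) && skip_if_unset(&nc));

    /* Peer that opted out: both conditions skip padding. */
    nc.peer = &no_pad;
    assert(!pad_if_unset(&nc) && !skip_if_unset(&nc));
}
```

Calling `compare_conditions()` from a test harness exercises all three states. Which of the two results is right for the no-peer case is the question this patch leaves open.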

Comments

Philippe Mathieu-Daudé April 21, 2021, 4:36 p.m. UTC | #1
Cc'ing Bin.

On 4/21/21 5:22 PM, Cole Robinson wrote:
> Attempting to hotplug a tap nic with libvirt will crash qemu:
> 
> $ sudo virsh attach-interface f32 network default
> error: Failed to attach interface
> error: Unable to read from monitor: Connection reset by peer
> 
> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> 206	        if (!s->nc.peer->do_not_pad) {
> gdb$ bt
> 
> s->nc.peer may not be set at this point. This seems to be an
> expected case, as qemu_send_packet_* explicitly checks for NULL
> s->nc.peer later.
> 
> Fix it by checking for s->nc.peer here too. Padding is applied if
> s->nc.peer is not set.
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> Fixes: 969e50b61a2
> 
> Signed-off-by: Cole Robinson <crobinso@redhat.com>
> ---
> * Or should we skip padding if nc.peer is unset? I didn't dig into it
> * tap-win3.c and slirp.c may need a similar fix, but the slirp case
>   didn't crash in a simple test.
> 
>  net/tap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/tap.c b/net/tap.c
> index dd42ac6134..937559dbb8 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
>              size -= s->host_vnet_hdr_len;
>          }
>  
> -        if (!s->nc.peer->do_not_pad) {
> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
>              if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
>                  buf = min_pkt;
>                  size = min_pktsz;
>
Peter Maydell April 21, 2021, 7:54 p.m. UTC | #2
On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote:
>
> Attempting to hotplug a tap nic with libvirt will crash qemu:
>
> $ sudo virsh attach-interface f32 network default
> error: Failed to attach interface
> error: Unable to read from monitor: Connection reset by peer
>
> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> 206             if (!s->nc.peer->do_not_pad) {
> gdb$ bt
>
> s->nc.peer may not be set at this point. This seems to be an
> expected case, as qemu_send_packet_* explicitly checks for NULL
> s->nc.peer later.
>
> Fix it by checking for s->nc.peer here too. Padding is applied if
> s->nc.peer is not set.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> Fixes: 969e50b61a2

Is this a regression since 5.2 ? (I guess so given the Fixes tag.)

Also, I'm kind of irritated that this was reported to RH on the
15th and we only get a patch now after rc4. I really really don't
want to have to roll an rc5, so this now has a much higher
hill to climb to get into 6.0 than if it had been reported
(eg on the "Planning" wiki page) as a for-6.0 issue before rc4
was tagged...

thanks
-- PMM
Cole Robinson April 21, 2021, 10:27 p.m. UTC | #3
On 4/21/21 3:54 PM, Peter Maydell wrote:
> On Wed, 21 Apr 2021 at 16:24, Cole Robinson <crobinso@redhat.com> wrote:
>>
>> Attempting to hotplug a tap nic with libvirt will crash qemu:
>>
>> $ sudo virsh attach-interface f32 network default
>> error: Failed to attach interface
>> error: Unable to read from monitor: Connection reset by peer
>>
>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
>> 206             if (!s->nc.peer->do_not_pad) {
>> gdb$ bt
>>
>> s->nc.peer may not be set at this point. This seems to be an
>> expected case, as qemu_send_packet_* explicitly checks for NULL
>> s->nc.peer later.
>>
>> Fix it by checking for s->nc.peer here too. Padding is applied if
>> s->nc.peer is not set.
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
>> Fixes: 969e50b61a2
> 
> Is this a regression since 5.2 ? (I guess so given the Fixes tag.)
> 

Yes

> Also, I'm kind of irritated that this was reported to RH on the
> 15th and we only get a patch now after rc4.

Sorry about that, I was slow attempting the reproducer, only gave it a
spin today. I saw Jason had some reverts in rc3 so I guessed that would
fix things, I was surprised to see it still reproduced with rc4.

> I really really don't
> want to have to roll an rc5, so this now has a much higher
> hill to climb to get into 6.0 than if it had been reported
> (eg on the "Planning" wiki page) as a for-6.0 issue before rc4
> was tagged...

I'm not very familiar with the rules for rc releases, TBH. I used the subject
prefix just to make sure this got attention. For Fedora's needs it's not a
big deal if this isn't in 6.0.0 GA. But AFAICT most nic hotplug via
libvirt will crash qemu 100% of the time, so I imagine every distro will
want to backport this patch immediately.

Thanks,
Cole
Jason Wang April 22, 2021, 2:25 a.m. UTC | #4
On 2021/4/21 11:22 PM, Cole Robinson wrote:
> Attempting to hotplug a tap nic with libvirt will crash qemu:
>
> $ sudo virsh attach-interface f32 network default
> error: Failed to attach interface
> error: Unable to read from monitor: Connection reset by peer
>
> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> 206	        if (!s->nc.peer->do_not_pad) {
> gdb$ bt
>
> s->nc.peer may not be set at this point. This seems to be an
> expected case, as qemu_send_packet_* explicitly checks for NULL
> s->nc.peer later.
>
> Fix it by checking for s->nc.peer here too. Padding is applied if
> s->nc.peer is not set.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> Fixes: 969e50b61a2
>
> Signed-off-by: Cole Robinson <crobinso@redhat.com>
> ---
> * Or should we skip padding if nc.peer is unset?


I think so, the padding is for the peer.


> I didn't dig into it
> * tap-win3.c and slirp.c may need a similar fix, but the slirp case
>    didn't crash in a simple test.


Yes, I think the reason is that no packets went through slirp in that test.

Thanks.


>
>   net/tap.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/tap.c b/net/tap.c
> index dd42ac6134..937559dbb8 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
>               size -= s->host_vnet_hdr_len;
>           }
>   
> -        if (!s->nc.peer->do_not_pad) {
> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
>               if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
>                   buf = min_pkt;
>                   size = min_pktsz;
Bin Meng April 22, 2021, 4:29 a.m. UTC | #5
On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
<philmd@redhat.com> wrote:
>
> Cc'ing Bin.
>
> On 4/21/21 5:22 PM, Cole Robinson wrote:
> > Attempting to hotplug a tap nic with libvirt will crash qemu:
> >
> > $ sudo virsh attach-interface f32 network default
> > error: Failed to attach interface
> > error: Unable to read from monitor: Connection reset by peer
> >
> > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> > 206           if (!s->nc.peer->do_not_pad) {
> > gdb$ bt
> >
> > s->nc.peer may not be set at this point. This seems to be an
> > expected case, as qemu_send_packet_* explicitly checks for NULL
> > s->nc.peer later.
> >
> > Fix it by checking for s->nc.peer here too. Padding is applied if
> > s->nc.peer is not set.
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> > Fixes: 969e50b61a2
> >
> > Signed-off-by: Cole Robinson <crobinso@redhat.com>
> > ---
> > * Or should we skip padding if nc.peer is unset? I didn't dig into it
> > * tap-win3.c and slirp.c may need a similar fix, but the slirp case
> >   didn't crash in a simple test.
> >
> >  net/tap.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/tap.c b/net/tap.c
> > index dd42ac6134..937559dbb8 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
> >              size -= s->host_vnet_hdr_len;
> >          }
> >
> > -        if (!s->nc.peer->do_not_pad) {
> > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {

I think we should do:

if (s->nc.peer && !s->nc.peer->do_not_pad)

> >              if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
> >                  buf = min_pkt;
> >                  size = min_pktsz;
> >

And apply a similar fix to the tap-win32 and slirp code.

Regards,
Bin
Peter Maydell April 22, 2021, 9:36 a.m. UTC | #6
On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
> <philmd@redhat.com> wrote:
> >
> > Cc'ing Bin.
> >
> > On 4/21/21 5:22 PM, Cole Robinson wrote:
> > > Attempting to hotplug a tap nic with libvirt will crash qemu:
> > >
> > > $ sudo virsh attach-interface f32 network default
> > > error: Failed to attach interface
> > > error: Unable to read from monitor: Connection reset by peer
> > >
> > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> > > 206           if (!s->nc.peer->do_not_pad) {
> > > gdb$ bt
> > >
> > > s->nc.peer may not be set at this point. This seems to be an
> > > expected case, as qemu_send_packet_* explicitly checks for NULL
> > > s->nc.peer later.
> > >
> > > Fix it by checking for s->nc.peer here too. Padding is applied if
> > > s->nc.peer is not set.
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> > > Fixes: 969e50b61a2
> > >
> > > Signed-off-by: Cole Robinson <crobinso@redhat.com>
> > > ---
> > > * Or should we skip padding if nc.peer is unset? I didn't dig into it
> > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case
> > >   didn't crash in a simple test.
> > >
> > >  net/tap.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/net/tap.c b/net/tap.c
> > > index dd42ac6134..937559dbb8 100644
> > > --- a/net/tap.c
> > > +++ b/net/tap.c
> > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
> > >              size -= s->host_vnet_hdr_len;
> > >          }
> > >
> > > -        if (!s->nc.peer->do_not_pad) {
> > > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
>
> I think we should do:
>
> if (s->nc.peer && !s->nc.peer->do_not_pad)

Yes. If there is no peer then the qemu_send_packet() that we're about
to do is going to discard the packet anyway, so there's no point in
padding it.

Maybe consider

static inline bool net_peer_needs_padding(NetClientState *nc)
{
    return nc->peer && !nc->peer->do_not_pad;
}

since we want the same check in three places ?

thanks
-- PMM
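
To show how the proposed helper might be shared between call sites, here is a minimal stand-alone sketch. The `NetClientState` below is a two-field mock rather than the real QEMU structure, and `MIN_FRAME_LEN` and `frame_len_for_peer` are hypothetical names introduced only for illustration. The value 60 is assumed to be the minimum Ethernet frame length without the FCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: minimum Ethernet frame length, excluding the FCS. */
#define MIN_FRAME_LEN 60

/* Minimal mock with only the fields this check needs; not the QEMU type. */
typedef struct NetClientState NetClientState;
struct NetClientState {
    NetClientState *peer;  /* NULL while no peer is attached */
    bool do_not_pad;       /* peer opts out of short-frame padding */
};

/* Helper as suggested in this thread. */
static inline bool net_peer_needs_padding(NetClientState *nc)
{
    return nc->peer && !nc->peer->do_not_pad;
}

/*
 * Hypothetical call site: returns the length a frame would have when
 * handed to the peer. Short frames grow to MIN_FRAME_LEN only if a
 * peer exists and has not opted out of padding.
 */
static size_t frame_len_for_peer(NetClientState *nc, size_t size)
{
    assert(nc != NULL);
    if (net_peer_needs_padding(nc) && size < MIN_FRAME_LEN) {
        return MIN_FRAME_LEN;
    }
    return size;
}
```

With no peer attached, `frame_len_for_peer(&nc, 42)` returns 42 and never touches `nc.peer->do_not_pad`, which is the dereference that crashed in `tap_send`.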
Bin Meng April 22, 2021, 9:42 a.m. UTC | #7
On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
> > <philmd@redhat.com> wrote:
> > >
> > > Cc'ing Bin.
> > >
> > > On 4/21/21 5:22 PM, Cole Robinson wrote:
> > > > Attempting to hotplug a tap nic with libvirt will crash qemu:
> > > >
> > > > $ sudo virsh attach-interface f32 network default
> > > > error: Failed to attach interface
> > > > error: Unable to read from monitor: Connection reset by peer
> > > >
> > > > 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
> > > > 206           if (!s->nc.peer->do_not_pad) {
> > > > gdb$ bt
> > > >
> > > > s->nc.peer may not be set at this point. This seems to be an
> > > > expected case, as qemu_send_packet_* explicitly checks for NULL
> > > > s->nc.peer later.
> > > >
> > > > Fix it by checking for s->nc.peer here too. Padding is applied if
> > > > s->nc.peer is not set.
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=1949786
> > > > Fixes: 969e50b61a2
> > > >
> > > > Signed-off-by: Cole Robinson <crobinso@redhat.com>
> > > > ---
> > > > * Or should we skip padding if nc.peer is unset? I didn't dig into it
> > > > * tap-win3.c and slirp.c may need a similar fix, but the slirp case
> > > >   didn't crash in a simple test.
> > > >
> > > >  net/tap.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/net/tap.c b/net/tap.c
> > > > index dd42ac6134..937559dbb8 100644
> > > > --- a/net/tap.c
> > > > +++ b/net/tap.c
> > > > @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
> > > >              size -= s->host_vnet_hdr_len;
> > > >          }
> > > >
> > > > -        if (!s->nc.peer->do_not_pad) {
> > > > +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
> >
> > I think we should do:
> >
> > if (s->nc.peer && !s->nc.peer->do_not_pad)
>
> Yes. If there is no peer then the qemu_send_packet() that we're about
> to do is going to discard the packet anyway, so there's no point in
> padding it.
>
> Maybe consider
>
> static inline bool net_peer_needs_padding(NetClientState *nc)
> {
>     return nc->peer && !nc->peer->do_not_pad;
> }
>
> since we want the same check in three places ?

Sounds good to me.

Regards,
Bin
Cole Robinson April 22, 2021, 9:34 p.m. UTC | #8
On 4/22/21 5:42 AM, Bin Meng wrote:
> On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:
>>>
>>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
>>> <philmd@redhat.com> wrote:
>>>>
>>>> Cc'ing Bin.
>>>>
>>>> On 4/21/21 5:22 PM, Cole Robinson wrote:
>>>>> Attempting to hotplug a tap nic with libvirt will crash qemu:
>>>>>
>>>>> $ sudo virsh attach-interface f32 network default
>>>>> error: Failed to attach interface
>>>>> error: Unable to read from monitor: Connection reset by peer
>>>>>
>>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
>>>>> 206           if (!s->nc.peer->do_not_pad) {
>>>>> gdb$ bt
>>>>>
>>>>> s->nc.peer may not be set at this point. This seems to be an
>>>>> expected case, as qemu_send_packet_* explicitly checks for NULL
>>>>> s->nc.peer later.
>>>>>
>>>>> Fix it by checking for s->nc.peer here too. Padding is applied if
>>>>> s->nc.peer is not set.
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
>>>>> Fixes: 969e50b61a2
>>>>>
>>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com>
>>>>> ---
>>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it
>>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case
>>>>>   didn't crash in a simple test.
>>>>>
>>>>>  net/tap.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/tap.c b/net/tap.c
>>>>> index dd42ac6134..937559dbb8 100644
>>>>> --- a/net/tap.c
>>>>> +++ b/net/tap.c
>>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
>>>>>              size -= s->host_vnet_hdr_len;
>>>>>          }
>>>>>
>>>>> -        if (!s->nc.peer->do_not_pad) {
>>>>> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
>>>
>>> I think we should do:
>>>
>>> if (s->nc.peer && !s->nc.peer->do_not_pad)
>>
>> Yes. If there is no peer then the qemu_send_packet() that we're about
>> to do is going to discard the packet anyway, so there's no point in
>> padding it.
>>
>> Maybe consider
>>
>> static inline bool net_peer_needs_padding(NetClientState *nc)
>> {
>>     return nc->peer && !nc->peer->do_not_pad;
>> }
>>
>> since we want the same check in three places ?
> 
> Sounds good to me.
> 

I did not get to this today. If Bin, Jason, or anyone else wants to write the
patch, I will test it tomorrow (US EDT). If not, I'll write the patch tomorrow.

Thanks,
Cole
Jason Wang April 23, 2021, 1:43 a.m. UTC | #9
On 2021/4/23 5:34 AM, Cole Robinson wrote:
> On 4/22/21 5:42 AM, Bin Meng wrote:
>> On Thu, Apr 22, 2021 at 5:36 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On Thu, 22 Apr 2021 at 05:29, Bin Meng <bmeng.cn@gmail.com> wrote:
>>>> On Thu, Apr 22, 2021 at 12:36 AM Philippe Mathieu-Daudé
>>>> <philmd@redhat.com> wrote:
>>>>> Cc'ing Bin.
>>>>>
>>>>> On 4/21/21 5:22 PM, Cole Robinson wrote:
>>>>>> Attempting to hotplug a tap nic with libvirt will crash qemu:
>>>>>>
>>>>>> $ sudo virsh attach-interface f32 network default
>>>>>> error: Failed to attach interface
>>>>>> error: Unable to read from monitor: Connection reset by peer
>>>>>>
>>>>>> 0x000055875b7f3a99 in tap_send (opaque=0x55875e39eae0) at ../net/tap.c:206
>>>>>> 206           if (!s->nc.peer->do_not_pad) {
>>>>>> gdb$ bt
>>>>>>
>>>>>> s->nc.peer may not be set at this point. This seems to be an
>>>>>> expected case, as qemu_send_packet_* explicitly checks for NULL
>>>>>> s->nc.peer later.
>>>>>>
>>>>>> Fix it by checking for s->nc.peer here too. Padding is applied if
>>>>>> s->nc.peer is not set.
>>>>>>
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1949786
>>>>>> Fixes: 969e50b61a2
>>>>>>
>>>>>> Signed-off-by: Cole Robinson <crobinso@redhat.com>
>>>>>> ---
>>>>>> * Or should we skip padding if nc.peer is unset? I didn't dig into it
>>>>>> * tap-win3.c and slirp.c may need a similar fix, but the slirp case
>>>>>>    didn't crash in a simple test.
>>>>>>
>>>>>>   net/tap.c | 2 +-
>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/net/tap.c b/net/tap.c
>>>>>> index dd42ac6134..937559dbb8 100644
>>>>>> --- a/net/tap.c
>>>>>> +++ b/net/tap.c
>>>>>> @@ -203,7 +203,7 @@ static void tap_send(void *opaque)
>>>>>>               size -= s->host_vnet_hdr_len;
>>>>>>           }
>>>>>>
>>>>>> -        if (!s->nc.peer->do_not_pad) {
>>>>>> +        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
>>>> I think we should do:
>>>>
>>>> if (s->nc.peer && !s->nc.peer->do_not_pad)
>>> Yes. If there is no peer then the qemu_send_packet() that we're about
>>> to do is going to discard the packet anyway, so there's no point in
>>> padding it.
>>>
>>> Maybe consider
>>>
>>> static inline bool net_peer_needs_padding(NetClientState *nc)
>>> {
>>>      return nc->peer && !nc->peer->do_not_pad;
>>> }
>>>
>>> since we want the same check in three places ?
>> Sounds good to me.
>>
> I did not get to this today. Bin/Jason/anyone want to write the patch,


I will send a patch soon.

Thanks


> I
> will test it tomorrow (US EDT time). If not I'll write the patch tomorrow.
>
> Thanks,
> Cole
>
>
Patch

diff --git a/net/tap.c b/net/tap.c
index dd42ac6134..937559dbb8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -203,7 +203,7 @@  static void tap_send(void *opaque)
             size -= s->host_vnet_hdr_len;
         }
 
-        if (!s->nc.peer->do_not_pad) {
+        if (!s->nc.peer || !s->nc.peer->do_not_pad) {
             if (eth_pad_short_frame(min_pkt, &min_pktsz, buf, size)) {
                 buf = min_pkt;
                 size = min_pktsz;
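
For readers unfamiliar with the padding step guarded by this condition, the following is a rough sketch of what padding a short frame involves. It is an assumption-laden stand-in, not the body of QEMU's `eth_pad_short_frame`: the name `pad_short_frame` and the constant `MIN_FRAME_LEN` are hypothetical, the argument order simply follows the call shown in the diff, and 60 bytes is assumed as the minimum Ethernet frame length without the FCS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: minimum Ethernet frame length, excluding the FCS. */
#define MIN_FRAME_LEN 60

/*
 * Hypothetical stand-in for the padding step. If pkt is shorter than
 * MIN_FRAME_LEN, copy it into padded, zero-fill the remainder, set
 * *padded_len to MIN_FRAME_LEN and return true. Otherwise leave the
 * outputs untouched and return false so the caller keeps using pkt.
 */
static bool pad_short_frame(uint8_t *padded, size_t *padded_len,
                            const void *pkt, size_t pkt_len)
{
    /* The caller passes the capacity of padded in *padded_len. */
    assert(*padded_len >= MIN_FRAME_LEN);

    if (pkt_len >= MIN_FRAME_LEN) {
        return false;
    }

    memcpy(padded, pkt, pkt_len);
    memset(padded + pkt_len, 0, MIN_FRAME_LEN - pkt_len);
    *padded_len = MIN_FRAME_LEN;
    return true;
}
```

A caller would switch its buffer pointer and size to the padded copy only when the function returns true, which matches the `buf = min_pkt; size = min_pktsz;` assignments in the hunk above.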