diff mbox series

[net] tuntap: raise EPOLLOUT on device up

Message ID 1526648443-24128-1-git-send-email-jasowang@redhat.com
State Deferred, archived
Delegated to: David Miller
Headers show
Series [net] tuntap: raise EPOLLOUT on device up | expand

Commit Message

Jason Wang May 18, 2018, 1 p.m. UTC
We return -EIO on device down but can not raise EPOLLOUT after it was
up. This may confuse user like vhost which expects tuntap to raise
EPOLLOUT to re-enable its TX routine after tuntap is down. This could
be easily reproduced by transmitting packets from VM while down and up
the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Michael S. Tsirkin May 18, 2018, 1:13 p.m. UTC | #1
On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> We return -EIO on device down but can not raise EPOLLOUT after it was
> up. This may confuse user like vhost which expects tuntap to raise
> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> be easily reproduced by transmitting packets from VM while down and up
> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/tun.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d45ac37..1b29761 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>  	int skb_xdp = 1;
>  	bool frags = tun_napi_frags_enabled(tun);
>  
> -	if (!(tun->dev->flags & IFF_UP))
> +	if (!(tun->dev->flags & IFF_UP)) {

Isn't this racy?  What if flag is cleared at this point?

> +		set_bit(SOCKWQ_ASYNC_NOSPACE, &tfile->socket.flags);
>  		return -EIO;
> +	}
>  
>  	if (!(tun->flags & IFF_NO_PI)) {
>  		if (len < sizeof(pi))
> -- 
> 2.7.4
Jason Wang May 18, 2018, 1:26 p.m. UTC | #2
On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
>> We return -EIO on device down but can not raise EPOLLOUT after it was
>> up. This may confuse user like vhost which expects tuntap to raise
>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>> be easily reproduced by transmitting packets from VM while down and up
>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>
>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/tun.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d45ac37..1b29761 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
>>   	int skb_xdp = 1;
>>   	bool frags = tun_napi_frags_enabled(tun);
>>   
>> -	if (!(tun->dev->flags & IFF_UP))
>> +	if (!(tun->dev->flags & IFF_UP)) {
> Isn't this racy?  What if flag is cleared at this point?

I think you mean "set at this point"? Then yes, so we probably need to 
set the bit during tun_net_close().

Thanks

>> +		set_bit(SOCKWQ_ASYNC_NOSPACE, &tfile->socket.flags);
>>   		return -EIO;
>> +	}
>>   
>>   	if (!(tun->flags & IFF_NO_PI)) {
>>   		if (len < sizeof(pi))
>> -- 
>> 2.7.4
Jason Wang May 18, 2018, 2 p.m. UTC | #3
On 2018年05月18日 21:26, Jason Wang wrote:
>
>
> On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
>> On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>> up. This may confuse user like vhost which expects tuntap to raise
>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>> be easily reproduced by transmitting packets from VM while down and up
>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>
>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> ---
>>>   drivers/net/tun.c | 4 +++-
>>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>> index d45ac37..1b29761 100644
>>> --- a/drivers/net/tun.c
>>> +++ b/drivers/net/tun.c
>>> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct tun_struct 
>>> *tun, struct tun_file *tfile,
>>>       int skb_xdp = 1;
>>>       bool frags = tun_napi_frags_enabled(tun);
>>>   -    if (!(tun->dev->flags & IFF_UP))
>>> +    if (!(tun->dev->flags & IFF_UP)) {
>> Isn't this racy?  What if flag is cleared at this point?
>
> I think you mean "set at this point"? Then yes, so we probably need to 
> set the bit during tun_net_close().
>
> Thanks 

Looks no need, vhost will poll socket after it see EIO. So we are ok here?

Thanks
Michael S. Tsirkin May 18, 2018, 2:06 p.m. UTC | #4
On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月18日 21:26, Jason Wang wrote:
> > 
> > 
> > On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> > > On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > be easily reproduced by transmitting packets from VM while down and up
> > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > 
> > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >   drivers/net/tun.c | 4 +++-
> > > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index d45ac37..1b29761 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
> > > > tun_struct *tun, struct tun_file *tfile,
> > > >       int skb_xdp = 1;
> > > >       bool frags = tun_napi_frags_enabled(tun);
> > > >   -    if (!(tun->dev->flags & IFF_UP))
> > > > +    if (!(tun->dev->flags & IFF_UP)) {
> > > Isn't this racy?  What if flag is cleared at this point?
> > 
> > I think you mean "set at this point"? Then yes, so we probably need to
> > set the bit during tun_net_close().
> > 
> > Thanks
> 
> Looks no need, vhost will poll socket after it see EIO. So we are ok here?
> 
> Thanks

In fact I don't even understand why does this help any longer.
Jason Wang May 18, 2018, 2:11 p.m. UTC | #5
On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
>>
>> On 2018年05月18日 21:26, Jason Wang wrote:
>>>
>>> On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
>>>> On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
>>>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>>>> up. This may confuse user like vhost which expects tuntap to raise
>>>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>>>> be easily reproduced by transmitting packets from VM while down and up
>>>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>>>
>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> ---
>>>>>    drivers/net/tun.c | 4 +++-
>>>>>    1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>> index d45ac37..1b29761 100644
>>>>> --- a/drivers/net/tun.c
>>>>> +++ b/drivers/net/tun.c
>>>>> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
>>>>> tun_struct *tun, struct tun_file *tfile,
>>>>>        int skb_xdp = 1;
>>>>>        bool frags = tun_napi_frags_enabled(tun);
>>>>>    -    if (!(tun->dev->flags & IFF_UP))
>>>>> +    if (!(tun->dev->flags & IFF_UP)) {
>>>> Isn't this racy?  What if flag is cleared at this point?
>>> I think you mean "set at this point"? Then yes, so we probably need to
>>> set the bit during tun_net_close().
>>>
>>> Thanks
>> Looks no need, vhost will poll socket after it see EIO. So we are ok here?
>>
>> Thanks
> In fact I don't even understand why does this help any longer.
>

We disable tx polling and only enable it on demand for a better rx 
performance. You may want to have a look at :

commit feb8892cb441c742d4220cf7ced001e7fa070731
Author: Jason Wang <jasowang@redhat.com>
Date:   Mon Nov 13 11:45:34 2017 +0800

     vhost_net: conditionally enable tx polling

Thanks
Michael S. Tsirkin May 18, 2018, 2:46 p.m. UTC | #6
On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
> > On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年05月18日 21:26, Jason Wang wrote:
> > > > 
> > > > On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> > > > > On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> > > > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > > > be easily reproduced by transmitting packets from VM while down and up
> > > > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > > > 
> > > > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > ---
> > > > > >    drivers/net/tun.c | 4 +++-
> > > > > >    1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > index d45ac37..1b29761 100644
> > > > > > --- a/drivers/net/tun.c
> > > > > > +++ b/drivers/net/tun.c
> > > > > > @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
> > > > > > tun_struct *tun, struct tun_file *tfile,
> > > > > >        int skb_xdp = 1;
> > > > > >        bool frags = tun_napi_frags_enabled(tun);
> > > > > >    -    if (!(tun->dev->flags & IFF_UP))
> > > > > > +    if (!(tun->dev->flags & IFF_UP)) {
> > > > > Isn't this racy?  What if flag is cleared at this point?
> > > > I think you mean "set at this point"? Then yes, so we probably need to
> > > > set the bit during tun_net_close().
> > > > 
> > > > Thanks
> > > Looks no need, vhost will poll socket after it see EIO. So we are ok here?
> > > 
> > > Thanks
> > In fact I don't even understand why does this help any longer.
> > 
> 
> We disable tx polling and only enable it on demand for a better rx
> performance. You may want to have a look at :
> 
> commit feb8892cb441c742d4220cf7ced001e7fa070731
> Author: Jason Wang <jasowang@redhat.com>
> Date:   Mon Nov 13 11:45:34 2017 +0800
> 
>     vhost_net: conditionally enable tx polling
> 
> Thanks


Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
I think it's tested when packet is transmitted,
but there is no guarantee here any packet will
ever be transmitted.
Jason Wang May 19, 2018, 1:09 a.m. UTC | #7
On 2018年05月18日 22:46, Michael S. Tsirkin wrote:
> On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:
>>
>> On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
>>> On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
>>>> On 2018年05月18日 21:26, Jason Wang wrote:
>>>>> On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
>>>>>> On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
>>>>>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>>>>>> up. This may confuse user like vhost which expects tuntap to raise
>>>>>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>>>>>> be easily reproduced by transmitting packets from VM while down and up
>>>>>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>>>>>
>>>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>>>> ---
>>>>>>>     drivers/net/tun.c | 4 +++-
>>>>>>>     1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>>>>>>> index d45ac37..1b29761 100644
>>>>>>> --- a/drivers/net/tun.c
>>>>>>> +++ b/drivers/net/tun.c
>>>>>>> @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
>>>>>>> tun_struct *tun, struct tun_file *tfile,
>>>>>>>         int skb_xdp = 1;
>>>>>>>         bool frags = tun_napi_frags_enabled(tun);
>>>>>>>     -    if (!(tun->dev->flags & IFF_UP))
>>>>>>> +    if (!(tun->dev->flags & IFF_UP)) {
>>>>>> Isn't this racy?  What if flag is cleared at this point?
>>>>> I think you mean "set at this point"? Then yes, so we probably need to
>>>>> set the bit during tun_net_close().
>>>>>
>>>>> Thanks
>>>> Looks no need, vhost will poll socket after it see EIO. So we are ok here?
>>>>
>>>> Thanks
>>> In fact I don't even understand why does this help any longer.
>>>
>> We disable tx polling and only enable it on demand for a better rx
>> performance. You may want to have a look at :
>>
>> commit feb8892cb441c742d4220cf7ced001e7fa070731
>> Author: Jason Wang <jasowang@redhat.com>
>> Date:   Mon Nov 13 11:45:34 2017 +0800
>>
>>      vhost_net: conditionally enable tx polling
>>
>> Thanks
>
> Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
> I think it's tested when packet is transmitted,
> but there is no guarantee here any packet will
> ever be transmitted.
>

Well, actually, I do plan to disable vq polling from the beginning. But 
looks like you do not want this:

See https://patchwork.kernel.org/patch/10034025/

Thanks
David Miller May 21, 2018, 3:47 p.m. UTC | #8
From: Jason Wang <jasowang@redhat.com>
Date: Fri, 18 May 2018 21:00:43 +0800

> We return -EIO on device down but can not raise EPOLLOUT after it was
> up. This may confuse user like vhost which expects tuntap to raise
> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> be easily reproduced by transmitting packets from VM while down and up
> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I'm no so sure what to do with this patch.

Like Michael says, this flag bit is only checks upon transmit which
may or may not happen after this point.  It doesn't seem to be
guaranteed.
Michael S. Tsirkin May 21, 2018, 10:06 p.m. UTC | #9
On Sat, May 19, 2018 at 09:09:11AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月18日 22:46, Michael S. Tsirkin wrote:
> > On Fri, May 18, 2018 at 10:11:54PM +0800, Jason Wang wrote:
> > > 
> > > On 2018年05月18日 22:06, Michael S. Tsirkin wrote:
> > > > On Fri, May 18, 2018 at 10:00:31PM +0800, Jason Wang wrote:
> > > > > On 2018年05月18日 21:26, Jason Wang wrote:
> > > > > > On 2018年05月18日 21:13, Michael S. Tsirkin wrote:
> > > > > > > On Fri, May 18, 2018 at 09:00:43PM +0800, Jason Wang wrote:
> > > > > > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > > > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > > > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > > > > > be easily reproduced by transmitting packets from VM while down and up
> > > > > > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > > > > > 
> > > > > > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > > > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > > > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > > > > > ---
> > > > > > > >     drivers/net/tun.c | 4 +++-
> > > > > > > >     1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > > > > > index d45ac37..1b29761 100644
> > > > > > > > --- a/drivers/net/tun.c
> > > > > > > > +++ b/drivers/net/tun.c
> > > > > > > > @@ -1734,8 +1734,10 @@ static ssize_t tun_get_user(struct
> > > > > > > > tun_struct *tun, struct tun_file *tfile,
> > > > > > > >         int skb_xdp = 1;
> > > > > > > >         bool frags = tun_napi_frags_enabled(tun);
> > > > > > > >     -    if (!(tun->dev->flags & IFF_UP))
> > > > > > > > +    if (!(tun->dev->flags & IFF_UP)) {
> > > > > > > Isn't this racy?  What if flag is cleared at this point?
> > > > > > I think you mean "set at this point"? Then yes, so we probably need to
> > > > > > set the bit during tun_net_close().
> > > > > > 
> > > > > > Thanks
> > > > > Looks no need, vhost will poll socket after it see EIO. So we are ok here?
> > > > > 
> > > > > Thanks
> > > > In fact I don't even understand why does this help any longer.
> > > > 
> > > We disable tx polling and only enable it on demand for a better rx
> > > performance. You may want to have a look at :
> > > 
> > > commit feb8892cb441c742d4220cf7ced001e7fa070731
> > > Author: Jason Wang <jasowang@redhat.com>
> > > Date:   Mon Nov 13 11:45:34 2017 +0800
> > > 
> > >      vhost_net: conditionally enable tx polling
> > > 
> > > Thanks
> > 
> > Question is, what looks at SOCKWQ_ASYNC_NOSPACE.
> > I think it's tested when packet is transmitted,
> > but there is no guarantee here any packet will
> > ever be transmitted.
> > 
> 
> Well, actually, I do plan to disable vq polling from the beginning. But
> looks like you do not want this:
> 
> See https://patchwork.kernel.org/patch/10034025/
> 
> Thanks

Not sure I understand what you are saying, it's enabling polling we are
talking about.
Michael S. Tsirkin May 21, 2018, 10:08 p.m. UTC | #10
On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Fri, 18 May 2018 21:00:43 +0800
> 
> > We return -EIO on device down but can not raise EPOLLOUT after it was
> > up. This may confuse user like vhost which expects tuntap to raise
> > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > be easily reproduced by transmitting packets from VM while down and up
> > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > 
> > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> I'm no so sure what to do with this patch.
> 
> Like Michael says, this flag bit is only checks upon transmit which
> may or may not happen after this point.  It doesn't seem to be
> guaranteed.

Jason, can't we detect a link up transition and respond accordingly?
What do you think?
Jason Wang May 22, 2018, 3:22 a.m. UTC | #11
On 2018年05月22日 06:08, Michael S. Tsirkin wrote:
> On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Date: Fri, 18 May 2018 21:00:43 +0800
>>
>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>> up. This may confuse user like vhost which expects tuntap to raise
>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>> be easily reproduced by transmitting packets from VM while down and up
>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>
>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> I'm no so sure what to do with this patch.
>>
>> Like Michael says, this flag bit is only checks upon transmit which
>> may or may not happen after this point.  It doesn't seem to be
>> guaranteed.

The flag is checked in tun_chr_poll() as well.

> Jason, can't we detect a link up transition and respond accordingly?
> What do you think?
>

I think we've already tried to do this, in tun_net_open() we call 
write_space(). But the problem is the bit may not be set at that time.

A second thought is to set the bit in tun_chr_poll() instead of -EIO like:

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d45ac37..46a1573 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1423,6 +1423,13 @@ static void tun_net_init(struct net_device *dev)
         dev->max_mtu = MAX_MTU - dev->hard_header_len;
  }

+static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file 
*tfile)
+{
+       struct sock *sk = tfile->socket.sk;
+
+       return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
+}
+
  /* Character device part */

  /* Poll */
@@ -1445,10 +1452,9 @@ static __poll_t tun_chr_poll(struct file *file, 
poll_table *wait)
         if (!ptr_ring_empty(&tfile->tx_ring))
                 mask |= EPOLLIN | EPOLLRDNORM;

-       if (tun->dev->flags & IFF_UP &&
-           (sock_writeable(sk) ||
-            (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, 
&sk->sk_socket->flags) &&
-             sock_writeable(sk))))
+       if (tun_sock_writeable(tun, tfile) ||
+           (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, 
&sk->sk_socket->flags) &&
+            tun_sock_writeable(tun, tfile)));
                 mask |= EPOLLOUT | EPOLLWRNORM;

         if (tun->dev->reg_state != NETREG_REGISTERED)

Does this make more sense?

Thanks
Michael S. Tsirkin May 22, 2018, 3:45 a.m. UTC | #12
On Tue, May 22, 2018 at 11:22:11AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月22日 06:08, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
> > > From: Jason Wang <jasowang@redhat.com>
> > > Date: Fri, 18 May 2018 21:00:43 +0800
> > > 
> > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > be easily reproduced by transmitting packets from VM while down and up
> > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > 
> > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > I'm no so sure what to do with this patch.
> > > 
> > > Like Michael says, this flag bit is only checks upon transmit which
> > > may or may not happen after this point.  It doesn't seem to be
> > > guaranteed.
> 
> The flag is checked in tun_chr_poll() as well.
> 
> > Jason, can't we detect a link up transition and respond accordingly?
> > What do you think?
> > 
> 
> I think we've already tried to do this, in tun_net_open() we call
> write_space(). But the problem is the bit may not be set at that time.

Which bit? __dev_change_flags seems to set IFF_UP before calling
ndo_open. The issue  I think is that tun_sock_write_space
exits if SOCKWQ_ASYNC_NOSPACE is clear.

And now I think I understand what is going on:

	When link is down, writes to the device might fail with -EIO.
	Userspace needs an indication when the status is resolved.  As a fix,
	tun_net_open attempts to wake up writers - but that is only effective if
	SOCKWQ_ASYNC_NOSPACE has been set in the past.  As a quick hack, set
	SOCKWQ_ASYNC_NOSPACE when write fails because of the link down status.
	If no writes failed, userspace does not know that interface
	was down so should not care that it's going up.


does this describe what this line of code does?
If yes feel free to include this info in a code comment and commit log.



> A second thought is to set the bit in tun_chr_poll() instead of -EIO like:
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d45ac37..46a1573 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1423,6 +1423,13 @@ static void tun_net_init(struct net_device *dev)
>         dev->max_mtu = MAX_MTU - dev->hard_header_len;
>  }
> 
> +static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file
> *tfile)
> +{
> +       struct sock *sk = tfile->socket.sk;
> +
> +       return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
> +}
> +
>  /* Character device part */
> 
>  /* Poll */
> @@ -1445,10 +1452,9 @@ static __poll_t tun_chr_poll(struct file *file,
> poll_table *wait)
>         if (!ptr_ring_empty(&tfile->tx_ring))
>                 mask |= EPOLLIN | EPOLLRDNORM;
> 
> -       if (tun->dev->flags & IFF_UP &&
> -           (sock_writeable(sk) ||
> -            (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
> &&
> -             sock_writeable(sk))))
> +       if (tun_sock_writeable(tun, tfile) ||
> +           (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
> &&
> +            tun_sock_writeable(tun, tfile)));
>                 mask |= EPOLLOUT | EPOLLWRNORM;
> 
>         if (tun->dev->reg_state != NETREG_REGISTERED)
> 
> Does this make more sense?
> 
> Thanks
Michael S. Tsirkin May 22, 2018, 3:46 a.m. UTC | #13
On Tue, May 22, 2018 at 11:22:11AM +0800, Jason Wang wrote:
> 
> 
> On 2018年05月22日 06:08, Michael S. Tsirkin wrote:
> > On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
> > > From: Jason Wang <jasowang@redhat.com>
> > > Date: Fri, 18 May 2018 21:00:43 +0800
> > > 
> > > > We return -EIO on device down but can not raise EPOLLOUT after it was
> > > > up. This may confuse user like vhost which expects tuntap to raise
> > > > EPOLLOUT to re-enable its TX routine after tuntap is down. This could
> > > > be easily reproduced by transmitting packets from VM while down and up
> > > > the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
> > > > 
> > > > Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > > > Cc: Eric Dumazet <edumazet@google.com>
> > > > Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > I'm no so sure what to do with this patch.
> > > 
> > > Like Michael says, this flag bit is only checks upon transmit which
> > > may or may not happen after this point.  It doesn't seem to be
> > > guaranteed.
> 
> The flag is checked in tun_chr_poll() as well.
> 
> > Jason, can't we detect a link up transition and respond accordingly?
> > What do you think?
> > 
> 
> I think we've already tried to do this, in tun_net_open() we call
> write_space(). But the problem is the bit may not be set at that time.
> 
> A second thought is to set the bit in tun_chr_poll() instead of -EIO like:
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index d45ac37..46a1573 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1423,6 +1423,13 @@ static void tun_net_init(struct net_device *dev)
>         dev->max_mtu = MAX_MTU - dev->hard_header_len;
>  }
> 
> +static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file
> *tfile)
> +{
> +       struct sock *sk = tfile->socket.sk;
> +
> +       return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
> +}
> +
>  /* Character device part */
> 
>  /* Poll */
> @@ -1445,10 +1452,9 @@ static __poll_t tun_chr_poll(struct file *file,
> poll_table *wait)
>         if (!ptr_ring_empty(&tfile->tx_ring))
>                 mask |= EPOLLIN | EPOLLRDNORM;
> 
> -       if (tun->dev->flags & IFF_UP &&
> -           (sock_writeable(sk) ||
> -            (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
> &&
> -             sock_writeable(sk))))
> +       if (tun_sock_writeable(tun, tfile) ||
> +           (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
> &&
> +            tun_sock_writeable(tun, tfile)));
>                 mask |= EPOLLOUT | EPOLLWRNORM;
> 
>         if (tun->dev->reg_state != NETREG_REGISTERED)
> 
> Does this make more sense?
> 
> Thanks

I just understood the motivation for doing it on EIO.
Maybe there's a reason it makes sense here as well,
but it's far from obvious. I suggest you repost adding
an explanation in the comment. The original patch will
be fine with an explanation as well.
Jason Wang May 22, 2018, 4 a.m. UTC | #14
On 2018年05月22日 11:46, Michael S. Tsirkin wrote:
> On Tue, May 22, 2018 at 11:22:11AM +0800, Jason Wang wrote:
>>
>> On 2018年05月22日 06:08, Michael S. Tsirkin wrote:
>>> On Mon, May 21, 2018 at 11:47:42AM -0400, David Miller wrote:
>>>> From: Jason Wang <jasowang@redhat.com>
>>>> Date: Fri, 18 May 2018 21:00:43 +0800
>>>>
>>>>> We return -EIO on device down but can not raise EPOLLOUT after it was
>>>>> up. This may confuse user like vhost which expects tuntap to raise
>>>>> EPOLLOUT to re-enable its TX routine after tuntap is down. This could
>>>>> be easily reproduced by transmitting packets from VM while down and up
>>>>> the tap device. Fixing this by set SOCKWQ_ASYNC_NOSPACE on -EIO.
>>>>>
>>>>> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
>>>>> Cc: Eric Dumazet <edumazet@google.com>
>>>>> Fixes: 1bd4978a88ac2 ("tun: honor IFF_UP in tun_get_user()")
>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> I'm no so sure what to do with this patch.
>>>>
>>>> Like Michael says, this flag bit is only checks upon transmit which
>>>> may or may not happen after this point.  It doesn't seem to be
>>>> guaranteed.
>> The flag is checked in tun_chr_poll() as well.
>>
>>> Jason, can't we detect a link up transition and respond accordingly?
>>> What do you think?
>>>
>> I think we've already tried to do this, in tun_net_open() we call
>> write_space(). But the problem is the bit may not be set at that time.
>>
>> A second thought is to set the bit in tun_chr_poll() instead of -EIO like:
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index d45ac37..46a1573 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1423,6 +1423,13 @@ static void tun_net_init(struct net_device *dev)
>>          dev->max_mtu = MAX_MTU - dev->hard_header_len;
>>   }
>>
>> +static bool tun_sock_writeable(struct tun_struct *tun, struct tun_file
>> *tfile)
>> +{
>> +       struct sock *sk = tfile->socket.sk;
>> +
>> +       return (tun->dev->flags & IFF_UP) && sock_writeable(sk);
>> +}
>> +
>>   /* Character device part */
>>
>>   /* Poll */
>> @@ -1445,10 +1452,9 @@ static __poll_t tun_chr_poll(struct file *file,
>> poll_table *wait)
>>          if (!ptr_ring_empty(&tfile->tx_ring))
>>                  mask |= EPOLLIN | EPOLLRDNORM;
>>
>> -       if (tun->dev->flags & IFF_UP &&
>> -           (sock_writeable(sk) ||
>> -            (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
>> &&
>> -             sock_writeable(sk))))
>> +       if (tun_sock_writeable(tun, tfile) ||
>> +           (!test_and_set_bit(SOCKWQ_ASYNC_NOSPACE, &sk->sk_socket->flags)
>> &&
>> +            tun_sock_writeable(tun, tfile)));
>>                  mask |= EPOLLOUT | EPOLLWRNORM;
>>
>>          if (tun->dev->reg_state != NETREG_REGISTERED)
>>
>> Does this make more sense?
>>
>> Thanks
> I just understood the motivation for doing it on EIO.
> Maybe there's a reason it makes sense here as well,
> but it's far from obvious. I suggest you repost adding
> an explanation in the comment. The original patch will
> be fine with an explanation as well.
>

Ok, let me add explanation on both code and commit log.

Thanks
diff mbox series

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d45ac37..1b29761 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1734,8 +1734,10 @@  static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 	int skb_xdp = 1;
 	bool frags = tun_napi_frags_enabled(tun);
 
-	if (!(tun->dev->flags & IFF_UP))
+	if (!(tun->dev->flags & IFF_UP)) {
+		set_bit(SOCKWQ_ASYNC_NOSPACE, &tfile->socket.flags);
 		return -EIO;
+	}
 
 	if (!(tun->flags & IFF_NO_PI)) {
 		if (len < sizeof(pi))